[llvm-dev] [RFC] Introducing a byte type to LLVM
John McCall via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 22 11:52:28 PDT 2021
On 22 Jun 2021, at 13:21, Ralf Jung wrote:
>> Your proposal generally relies on certain optimizations not applying
>> to pointers because they mess up provenance as represented in
>> transitive use-dependencies. If those optimizations can be applied
>> to integers, you can lose use-dependencies in exactly the same way as
>> you can with pointers. Not doing the inttoptr(ptrtoint p)) -> p
>> reduction doesn’t matter at all, because in either case that value
>> has a use-dependency on p, whereas the actual problem is that there
>> is no longer a use-dependency on some other value.
>
> Note that "provenance" as we use it in this discussion is an *explicit
> operational artifact* -- it exists as a concrete piece of state in the
> Abstract Machine. That is very different from something that might
> just be used internally in some kind of analysis.
>
> There is no problem with "resetting" that provenance on a "inttoptr",
> and basically forgetting about where the int comes from. Note that
> this is a statement about an operation in the Abstract Machine, not
> merely a statement about some analysis: this is not "forgetting" as in
> "safely overapproximating the real set of possible behaviors", it is
> "forgetting" by *forcing* the provenance to be some kind of
> wildcard/default provenance. All analyses then have to correctly
> account for that.
But it’s not a truly wildcard provenance. At the very least, it’s
restricted to the set of provenances that have been exposed, and
my understanding is that Juneyoung’s non-determinism rule attempts
to readmit a use-dependency analysis and turn `ptrtoint` back into
a simple scalar operation.
>> For example, you have compellingly argued that it’s problematic to
>> do the reduction |a == b ? a : b| to |b| for pointer types. Suppose
>> I instead do this optimization on integers where |a = ptrtoint A|.
>> The result is now simply |b|. If I |inttoptr| that result and access
>> the memory, there will be no record that that access may validly
>> be to |A|. It does not help that the access may be represented
>> as |inttoptr (ptrtoint B)| for some |B| rather than just directly
>> to |B|, because there is no use-dependence on |A|. All there is
>> is an apparently unrelated and unused |ptrtoint A|.
>
> So that would be "ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint B"
> being replaced by "ptrtoint B"? I don't see any problem with that. Do
> you have a concrete example?
I think you can take any example where pointer-type restrictions
would be necessary to protect against miscompilation and turn it
into an example here by just inserting `inttoptr` and `ptrtoint`
appropriately. Quick example:
```
int A = 0x1;
int B = 0x2;
long a = (long) (A+1);
long b = (long) B;
long result = (a == b ? a : b);
if (a == b)
*(((int*) result) - 1) = 0x4;
else
*((int*) result) = 0x8;
printf(“%d %d\n”, A, B);
```
I submit that this program has unspecified but not undefined behavior,
with printing “1 8” and “4 2” being the only two valid outcomes.
But I think an optimizer which changes the fifth line to
`long result = b;` without leaving any trace behind could easily
compile this to print “1 2” because there would be nothing to
prevent the initialization of `A` from being forwarded to the
final load.
You can prevent this by noting that the provenance of `A` has been
xposed and allowing the `inttoptr` of `result` to alias `A`, but
that seems inconsistent with treating `ptrtoint` as a simple scalar
operation and allowing a use-analysis of a `ptrtoint` to restrict
which `inttoptr` casts are allowed to recreate provenance for the
`ptrtoint` operand.
If you want to keep treating `ptrtoint` as a scalar operation and
doing use-analyses on it, I think the most palatable option is to
recognize whenever you’re cutting a use-dependency and
conservatively record in IR that the original value has now been
exposed. So if you start with this:
```
%eq = icmp eq i32 %a, %b
%result = select i1 %eq, i32 %a, i32 %b
```
You have to transform it like this:
```
%result = %b
call void @llvm.expose.i32(i32 %a)
```
You should be able to remove these exposure events in a lot of
situations, but conservatively they’ll have to be treated as
escapes.
Most optimizations never cut use-dependencies on opaque values
like this and so won’t be affected.
John.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210622/e44e1d7c/attachment.html>
More information about the llvm-dev
mailing list