[llvm-dev] [RFC] Introducing a byte type to LLVM
Ralf Jung via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 22 12:40:26 PDT 2021
Hi John,
> Note that "provenance" as we use it in this discussion is an *explicit
> operational artifact* -- it exists as a concrete piece of state in the
> Abstract Machine. That is very different from something that might just be
> used internally in some kind of analysis.
>
> There is no problem with "resetting" that provenance on a "inttoptr", and
> basically forgetting about where the int comes from. Note that this is a
> statement about an operation in the Abstract Machine, not merely a statement
> about some analysis: this is not "forgetting" as in "safely
> overapproximating the real set of possible behaviors", it is "forgetting" by
> *forcing* the provenance to be some kind of wildcard/default provenance. All
> analyses then have to correctly account for that.
>
> But it’s not a truly wildcard provenance. At the very least, it’s
> restricted to the set of provenances that have been exposed, and
> my understanding is that Juneyoung’s non-determinism rule attempts
> to readmit a use-dependency analysis and turn |ptrtoint| back into
> a simple scalar operation.
So, at least in the paper Juneyoung and me and a few other people wrote (the one
referenced a couple times in this thread already:
https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf), it is a wildcard provenance.
"exposed" is not a thing in the operational semantics, it is a theorem we can prove.
>
> For example, you have compellingly argued that it’s problematic to
> do the reduction |a == b ? a : b| to |b| for pointer types. Suppose
> I instead do this optimization on integers where |a = ptrtoint A|.
> The result is now simply |b|. If I |inttoptr| that result and access
> the memory, there will be no record that that access may validly
> be to |A|. It does not help that the access may be represented
> as |inttoptr (ptrtoint B)| for some |B| rather than just directly
> to |B|, because there is no use-dependence on |A|. All there is
> is an apparently unrelated and unused |ptrtoint A|.
>
> So that would be "ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint B" being
> replaced by "ptrtoint B"? I don't see any problem with that. Do you have a
> concrete example?
>
> I think you can take any example where pointer-type restrictions
> would be necessary to protect against miscompilation and turn it
> into an example here by just inserting |inttoptr| and |ptrtoint|
> appropriately. Quick example:
>
> |int A = 0x1; int B = 0x2; long a = (long) (A+1); long b = (long) B; long result
> = (a == b ? a : b); if (a == b) *(((int*) result) - 1) = 0x4; else *((int*)
> result) = 0x8; printf(“%d %d\n”, A, B); |
(Sorry about the bad quote, not sure what Thunderbird does here.)
I assume you mean "&A" and "&B" in lines 3 and 4?
> I submit that this program has unspecified but not undefined behavior,
> with printing “1 8” and “4 2” being the only two valid outcomes.
Okay, I can follow that.
> But I think an optimizer which changes the fifth line to
> |long result = b;| without leaving any trace behind
so then we are at
int A = 0x1;
int B = 0x2;
long a = (long) (&A+1);
long b = (long) &B;
long result = b;
if (a == b)
*(((int*) result) - 1) = 0x4;
else
*((int*) result) = 0x8;
printf(“%d %d\n”, A, B);
> could easily
> compile this to print “1 2” because there would be nothing to
> prevent the initialization of |A| from being forwarded to the
> final load.
I don't think that's right, since "a" is still used in the "if", so a bit of
information about the address is leaked and you can't assume the address was not
guessed. The furthest you can go is
int A = 0x1;
int B = 0x2;
long a = (long) (&A+1);
long b = (long) &B;
if (a == b)
*(((int*) (long) &B) - 1) = 0x4;
else
*((int*) (long) &B) = 0x8;
printf(“%d %d\n”, A, B);
No need for an explicit 'exposed', I think.
That said, things get more tricky once you want to account for C 'restrict' /
LLVM 'noalias', and then it might be necessary to have explicit 'exposed'-like
operations. I haven't seen a properly worked-out model for this, but the
candidate models I saw would need something like this. So maybe it'd not be a
bad idea to have such an operation anyway... I just don't think it should have
any effect for the kind of programs we have been discussing here so far.
Kind regards,
Ralf
>
> You can prevent this by noting that the provenance of |A| has been
> xposed and allowing the |inttoptr| of |result| to alias |A|, but
> that seems inconsistent with treating |ptrtoint| as a simple scalar
> operation and allowing a use-analysis of a |ptrtoint| to restrict
> which |inttoptr| casts are allowed to recreate provenance for the
> |ptrtoint| operand.
>
> If you want to keep treating |ptrtoint| as a scalar operation and
> doing use-analyses on it, I think the most palatable option is to
> recognize whenever you’re cutting a use-dependency and
> conservatively record in IR that the original value has now been
> exposed. So if you start with this:
>
> |%eq = icmp eq i32 %a, %b %result = select i1 %eq, i32 %a, i32 %b |
>
> You have to transform it like this:
>
> |%result = %b call void @llvm.expose.i32(i32 %a) |
>
> You should be able to remove these exposure events in a lot of
> situations, but conservatively they’ll have to be treated as
> escapes.
>
> Most optimizations never cut use-dependencies on opaque values
> like this and so won’t be affected.
>
> John.
>
--
Website: https://people.mpi-sws.org/~jung/
More information about the llvm-dev
mailing list