[llvm-dev] [RFC] Introducing a byte type to LLVM

Wed Jun 23 01:37:54 PDT 2021

On Wed, Jun 23, 2021 at 5:52 AM John McCall <rjmccall at apple.com> wrote:

> On 22 Jun 2021, at 15:40, Ralf Jung wrote:
>
> Note that "provenance" as we use it in this discussion is an *explicit
> operational artifact* -- it exists as a concrete piece of state in the
> Abstract Machine. That is very different from something that might just be
> used internally in some kind of analysis.
>
> There is no problem with "resetting" that provenance on a "inttoptr", and
> basically forgetting about where the int comes from. Note that this is a
> statement about an operation in the Abstract Machine, not merely a
> statement
> about some analysis: this is not "forgetting" as in "safely
> overapproximating the real set of possible behaviors", it is "forgetting"
> by
> *forcing* the provenance to be some kind of wildcard/default provenance.
> All
> analyses then have to correctly account for that.
>
> But it’s not a truly wildcard provenance. At the very least, it’s
> restricted to the set of provenances that have been exposed, and
> my understanding is that Juneyoung’s non-determinism rule attempts
> to readmit a use-dependency analysis and turn |ptrtoint| back into
> a simple scalar operation.
>
> So, at least in the paper Juneyoung and me and a few other people wrote
> (the one referenced a couple times in this thread already:
> https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf), it is a wildcard
> provenance.
> "exposed" is not a thing in the operational semantics, it is a theorem we
> can prove.
>
> Okay, fine, I won’t say “exposure”. :) You intend to still be able to
> prove that the reconstructed provenance cannot be that of certain
> memory locations based on a comprehensive use analysis of those
> locations.
>
> For example, you have compellingly argued that it’s problematic to
> do the reduction |a == b ? a : b| to |b| for pointer types. Suppose
> I instead do this optimization on integers where |a = ptrtoint A|.
> The result is now simply |b|. If I |inttoptr| that result and access
> the memory, there will be no record that that access may validly
> be to |A|. It does not help that the access may be represented
> as |inttoptr (ptrtoint B)| for some |B| rather than just directly
> to |B|, because there is no use-dependence on |A|. All there is
> is an apparently unrelated and unused |ptrtoint A|.
>
> So that would be "ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint B" being
> replaced by "ptrtoint B"? I don't see any problem with that. Do you have a
> concrete example?
>
> I think you can take any example where pointer-type restrictions
> would be necessary to protect against miscompilation and turn it
> into an example here by just inserting |inttoptr| and |ptrtoint|
> appropriately. Quick example:
>
> |int A = 0x1; int B = 0x2; long a = (long) (A+1); long b = (long) B; long
> result = (a == b ? a : b); if (a == b) *(((int*) result) - 1) = 0x4; else
> *((int*) result) = 0x8; printf(“%d %d\n”, A, B); |
>
> (Sorry about the bad quote, not sure what Thunderbird does here.)
> I assume you mean "&A" and "&B" in lines 3 and 4?
>
> Yes, sorry.
>
> I submit that this program has unspecified but not undefined behavior,
> with printing “1 8” and “4 2” being the only two valid outcomes.
>
> Okay, I can follow that.
>
> But I think an optimizer which changes the fifth line to
> |long result = b;| without leaving any trace behind
>
> so then we are at
>
> int A = 0x1;
> int B = 0x2;
> long a = (long) (&A+1);
> long b = (long) &B;
> long result = b;
> if (a == b)
> *(((int*) result) - 1) = 0x4;
> else
> *((int*) result) = 0x8;
> printf(“%d %d\n”, A, B);
>
> could easily
> compile this to print “1 2” because there would be nothing to
> prevent the initialization of |A| from being forwarded to the
> final load.
>
> I don't think that's right, since "a" is still used in the "if", so a bit
> of information about the address is leaked and you can't assume the address
> was not guessed.
>
> Okay, so you have a baseline expectation that pointer comparisons will be
> treated as escapes. The reason I added a second comparison in my example is
> just because otherwise in the a != b case the access is out of bounds. So
> I guess you’re suggesting that it’s impossible to construct an example that
> avoids that problem that wouldn’t ultimately do something uneliminatable
> that would count as an escape?
>
> That said, things get more tricky once you want to account for C
> 'restrict' / LLVM 'noalias', and then it might be necessary to have
> explicit 'exposed'-like operations. I haven't seen a properly worked-out
> model for this, but the candidate models I saw would need something like
> this. So maybe it'd not be a bad idea to have such an operation anyway... I
> just don't think it should have any effect for the kind of programs we have
> been discussing here so far.
>
> Honestly, part of why I like this approach is that I think the basic idea
> — recognizing when we’re doing a dependency-breaking value replacement and
> using a different replacement sequence — also moves us towards a solution
> for consume dependencies.
>
(Slightly out of topic, but) I'm interested in knowing about what the
consume dependency problem issue is; do you have any suggested thread for
this?

> John.
>

-- 

Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210623/356f379f/attachment.html>