<div dir="ltr"><div dir="ltr">On Wed, Jun 23, 2021 at 5:52 AM John McCall <<a href="mailto:rjmccall@apple.com">rjmccall@apple.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>
<div>
<div style="font-family:sans-serif"><div style="white-space:normal">
<p dir="auto">On 22 Jun 2021, at 15:40, Ralf Jung wrote:</p>
</div>
<div style="white-space:normal"><blockquote style="border-left:2px solid rgb(57,131,196);color:rgb(57,131,196);margin:0px 0px 5px;padding-left:5px"><blockquote style="border-left:2px solid rgb(124,191,12);color:rgb(124,191,12);margin:0px 0px 5px;padding-left:5px"><p dir="auto"> Note that "provenance" as we use it in this discussion is an *explicit<br>
operational artifact* -- it exists as a concrete piece of state in the<br>
Abstract Machine. That is very different from something that might just be<br>
used internally in some kind of analysis.<br>
<br>
There is no problem with "resetting" that provenance on a "inttoptr", and<br>
basically forgetting about where the int comes from. Note that this is a<br>
statement about an operation in the Abstract Machine, not merely a statement<br>
about some analysis: this is not "forgetting" as in "safely<br>
overapproximating the real set of possible behaviors", it is "forgetting" by<br>
*forcing* the provenance to be some kind of wildcard/default provenance. All<br>
analyses then have to correctly account for that.<br>
<br>
But it’s not a truly wildcard provenance. At the very least, it’s<br>
restricted to the set of provenances that have been exposed, and<br>
my understanding is that Juneyoung’s non-determinism rule attempts<br>
to readmit a use-dependency analysis and turn |ptrtoint| back into<br>
a simple scalar operation.</p>
</blockquote><p dir="auto">So, at least in the paper Juneyoung and me and a few other people wrote (the one referenced a couple times in this thread already: <a href="https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf" target="_blank">https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf</a>), it is a wildcard provenance.<br>
"exposed" is not a thing in the operational semantics, it is a theorem we can prove.</p>
</blockquote></div>
<div style="white-space:normal">
<p dir="auto">Okay, fine, I won’t say “exposure”. :) You intend to still be able to<br>
prove that the reconstructed provenance cannot be that of certain<br>
memory locations based on a comprehensive use analysis of those<br>
locations.</p>
</div>
<div style="white-space:normal"><blockquote style="border-left:2px solid rgb(57,131,196);color:rgb(57,131,196);margin:0px 0px 5px;padding-left:5px"><blockquote style="border-left:2px solid rgb(124,191,12);color:rgb(124,191,12);margin:0px 0px 5px;padding-left:5px"><p dir="auto"> For example, you have compellingly argued that it’s problematic to<br>
do the reduction |a == b ? a : b| to |b| for pointer types. Suppose<br>
I instead do this optimization on integers where |a = ptrtoint A|.<br>
The result is now simply |b|. If I |inttoptr| that result and access<br>
the memory, there will be no record that that access may validly<br>
be to |A|. It does not help that the access may be represented<br>
as |inttoptr (ptrtoint B)| for some |B| rather than just directly<br>
to |B|, because there is no use-dependence on |A|. All there is<br>
is an apparently unrelated and unused |ptrtoint A|.<br>
<br>
So that would be "ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint B" being<br>
replaced by "ptrtoint B"? I don't see any problem with that. Do you have a<br>
concrete example?<br>
<br>
I think you can take any example where pointer-type restrictions<br>
would be necessary to protect against miscompilation and turn it<br>
into an example here by just inserting |inttoptr| and |ptrtoint|<br>
appropriately. Quick example:<br>
<br>
|int A = 0x1; int B = 0x2; long a = (long) (A+1); long b = (long) B; long result = (a == b ? a : b); if (a == b) *(((int*) result) - 1) = 0x4; else *((int*) result) = 0x8; printf(“%d %d\n”, A, B); |</p>
</blockquote><p dir="auto">(Sorry about the bad quote, not sure what Thunderbird does here.)<br>
I assume you mean "&A" and "&B" in lines 3 and 4?</p>
</blockquote></div>
<div style="white-space:normal">
<p dir="auto">Yes, sorry.</p>
</div>
<div style="white-space:normal"><blockquote style="border-left:2px solid rgb(57,131,196);color:rgb(57,131,196);margin:0px 0px 5px;padding-left:5px"><blockquote style="border-left:2px solid rgb(124,191,12);color:rgb(124,191,12);margin:0px 0px 5px;padding-left:5px"><p dir="auto">I submit that this program has unspecified but not undefined behavior,<br>
with printing “1 8” and “4 2” being the only two valid outcomes.</p>
</blockquote><p dir="auto">Okay, I can follow that.<br>
</p>
<blockquote style="border-left:2px solid rgb(124,191,12);color:rgb(124,191,12);margin:0px 0px 5px;padding-left:5px"><p dir="auto">But I think an optimizer which changes the fifth line to<br>
|long result = b;| without leaving any trace behind</p>
</blockquote><p dir="auto">so then we are at<br>
<br>
int A = 0x1;<br>
int B = 0x2;<br>
long a = (long) (&A+1);<br>
long b = (long) &B;<br>
long result = b;<br>
if (a == b)<br>
*(((int*) result) - 1) = 0x4;<br>
else<br>
*((int*) result) = 0x8;<br>
printf(“%d %d\n”, A, B);<br>
<br>
</p>
<blockquote style="border-left:2px solid rgb(124,191,12);color:rgb(124,191,12);margin:0px 0px 5px;padding-left:5px"><p dir="auto">could easily<br>
compile this to print “1 2” because there would be nothing to<br>
prevent the initialization of |A| from being forwarded to the<br>
final load.</p>
</blockquote><p dir="auto">I don't think that's right, since "a" is still used in the "if", so a bit of information about the address is leaked and you can't assume the address was not guessed.</p>
</blockquote></div>
<div style="white-space:normal">
<p dir="auto">Okay, so you have a baseline expectation that pointer comparisons will be treated as escapes. The reason I added a second comparison in my example is just because otherwise in the <code>a != b</code> case the access is out of bounds. So I guess you’re suggesting that it’s impossible to construct an example that avoids that problem that wouldn’t ultimately do something uneliminatable that would count as an escape?</p>
</div>
<div style="white-space:normal"><blockquote style="border-left:2px solid rgb(57,131,196);color:rgb(57,131,196);margin:0px 0px 5px;padding-left:5px"><p dir="auto">That said, things get more tricky once you want to account for C 'restrict' / LLVM 'noalias', and then it might be necessary to have explicit 'exposed'-like operations. I haven't seen a properly worked-out model for this, but the candidate models I saw would need something like this. So maybe it'd not be a bad idea to have such an operation anyway... I just don't think it should have any effect for the kind of programs we have been discussing here so far.</p>
</blockquote></div>
<div style="white-space:normal">
<p dir="auto">Honestly, part of why I like this approach is that I think the basic idea — recognizing when we’re doing a dependency-breaking value replacement and using a different replacement sequence — also moves us towards a solution for <code>consume</code> dependencies.</p></div></div></div></blockquote><div>(Slightly out of topic, but) I'm interested in knowing about what the consume dependency problem issue is; do you have any suggested thread for this? </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div style="font-family:sans-serif"><div style="white-space:normal">
<p dir="auto">John.</p>
</div>
</div>
</div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><br></div><font size="1">Juneyoung Lee</font><div><font size="1">Software Foundation Lab, Seoul National University</font></div></div></div></div>