[llvm-dev] [RFC] Introducing a byte type to LLVM
John McCall via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 22 13:52:33 PDT 2021
On 22 Jun 2021, at 15:40, Ralf Jung wrote:
>> Note that "provenance" as we use it in this discussion is an
>> *explicit
>> operational artifact* -- it exists as a concrete piece of state
>> in the
>> Abstract Machine. That is very different from something that
>> might just be
>> used internally in some kind of analysis.
>>
>> There is no problem with "resetting" that provenance on a
>> "inttoptr", and
>> basically forgetting about where the int comes from. Note that
>> this is a
>> statement about an operation in the Abstract Machine, not merely
>> a statement
>> about some analysis: this is not "forgetting" as in "safely
>> overapproximating the real set of possible behaviors", it is
>> "forgetting" by
>> *forcing* the provenance to be some kind of wildcard/default
>> provenance. All
>> analyses then have to correctly account for that.
>>
>> But it’s not a truly wildcard provenance. At the very least, it’s
>> restricted to the set of provenances that have been exposed, and
>> my understanding is that Juneyoung’s non-determinism rule attempts
>> to readmit a use-dependency analysis and turn |ptrtoint| back into
>> a simple scalar operation.
>
> So, at least in the paper Juneyoung and me and a few other people
> wrote (the one referenced a couple times in this thread already:
> https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf), it is a
> wildcard provenance.
> "exposed" is not a thing in the operational semantics, it is a theorem
> we can prove.
Okay, fine, I won’t say “exposure”. :) You intend to still be
able to
prove that the reconstructed provenance cannot be that of certain
memory locations based on a comprehensive use analysis of those
locations.
>> For example, you have compellingly argued that it’s
>> problematic to
>> do the reduction |a == b ? a : b| to |b| for pointer types.
>> Suppose
>> I instead do this optimization on integers where |a =
>> ptrtoint A|.
>> The result is now simply |b|. If I |inttoptr| that result and
>> access
>> the memory, there will be no record that that access may
>> validly
>> be to |A|. It does not help that the access may be
>> represented
>> as |inttoptr (ptrtoint B)| for some |B| rather than just
>> directly
>> to |B|, because there is no use-dependence on |A|. All there
>> is
>> is an apparently unrelated and unused |ptrtoint A|.
>>
>> So that would be "ptrtoint A == ptrtoint B ? ptrtoint A :
>> ptrtoint B" being
>> replaced by "ptrtoint B"? I don't see any problem with that. Do
>> you have a
>> concrete example?
>>
>> I think you can take any example where pointer-type restrictions
>> would be necessary to protect against miscompilation and turn it
>> into an example here by just inserting |inttoptr| and |ptrtoint|
>> appropriately. Quick example:
>>
>> |int A = 0x1; int B = 0x2; long a = (long) (A+1); long b = (long) B;
>> long result = (a == b ? a : b); if (a == b) *(((int*) result) - 1) =
>> 0x4; else *((int*) result) = 0x8; printf(“%d %d\n”, A, B); |
>
> (Sorry about the bad quote, not sure what Thunderbird does here.)
> I assume you mean "&A" and "&B" in lines 3 and 4?
Yes, sorry.
>> I submit that this program has unspecified but not undefined
>> behavior,
>> with printing “1 8” and “4 2” being the only two valid
>> outcomes.
>
> Okay, I can follow that.
>
>> But I think an optimizer which changes the fifth line to
>> |long result = b;| without leaving any trace behind
>
> so then we are at
>
> int A = 0x1;
> int B = 0x2;
> long a = (long) (&A+1);
> long b = (long) &B;
> long result = b;
> if (a == b)
> *(((int*) result) - 1) = 0x4;
> else
> *((int*) result) = 0x8;
> printf(“%d %d\n”, A, B);
>
>
>> could easily
>> compile this to print “1 2” because there would be nothing to
>> prevent the initialization of |A| from being forwarded to the
>> final load.
>
> I don't think that's right, since "a" is still used in the "if", so a
> bit of information about the address is leaked and you can't assume
> the address was not guessed.
Okay, so you have a baseline expectation that pointer comparisons will
be treated as escapes. The reason I added a second comparison in my
example is just because otherwise in the `a != b` case the access is out
of bounds. So I guess you’re suggesting that it’s impossible to
construct an example that avoids that problem that wouldn’t ultimately
do something uneliminatable that would count as an escape?
> That said, things get more tricky once you want to account for C
> 'restrict' / LLVM 'noalias', and then it might be necessary to have
> explicit 'exposed'-like operations. I haven't seen a properly
> worked-out model for this, but the candidate models I saw would need
> something like this. So maybe it'd not be a bad idea to have such an
> operation anyway... I just don't think it should have any effect for
> the kind of programs we have been discussing here so far.
Honestly, part of why I like this approach is that I think the basic
idea — recognizing when we’re doing a dependency-breaking value
replacement and using a different replacement sequence — also moves us
towards a solution for `consume` dependencies.
John.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210622/e057e169/attachment.html>
More information about the llvm-dev
mailing list