[llvm-dev] [RFC] Introducing a byte type to LLVM

Hal Finkel via llvm-dev llvm-dev at lists.llvm.org
Tue Jun 22 09:07:23 PDT 2021


On 6/22/21 05:58, Ralf Jung via llvm-dev wrote:
> Hi John,
>
>> Unfortunately, though, this non-determinism still doesn’t allow LLVM
>> to be anywhere near as naive about pointer-to-int casts as it is today.
>
> Definitely. There are limits to how naive one can be; beyond those 
> limits, miscompilations lurk. 
> <https://www.ralfj.de/blog/2020/12/14/provenance.html> explains this 
> by showing such a miscompilation arising from three naive 
> optimizations being chained together.
>
>> The rule is intended to allow the compiler to start doing use-analysis
>> of exposures; let’s assume that this analysis doesn’t see any
>> un-analyzable uses, since of course it would need to conservatively
>> treat them as escapes. But if we can optimize uses of integers as if
>> they didn’t carry pointer data — say, in a function that takes integer
>> parameters — and then we can apply those optimized uses to integers
>> that concretely result from pointer-to-int casts — say, by inlining
>> that function into one of its callers — can’t we end up with a use
>> pattern for one or more of those pointer-to-int casts that no longer
>> reflects the fact that it’s been exposed? It seems to me that either
>> (1) we cannot do those optimizations on opaque integers or (2) we
>> need to record that we did them in a way that, if it turns out that
>> they were created by a pointer-to-int cast, forces other code to
>> treat that pointer as opaquely exposed.
>
> There is a third option: don't optimize away ptr-int-ptr roundtrips. 
> Then you can still do all the same optimizations on integers that LLVM 
> does today, completely naively -- the integer world remains "sane". 
> Only the pointer world has to be "strange".
> (You can also not do things like GVN replacement of *pointer-typed* 
> values, but for values of integer types this remains unproblematic.)
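A toy model may help make the integer/pointer distinction above concrete. It is only a sketch under stated assumptions: a pointer is modeled as an (address, provenance) pair, an integer as a bare number, and the allocation names and addresses are hypothetical. Two equal integers are the same value, so substituting one for the other is harmless; two pointers that compare equal under icmp only share an address.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ptr:
    addr: int   # the part that ptrtoint and icmp can observe
    prov: str   # hypothetical provenance tag: which allocation this may access

def icmp_eq(p: Ptr, q: Ptr) -> bool:
    # Pointer comparison observes addresses only, never provenance.
    return p.addr == q.addr

def gvn_replacement_is_sound(x, y) -> bool:
    # Replacing y by x is a no-op only if the two values are identical,
    # including any invisible components (dataclass equality compares all fields).
    return x == y

# Hypothetical layout: allocation "a" is [16, 20), allocation "b" is [20, 24).
p = Ptr(addr=20, prov="a")  # one past the end of "a"
q = Ptr(addr=20, prov="b")  # first byte of "b"

# Integer world: equal integers are interchangeable.
ip, iq = p.addr, q.addr
print(ip == iq, gvn_replacement_is_sound(ip, iq))     # True True

# Pointer world: the comparison succeeds, yet the values differ.
print(icmp_eq(p, q), gvn_replacement_is_sound(p, q))  # True False
```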


Do we have any idea how large an effect this might have if we disable 
GVN for all pointer-typed values? And is it really all GVN, or just
cases where you unify the equivalence classes based on some dominating 
comparison operation? We should be careful here, perhaps, because LLVM's 
GVN does a lot of plain-old CSE, store-to-load forwarding, etc. and we 
should say specifically what would need to be disabled and in what contexts.
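One way to frame the distinction between plain CSE and unifying equivalence classes from a dominating comparison is the toy model below, where a pointer is an (address, provenance) pair. The layout and tag names are hypothetical, and the sketch does not settle which parts of LLVM's GVN would actually be affected; it only shows why the two cases differ in kind.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ptr:
    addr: int   # observable by icmp and ptrtoint
    prov: str   # hypothetical provenance tag (invisible to icmp)

def gep(p: Ptr, offset: int) -> Ptr:
    # Pointer arithmetic moves the address and keeps the provenance.
    return Ptr(p.addr + offset, p.prov)

def icmp_eq(p: Ptr, q: Ptr) -> bool:
    return p.addr == q.addr

# Hypothetical layout: "a" is [16, 20), "b" is [20, 24).
a = Ptr(16, "a")
b = Ptr(20, "b")

# (1) Plain CSE: two evaluations of the same pure expression on the same
#     operands yield identical values, provenance included.
x1 = gep(a, 4)
x2 = gep(a, 4)
cse_preserves_value = (x1 == x2)

# (2) Unifying values because a dominating `icmp eq x1, b` was true:
#     the comparison succeeds, but the two values are not identical.
cond_holds = icmp_eq(x1, b)
unification_preserves_value = (x1 == b)

print(cse_preserves_value, cond_holds, unification_preserves_value)
# True True False
```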

  -Hal


>
> I don't think it makes sense for LLVM to adopt an explicit "exposed" 
> flag in its semantics. Reasoning based on non-determinism works fine, 
> and has the advantage of keeping ptr-to-int casts a pure, 
> side-effect-free operation. This is the model we explored in 
> <https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf>, and we were 
> able to show quite a few of LLVM's standard optimizations correct 
> formally. Some changes are still needed as you noted, but those 
> changes will be required anyway even if LLVM were to adopt PNVI-ae:
> - No removal of ptr-int-ptr roundtrips. 
> (https://bugs.llvm.org/show_bug.cgi?id=34548)
> - No GVN replacement of pointer-typed values. 
> (https://bugs.llvm.org/show_bug.cgi?id=35229)
>
>>     (I'm not sure whether this is a good place to introduce this, but)
>>     we actually have semantics for pointer casts tailored to LLVM
>>     (link <https://sf.snu.ac.kr/publications/llvmtwin.pdf>).
>>     In this proposal, ptrtoint does not have an escaping side effect;
>>     ptrtoint and inttoptr are scalar operations.
>>     inttoptr simply returns a pointer which can access any object.
>>
>> Skimming your paper, I can see how this works /except/ that I don’t
>> see any way not to treat |ptrtoint| as an escape. And really I think
>> you’re already partially acknowledging that, because that’s the only
>> real sense of saying that |inttoptr(ptrtoint p)| can’t be reduced to
>> |p|. If those are really just scalar operations that don’t expose
>> |p| in ways that might be disconnected from the uses of the |inttoptr|
>> then that reduction ought to be safe.
>
> They are indeed just scalar operations, but the reduction is not safe.
> The reason is that pointer-typed variables have values of the form 
> "(addr, provenance)". There is essentially an 'invisible' component in 
> each pointer value that tracks some additional information -- the 
> "provenance" of the pointer. Casting a ptr to an int removes that 
> provenance. Casting an int to a ptr picks a "default" provenance. So 
> the overall effect of inttoptr(ptrtoint p) is to turn "(addr, 
> provenance)" into "(addr, DEFAULT_PROVENANCE)".
> Clearly that is *not* a NOP, and hence performing the reduction 
> actually changes the result of this operation. Before the reduction, 
> the resulting pointer had DEFAULT_PROVENANCE; after the reduction, it 
> maintains the original provenance of "p". This can introduce UB into 
> previously UB-free programs.
>
> Kind regards,
> Ralf
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
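Ralf's argument that inttoptr(ptrtoint p) is not a no-op can be sketched in a few lines. This is a toy model, not the semantics of either paper: the allocation layout is hypothetical, and the "default" provenance is assumed to be a wildcard that may access any allocation, following the description of the llvmtwin proposal quoted above ("inttoptr simply returns a pointer which can access any object"). Liveness, alignment, and integer widths are ignored.

```python
from dataclasses import dataclass
from typing import Optional

WILDCARD = None  # assumed "default" provenance: may access any allocation

@dataclass(frozen=True)
class Ptr:
    addr: int
    prov: Optional[str]  # allocation name, or WILDCARD

# Hypothetical memory layout: name -> (base, size).
ALLOCS = {"a": (16, 4), "b": (20, 4)}

def ptrtoint(p: Ptr) -> int:
    return p.addr              # provenance is dropped

def inttoptr(i: int) -> Ptr:
    return Ptr(i, WILDCARD)    # provenance is replaced by the default

def in_bounds(addr: int, name: str) -> bool:
    base, size = ALLOCS[name]
    return base <= addr < base + size

def load_is_defined(p: Ptr) -> bool:
    # A wildcard pointer may access whichever allocation its address hits;
    # otherwise the address must lie inside the pointer's own allocation.
    if p.prov is WILDCARD:
        return any(in_bounds(p.addr, n) for n in ALLOCS)
    return in_bounds(p.addr, p.prov)

p = Ptr(addr=20, prov="a")      # one past the end of "a", same address as "b"
before = inttoptr(ptrtoint(p))  # what the program computes
after = p                       # what the "reduction" would substitute

print(before)                   # Ptr(addr=20, prov=None)
print(load_is_defined(before))  # True: the access lands in "b"
print(load_is_defined(after))   # False: "a"'s provenance does not cover address 20
```

In this model the reduction turns a defined load into UB, which is the direction of change Ralf describes: the original program is UB-free, the "optimized" one is not.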

