[llvm-dev] [RFC] Introducing a byte type to LLVM
Ralf Jung via llvm-dev
llvm-dev at lists.llvm.org
Tue Jun 22 09:25:58 PDT 2021
Hi Hal,
>>> The rule is intended to allow the compiler to start doing use-analysis
>>> of exposures; let’s assume that this analysis doesn’t see any
>>> un-analyzable uses, since of course it would need to conservatively
>>> treat them as escapes. But if we can optimize uses of integers as if
>>> they didn’t carry pointer data — say, in a function that takes integer
>>> parameters — and then we can apply those optimized uses to integers
>>> that concretely result from pointer-to-int casts — say, by inlining
>>> that function into one of its callers — can’t we end up with a use
>>> pattern for one or more of those pointer-to-int casts that no longer
>>> reflects the fact that it’s been exposed? It seems to me that either
>>> (1) we cannot do those optimizations on opaque integers or (2) we
>>> need to record that we did them in a way that, if it turns out that
>>> they were created by pointer-to-int casts, forces other code to
>>> treat that pointer as opaquely exposed.
>>
>> There is a third option: don't optimize away ptr-int-ptr roundtrips. Then you
>> can still do all the same optimizations on integers that LLVM does today,
>> completely naively -- the integer world remains "sane". Only the pointer world
>> has to be "strange".
>> (You can also not do things like GVN replacement of *pointer-typed* values,
>> but for values of integer types this remains unproblematic.)
>
>
> Do we have any idea how large of an effect this might be? If we disable GVN for
> all pointer-typed values? And is it really all GVN, or just cases where you
> unify the equivalence classes based on some dominating comparison operation? We
> should be careful here, perhaps, because LLVM's GVN does a lot of plain-old CSE,
> store-to-load forwarding, etc. and we should say specifically what would need to
> be disabled and in what contexts.
What I mean is specifically GVN "exploiting" equality tests (icmp) of pointer
type. That doesn't work (https://bugs.llvm.org/show_bug.cgi?id=35229
"weaponizes" it to create a miscompilation, albeit in artificial code). I think
this optimization is pretty much unsalvageable, except if you somehow know
that the two pointers have *identical* provenance -- the moment pointers have
any kind of provenance, just because icmp says they are equal does not mean we
can treat them as equivalent for GVN, since icmp cannot know if the provenance
of the two pointers is the same or not.
I assume that is what you mean by "unify the equivalence classes based on some
dominating comparison operation" -- at least that sounds about right. :)
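
A minimal sketch of why this goes wrong, as a toy Python model. It is not LLVM's actual semantics and not the reproducer from the bug report. The names Ptr, ALLOCS, icmp_eq, store, original and after_gvn are hypothetical, and the sketch assumes that icmp compares addresses only and that a store is defined only inside the allocation named by the pointer's provenance.

```python
from dataclasses import dataclass


class UndefinedBehavior(Exception):
    """Raised where this toy model treats an access as UB."""


@dataclass(frozen=True)
class Ptr:
    addr: int  # the visible integer address
    prov: str  # the "invisible" provenance: the allocation this pointer belongs to


# Assumption: two adjacent 4-byte allocations, name -> (base address, size).
ALLOCS = {"a": (100, 4), "b": (104, 4)}


def icmp_eq(p: Ptr, q: Ptr) -> bool:
    # The comparison sees only the addresses, never the provenance.
    return p.addr == q.addr


def store(memory: dict, p: Ptr, value: int) -> None:
    # A store is defined only inside the allocation named by the provenance.
    base, size = ALLOCS[p.prov]
    if not base <= p.addr < base + size:
        raise UndefinedBehavior(f"store to {p.addr} via provenance {p.prov!r}")
    memory[p.addr] = value


def original(p: Ptr, q: Ptr, memory: dict) -> None:
    if icmp_eq(p, q):
        store(memory, q, 1)


def after_gvn(p: Ptr, q: Ptr, memory: dict) -> None:
    # Same code after GVN used the dominating comparison to replace q with p.
    if icmp_eq(p, q):
        store(memory, p, 1)


p = Ptr(104, "a")  # one past the end of "a"
q = Ptr(104, "b")  # first byte of "b"
```

With these inputs, original(p, q, {}) performs a well-defined store through q, while after_gvn(p, q, {}) raises UndefinedBehavior in this model: the comparison succeeded, but p and q were never interchangeable.
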
(I am probably using some incorrect/sloppy terminology here, and I apologize for
that. My expertise is more in the area of formal language semantics than
compiler construction.)
GVN can still treat e.g. "GEPi p 4" as equivalent to another "GEPi p 4" --
pure operations with identical inputs produce identical outputs, even at pointer
type, and GEP/GEPi remain pure under this proposal. The issue is specifically
the interaction of GVN with icmp, not GVN alone.
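
By contrast, plain CSE on pure pointer operations stays sound. The following is a toy Python sketch under stated assumptions, not LLVM's GVN: Ptr, gep and cse are hypothetical names, and gep is modeled as adding an offset while inheriting the provenance of its base. Two instructions are merged only when opcode and operands are syntactically identical, with no reasoning from comparisons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    addr: int  # visible address
    prov: str  # provenance (hypothetical allocation name)


def gep(p: Ptr, offset: int) -> Ptr:
    # Pure: the result depends only on the inputs; provenance is inherited.
    return Ptr(p.addr + offset, p.prov)


def cse(instructions):
    """Map each redundant destination to the earlier identical instruction.

    instructions is a list of (dest, opcode, operands) tuples.
    """
    seen = {}     # (opcode, operands) -> first destination computing it
    replace = {}  # redundant destination -> representative destination
    for dest, opcode, operands in instructions:
        key = (opcode, tuple(replace.get(o, o) for o in operands))
        if key in seen:
            replace[dest] = seen[key]
        else:
            seen[key] = dest
    return replace


program = [
    ("x", "gep", ("p", 4)),
    ("y", "gep", ("p", 4)),  # identical inputs: safe to reuse x
    ("z", "gep", ("q", 4)),  # different base: kept, even if q happens to equal p by address
]
replacements = cse(program)
```

Here replacements maps only y to x. The result for z is left alone because nothing but an icmp could relate q to p, and that is exactly the inference that has to go.
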
Kind regards,
Ralf
>
> -Hal
>
>
>>
>> I don't think it makes sense for LLVM to adopt an explicit "exposed" flag in
>> its semantics. Reasoning based on non-determinism works fine, and has the
>> advantage of keeping ptr-to-int casts a pure, side-effect-free operation. This
>> is the model we explored in
>> <https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf>, and we were able to
>> show quite a few of LLVM's standard optimizations correct formally. Some
>> changes are still needed as you noted, but those changes will be required
>> anyway even if LLVM were to adopt PNVI-ae:
>> - No removal of ptr-int-ptr roundtrips.
>> (https://bugs.llvm.org/show_bug.cgi?id=34548)
>> - No GVN replacement of pointer-typed values.
>> (https://bugs.llvm.org/show_bug.cgi?id=35229)
>>
>>> (I'm not sure whether this is a good place to introduce this, but) we
>>> actually have semantics for pointer castings tailored to LLVM (link
>>> <https://sf.snu.ac.kr/publications/llvmtwin.pdf>).
>>> In this proposal, ptrtoint does not have an escaping side effect; ptrtoint
>>> and inttoptr are scalar operations.
>>> inttoptr simply returns a pointer which can access any object.
>>>
>>> Skimming your paper, I can see how this works /except/ that I don’t
>>> see any way not to treat |ptrtoint| as an escape. And really I think
>>> you’re already partially acknowledging that, because that’s the only
>>> real sense of saying that |inttoptr(ptrtoint p)| can’t be reduced to
>>> |p|. If those are really just scalar operations that don’t expose
>>> |p| in ways that might be disconnected from the uses of the |inttoptr|
>>> then that reduction ought to be safe.
>>
>> They are indeed just scalar operations, but the reduction is not safe.
>> The reason is that pointer-typed variables have values of the form "(addr,
>> provenance)". There is essentially an 'invisible' component in each pointer
>> value that tracks some additional information -- the "provenance" of the
>> pointer. Casting a ptr to an int removes that provenance. Casting an int to a
>> ptr picks a "default" provenance. So the overall effect of inttoptr(ptrtoint
>> p) is to turn "(addr, provenance)" into "(addr, DEFAULT_PROVENANCE)".
>> Clearly that is *not* a NOP, and hence performing the reduction actually
>> changes the result of this operation. Before the reduction, the resulting
>> pointer had DEFAULT_PROVENANCE; after the reduction, it maintains the original
>> provenance of "p". This can introduce UB into previously UB-free programs.
>>
>> Kind regards,
>> Ralf
--
Website: https://people.mpi-sws.org/~jung/