[PATCH] D104013: [LangRef] State that the based-on relation is for pointer typed values only

Juneyoung Lee via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Jun 13 19:18:59 PDT 2021


aqjune added inline comments.


================
Comment at: llvm/docs/LangRef.rst:2703
+stored in the memory by loading it as a non-pointer type loses its associated
+address range.
+
----------------
nhaehnle wrote:
> hvdijk wrote:
> > nhaehnle wrote:
> > > hvdijk wrote:
> > > > This implies that storing a pointer value in memory and loading it as a pointer type does preserve its address range. Whether it does is unclear; this idea is assumed by some existing optimisations and incompatible with other existing optimisations. In my opinion, it is not a good idea to update the documentation to clarify this either way until it is decided how LLVM should behave.
> > > > 
> > > > If you have raised this not to update the documentation directly, but only as a starting point for discussion, could you also include what happens when an integer is stored to memory and is then loaded as a pointer value?
> > > 
> > > If store of pointer followed by load of pointer *doesn't* preserve the address range, then mem2reg is likely incorrect as-is, so... :)
> > > 
> > > Which existing optimizations are incompatible with this?
> > Consider
> > ```
> > define i8* @f(i8* %p) {
> >   %buf = alloca i8*
> >   %buf.i32 = bitcast i8** %buf to i32*
> >   store i8* %p, i8** %buf
> >   %i = load i32, i32* %buf.i32
> >   store i32 %i, i32* %buf.i32
> >   %q = load i8*, i8** %buf
> >   ret i8* %q
> > }
> > ```
> > I think it is clear that this needs to return `inttoptr(ptrtoint(%p))`, SROA optimizes it to exactly that, but Early CSE then changes it to return `%p` (tested in LLVM/clang 12.0.0). That looks wrong. Let's suppose it didn't do that, let's suppose we run existing optimisations in a different order. Given in the above
> > ```
> >   %i = load i32, i32* %buf.i32
> >   store i32 %i, i32* %buf.i32
> > ```
> > Early CSE is able to remove this store by looking at only these two instructions without any other context: the store is a store of exactly the value that was just loaded. If storing a pointer value in memory and loading it as a pointer type preserves the address range, though, this is invalid: the previous store to that address was a pointer with provenance, and this store is an integer without provenance, so the store still had an effect. This optimisation is only correct if pointers in memory lose their address ranges.
> Okay, thanks. To repeat the last part, an integer load followed by storing back of the same value may not actually be a no-op. That's pretty horrible :(
Hi @hvdijk ,

> This implies that storing a pointer value in memory and loading it as a pointer type does preserve its address range. Whether it does is unclear; this idea is assumed by some existing optimisations and incompatible with other existing optimisations. In my opinion, it is not a good idea to update the documentation to clarify this either way until it is decided how LLVM should behave.
>
> If you have raised this not to update the documentation directly, but only as a starting point for discussion, could you also include what happens when an integer is stored to memory and is then loaded as a pointer value?

A straightforward solution would be to define the result as `poison`. Since `poison` is more undefined than any other value (any concrete value is a valid refinement of it), the EarlyCSE example can be explained.
Similarly, defining the reverse case (a pointer stored to memory and then loaded as an integer value) as `poison` explains the analogous transformation too.
With these poison semantics, the optimizations in question can be supported.
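
As a minimal sketch of the case asked about (an integer stored to memory, then loaded as a pointer), assuming 32-bit pointers as the earlier example implicitly does. The function name `@g` is made up for illustration and is not part of the patch:

```
define i8* @g(i32 %i) {
  %buf = alloca i8*
  %buf.i32 = bitcast i8** %buf to i32*
  store i32 %i, i32* %buf.i32   ; memory now holds integer bytes only
  %q = load i8*, i8** %buf      ; type-punned load: `poison` under the proposed definition
  ret i8* %q
}
```

Under that definition `@g` would return `poison`, so rewriting the returned value to any concrete pointer would be a refinement. This only illustrates what the proposal above would imply; it is not what LangRef currently states.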

However, I think the validity of the statement is orthogonal to the concrete semantics chosen.
This is because based-on is a pointer-to-pointer relation only, so it is natural to conclude that the loaded integer loses its associated address range.
The sentence is there for clarification: readers may naturally ask what happens when a pointer is read as an integer via type punning through memory.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104013/new/

https://reviews.llvm.org/D104013


