[PATCH] D99135: [deref] Implement initial set of inference rules for deref-at-point

Tue Mar 23 12:40:22 PDT 2021

nlopes added inline comments.

================
Comment at: llvm/lib/IR/Value.cpp:741
+  // *) An alloca is deallocated when the function returns
+  if (isa<GlobalValue>(V) || isa<Constant>(V) || isa<AllocaInst>(V))
+    return false;
----------------
reames wrote:
> nlopes wrote:
> > reames wrote:
> > > reames wrote:
> > > > nlopes wrote:
> > > > > an Alloca can be killed with the lifetime intrinsics. Storing to an alloca after lifetime_end is UB.
> > > > Right, but how does that affect dereferenceability?  The memory is still dereferenceable.  The contents may be undefined, but the access won't fault.
> > > > 
> > > > p.s. Lifetime intrinsics are very badly specified, and are inconsistent across the optimizer - including in the existing deref code.  I really don't want them to be a blocker here.  
> > > JFYI, the wording in the LangRef on semantics of a dead object (after lifetime end) was recently changed in c821ef4.  I objected to the wording at the time, though the patch appears not to have been adjusted.  This is why.  
> > > 
> > > lifetime intrinsics do not change dereferenceability.  The relevant wording from c821ef4 needs removed from the LangRef.  It is wrong.  Both with current and proposed semantics.
> > I agree that doing a store after lifetime_end intrinsic doesn't cause a crash (ATM), but it may write to another object.
> > For example:
> >   %p = alloca i8
> >   %q = alloca i8
> >   
> >   lifetime_start %p
> >   ...
> >   lifetime_end %p
> >   lifetime_start %q
> >   store i8 1, %q
> >   store i8 0, %p ; UB: %p is dead
> >   
> >   %v = load i8, %q
> > 
> > What's the value of %v? In practice, it's going to be 0 because %p & %q will end up allocated at the same address. And this is why storing after lifetime_end must be UB.
> > 
> > So now it depends on how this analysis is used. My concern is that it may be used to move a store past lifetime_end and then we have a miscompilation. Because if a pointer is dereferenceable for the size of the store and it's guaranteed to be alive throughout the function, then  we can conclude that loads & stores to that pointer can be moved freely.
> > 
> > A simple solution is to check if the particular alloca is an argument to a lifetime_end intrinsic. More complex ones require an additional argument like a BB or specific instruction, such that the query becomes: is the object still alive at this BB/instruction?
> This has turned into a discussion of the lifetime intrinsics.  I assert that is unrelated to this change (as the implementation here exactly matches the existing deref implementation for this case).  I ask that this discussion be removed from this review thread and taken elsewhere.
> 
> I'll also note that you are commingling concerns in your discussion.  Dereferenceability does NOT imply it's safe to insert a store.  That's a higher level property with additional concerns (e.g. concurrency) which need to be satisfied.
Unfortunately we need to settle on the lifetime semantics if we want to have code inferring properties about allocas here.
BTW, my concern is about movement of load/stores, not of introduction.

This `canFree` is called from `getPointerDereferenceableBytes()`, which will infer for how many bytes a pointer is dereferenceable in the whole function. Not just in a particular BB. Hence once the dereferenceable attribute is inferred, "free" movement of memory accesses becomes legal. But it's not, as we have to respect the boundaries of lifetime_start/end.
(I'm even ignoring segmented stacks to avoid extra complication).

Fundamentally, I think that if we go with a definition of dereferenceability per function, we are always going to have our hands tied. We may need to refine the abstraction and do it per BB, for example. Without knowing what every single user of `getPointerDereferenceableBytes()` does with the result of the analysis, I'm not comfortable declaring an alloca'ed object as always dereferenceable.

================
Comment at: llvm/lib/IR/Value.cpp:756
+    // on F's behalf either.
+    if (A->hasNoFreeAttr() && A->hasNoAliasAttr())
+      return false;
----------------
reames wrote:
> nlopes wrote:
> > reames wrote:
> > > nlopes wrote:
> > > > I don't understand this argument of why hasNoSync isn't needed.
> > > > All argument attributes are with respect to what the function does, not the environment. It would be impossible to infer attributes if the environment was considered.
> > > > 
> > > > My understanding is that readonly implies nofree. Can't see why nofree would have those special semantics
> > > The focus here isn't on nofree implying nosync.  It's noalias which implies nosync.  In order to for there to be a written (freed) copy of the pointer accessible in the other thread, the noalias fact must be false.  If so, that's already UB.
> > Ok, I confess I'm less familiar with the noalias attribute. Though memcpy's signature is something like:
> >   declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #1
> > 
> > The caller of memcpy may have had the src/dst pointers escaped to other threads. Even then, both arguments are marked as noalias because the C spec says the arguments cannot overlap.
> > This contradicts the global definition of noalias you are suggesting.
> I'm also fairly unfamiliar with noalias.  I'm going off existing wording, and what the optimizer appears to actually do.
> 
> I don't think it actually is contradictory.  Your concern involves a *writeable* copy of the pointer.  The wording of noalias talks about *access*.  It doesn't appear to be UB to have a writeable copy of the pointer on another thread (or even within the callee).  It does appear to be UB to *actually write to it* during the execution of the callee.  
> 
> Note that this definition does make noalias argument attributes very hard to infer.  I'll note that we only have noalias return inference which might be exactly because of that.  :)  I'm not sure.  
I see; thanks for the explanation. LLVM doesn't seem to assume that noalias pointers are distinct indeed: https://gcc.godbolt.org/z/KTfnzMMj1
So I'm good on this now :)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99135/new/

https://reviews.llvm.org/D99135