[PATCH] D27034: [AliasAnalysis] Teach BasicAA about memcpy.

Hal Finkel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Nov 26 10:11:39 PST 2016


hfinkel added inline comments.


================
Comment at: lib/Analysis/BasicAliasAnalysis.cpp:773
   }
 
+  if (auto *Inst = dyn_cast<MemCpyInst>(CS.getInstruction())) {
----------------
bryant wrote:
> hfinkel wrote:
> > bryant wrote:
> > > hfinkel wrote:
> > > > bryant wrote:
> > > > > hfinkel wrote:
> > > > > > bryant wrote:
> > > > > > > reames wrote:
> > > > > > > > This entire change should be unnecessary.  We have readonly and writeonly attributes with these semantics and the intrinsic should already be declared with them.  
> > > > > > > > 
> > > > > > > > Reading further, it looks like you're change is mainly aimed at the no alias property right?  If so, take a look at the noalias parameter attribute.  It might be we've forgotten to tag the intrinsic correctly.
> > > > > > > > This entire change should be unnecessary. We have readonly and writeonly attributes with these semantics and the intrinsic should already be declared with them.
> > > > > > > 
> > > > > > > Including write-/readonly inside memcpy's declaration with my initial example doesn't work, at least not with `-basicaa`:
> > > > > > > 
> > > > > > >     declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i32, i1) nounwind
> > > > > > > 
> > > > > > >     define void @source_clobber(i8* %a, i8* %b) {
> > > > > > >     ; CHECK-LABEL: @source_clobber(
> > > > > > >     ; CHECK-NEXT:  ; 1 = MemoryDef(liveOnEntry)
> > > > > > >     ; CHECK-NEXT:    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 128, i32 1, i1 false)
> > > > > > >     ; CHECK-NEXT:  ; MemoryUse(liveOnEntry)
> > > > > > >     ; CHECK-NEXT:    [[X:%.*]] = load i8, i8* %b
> > > > > > >     ; CHECK-NEXT:    ret void
> > > > > > >     ;
> > > > > > >       call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 128, i32 1, i1 false)
> > > > > > >       %x = load i8, i8* %b
> > > > > > >       ret void
> > > > > > >     }
> > > > > > > 
> > > > > > >     09:57:21 ~> opt -disable-output -basicaa -aa-eval -print-all-alias-modref-info memcpy-test.ll
> > > > > > >     Function: source_clobber: 2 pointers, 1 call sites
> > > > > > >       MayAlias:     i8* %a, i8* %b
> > > > > > >       Both ModRef:  Ptr: i8* %a     <->  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 128, i32 1, i1 false)
> > > > > > >       Both ModRef:  Ptr: i8* %b     <->  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 128, i32 1, i1 false)
> > > > > > > 
> > > > > > > > Reading further, it looks like you're change is mainly aimed at the no alias property right? If so, take a look at the noalias parameter attribute. It might be we've forgotten to tag the intrinsic correctly.
> > > > > > > 
> > > > > > > It's not always the case that parameters can satisfy the `noalias` attribute. Suppose, for instance, that the previous function was instead
> > > > > > > 
> > > > > > >     define void @f(i8* %a, i8* %b, i8* %c, i8* %d)
> > > > > > > 
> > > > > > > where a and b could alias c and d, respectively. Then neither a nor b fits noalias requirement. Yet, even in such a case, the LangRef constraint on memcpy should be sufficient to deduce that a no-alias b.
> > > > > > > 
> > > > > > I don't understand what your comment means. How are %c and %d accessed?
> > > > > > 
> > > > > > Just in case it is not clear, noalias's semantics are similar in this regard to C's restrict semantics. noalias means that memory accessed through the attributed pointer, or any pointers based on that pointer, are not accessed through a pointer not based on the attributed pointer. In C, memcpy certainly has restrict on its pointer arguments, and we could add noalias on ours as well.
> > > > > > I don't understand what your comment means. How are %c and %d accessed?
> > > > > 
> > > > > I took reames' original suggestion to mean that in my original example, `source_clobber`'s parameters %a and %b should have been tagged `noalias`, in which case the unpatched BasicAA indeed deduces the correct mod/ref info. 
> > > > > 
> > > > > > Just in case it is not clear, noalias's semantics are similar in this regard to C's restrict semantics. noalias means that memory accessed through the attributed pointer, or any pointers based on that pointer, are not accessed through a pointer not based on the attributed pointer.
> > > > > 
> > > > > So this implies that a parameter `i8* %a noalias` would no-alias every other pointer parameter, right? Because that was my point: Calling memcpy with %a and %b only implies that %a no-alias %b, whereas `%a noalias` implies a much stronger condition. Suppose, instead of `source_clobber`, we had the function:
> > > > > 
> > > > > ```
> > > > > define void @f(i8* %a, i8* %b, i8* %c, i8* %d) {
> > > > >   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 128, i32 1, i1 false)
> > > > >   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %c, i8* %d, i64 128, i32 1, i1 false)
> > > > >   %x = load i8, i8* %b
> > > > > }
> > > > > ```
> > > > > 
> > > > > It's totally okay for %a to may-alias %c, and %b to may-alias %d, right? An `i8* %a noalias` would forbid this possibility.
> > > > > 
> > > > > > In C, memcpy certainly has restrict on its pointer arguments, and we could add noalias on ours as well.
> > > > > 
> > > > > Do you mean tagging the memcpy IR prototype's parameters `noalias`? I've tried that and it doesn't seem to work, either. Here's a quick patch that adds `Attribute::NoAlias` to the memcpy intrinsic:
> > > > > 
> > > > > ```diff
> > > > > diff --git a/include/llvm/IR/Intrinsics.td b/include/llvm/IR/Intrinsics.td
> > > > > index be0ae3e..4a240ae 100644
> > > > > --- a/include/llvm/IR/Intrinsics.td
> > > > > +++ b/include/llvm/IR/Intrinsics.td
> > > > > @@ -78,6 +78,10 @@ class ReadNone<int argNo> : IntrinsicProperty {
> > > > >    int ArgNo = argNo;
> > > > >  }
> > > > >  
> > > > > +class NoAlias<int argNo> : IntrinsicProperty {
> > > > > +  int ArgNo = argNo;
> > > > > +}
> > > > > +
> > > > >  def IntrNoReturn : IntrinsicProperty;
> > > > >  
> > > > >  // IntrNoduplicate - Calls to this intrinsic cannot be duplicated.
> > > > > @@ -368,7 +372,8 @@ def int_memcpy  : Intrinsic<[],
> > > > >                               [llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty,
> > > > >                                llvm_i32_ty, llvm_i1_ty],
> > > > >                              [IntrArgMemOnly, NoCapture<0>, NoCapture<1>,
> > > > > -                             WriteOnly<0>, ReadOnly<1>]>;
> > > > > +                             WriteOnly<0>, ReadOnly<1>, NoAlias<0>,
> > > > > +                             NoAlias<1>]>;
> > > > >  def int_memmove : Intrinsic<[],
> > > > >                              [llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty,
> > > > >                               llvm_i32_ty, llvm_i1_ty],
> > > > > diff --git a/utils/TableGen/CodeGenIntrinsics.h b/utils/TableGen/CodeGenIntrinsics.h
> > > > > index ea3ec67..239e37d 100644
> > > > > --- a/utils/TableGen/CodeGenIntrinsics.h
> > > > > +++ b/utils/TableGen/CodeGenIntrinsics.h
> > > > > @@ -108,7 +108,7 @@ struct CodeGenIntrinsic {
> > > > >    /// True if the intrinsic is marked as convergent.
> > > > >    bool isConvergent;
> > > > >  
> > > > > -  enum ArgAttribute { NoCapture, Returned, ReadOnly, WriteOnly, ReadNone };
> > > > > +  enum ArgAttribute { NoCapture, Returned, ReadOnly, WriteOnly, ReadNone, NoAlias };
> > > > >    std::vector<std::pair<unsigned, ArgAttribute>> ArgumentAttributes;
> > > > >  
> > > > >    CodeGenIntrinsic(Record *R);
> > > > > diff --git a/utils/TableGen/CodeGenTarget.cpp b/utils/TableGen/CodeGenTarget.cpp
> > > > > index 13c8c12..431611e 100644
> > > > > --- a/utils/TableGen/CodeGenTarget.cpp
> > > > > +++ b/utils/TableGen/CodeGenTarget.cpp
> > > > > @@ -610,6 +610,9 @@ CodeGenIntrinsic::CodeGenIntrinsic(Record *R) {
> > > > >      } else if (Property->isSubClassOf("ReadNone")) {
> > > > >        unsigned ArgNo = Property->getValueAsInt("ArgNo");
> > > > >        ArgumentAttributes.push_back(std::make_pair(ArgNo, ReadNone));
> > > > > +    } else if (Property->isSubClassOf("NoAlias")) {
> > > > > +      unsigned ArgNo = Property->getValueAsInt("ArgNo");
> > > > > +      ArgumentAttributes.push_back(std::make_pair(ArgNo, NoAlias));
> > > > >      } else
> > > > >        llvm_unreachable("Unknown property!");
> > > > >    }
> > > > > diff --git a/utils/TableGen/IntrinsicEmitter.cpp b/utils/TableGen/IntrinsicEmitter.cpp
> > > > > index 60cd2da..5fe75ad 100644
> > > > > --- a/utils/TableGen/IntrinsicEmitter.cpp
> > > > > +++ b/utils/TableGen/IntrinsicEmitter.cpp
> > > > > @@ -587,6 +587,12 @@ void IntrinsicEmitter::EmitAttributes(const CodeGenIntrinsicTable &Ints,
> > > > >              OS << "Attribute::ReadNone";
> > > > >              addComma = true;
> > > > >              break;
> > > > > +          case CodeGenIntrinsic::NoAlias:
> > > > > +            if (addComma)
> > > > > +              OS << ",";
> > > > > +            OS << "Attribute::NoAlias";
> > > > > +            addComma = true;
> > > > > +            break;
> > > > >            }
> > > > >  
> > > > >            ++ai;
> > > > > ```
> > > > > 
> > > > > And re-running `source_clobber` through opt:
> > > > > 
> > > > > ```
> > > > > 06:55:57 ~/3rd/llvm> opt -basicaa -aa-eval -print-all-alias-modref-info -S test.ll
> > > > > Function: source_clobber: 2 pointers, 1 call sites
> > > > >   MayAlias:     i8* %a, i8* %b
> > > > >   Both ModRef:  Ptr: i8* %a     <->  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 128, i32 1, i1 false)
> > > > >   Both ModRef:  Ptr: i8* %b     <->  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 128, i32 1, i1 false)
> > > > > ; ModuleID = 'test.ll'
> > > > > source_filename = "test.ll"
> > > > > 
> > > > > ; Function Attrs: argmemonly nounwind
> > > > > declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i32, i1) #0
> > > > > 
> > > > > define void @source_clobber(i8* %a, i8* %b) {
> > > > >   call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 128, i32 1, i1 false)
> > > > >   %x = load i8, i8* %b
> > > > >   ret void
> > > > > }
> > > > > 
> > > > > attributes #0 = { argmemonly nounwind }
> > > > > ```
> > > > > 
> > > > > Notice that the intrinsic prototype now contains `noalias`, yet the `-aa-eval` results are still over-conservative.
> > > > > Because that was my point: Calling memcpy with %a and %b only implies that %a no-alias %b, whereas %a noalias implies a much stronger condition. 
> > > > 
> > > > No, it is actually a weaker condition. That's the problem. noalias does not say anything about the aliasing properties of the function arguments themselves. It only has a meaning if you know how the pointers are used within the function. So, yes, we could add noalias to memcpy. No, we'd not be able to optimize the caller of memcpy based only on that information. In the case of memcpy, we know how the memory is accessed so we can say more about the aliasing properties.
> > > > No, we'd not be able to optimize the caller of memcpy based only on that information. In the case of memcpy, we know how the memory is accessed so we can say more about the aliasing properties.
> > > 
> > > Why not? Why can't we assume that two pointers don't alias as soon as they're fed into memcpy, and that if they do, it's undefined behavior?
> > > 
> > > >No, it is actually a weaker condition. That's the problem. noalias does not say anything about the aliasing properties of the function arguments themselves. It only has a meaning if you know how the pointers are used within the function.
> > > 
> > > LangRef says that "objects accessed via pointer values based on the argument or return value are not also accessed, **during the execution of the function**, via pointer values not based on the argument or return value." "During execution of the function" tells me that regardless of how the function uses a noalias pointer argument, it's not allowed to alias any other pointer argument. Correct?
> > > 
> > > And it's UB if I pass in aliased pointers to **any** function whose arguments are noalias because "the caller shares the responsibility with the callee for ensuring that these requirements are met."
> > > Why not? Why can't we assume that two pointers don't alias as soon as they're fed into memcpy, and that if they do, it's undefined behavior?
> > 
> > For memcpy, we can. In general for functions with restrict-qualified arguments, we can't. For example, the following is a fine function:
> > 
> > void foo(int * restrict x, int * restrict y) {
> >   x[0] = y[1];
> > }
> > 
> > and can be called with foo(p, p) without any UB.
> > 
> > > LangRef says that "objects accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value." "During execution of the function" tells me that regardless of how the function uses a noalias pointer argument, it's not allowed to alias any other pointer argument. Correct?
> > 
> > I think the thing that you're missing is that, in general, we don't know what offsets of which pointers are actually used for accesses. Hopefully, my example above makes it clear what I mean. If not, please let me know. For memcpy, we obviously know how the relative offsets work, so we have more information than for a generic function with restrict-qualified pointer arguments.
> > 
> > 
> "Object accessed via pointer values..." presumes that there are objects pointed to by those pointer arguments. So noalias means that those objects don't overlap. If you choose some astronomically large offset that, added to `int * restrict x`, brings you into `int * restrict y`'s territory, then either 1) you're making an out-of-bounds access to a pointer or 2) the objects pointed to by x and y do in fact overlap. Either way, it's UB.
> 
> Also, as a side note, `restrict`ed pointers are allowed to alias if they're never written to: "If the object is never modified, it may be aliased and accessed through different restrict-qualified pointers..." from http://en.cppreference.com/w/c/language/restrict . I think this differs from noalias, whose non-overlap condition is qualified by "access," which to me means both reads and writes.
> 
> In any case: If memcpy is special, what should the correct fix to get the right answers out of -aa-eval, since the tagging noalias to its arguments doesn't seem to work as indicated previously?
> "Object accessed via pointer values..." presumes that there are objects pointed to by those pointer arguments. So noalias means that those objects don't overlap. If you choose some astronomically large offset that, added to int * restrict x, brings you into int * restrict y's territory, then either 1) you're making an out-of-bounds access to a pointer or 2) the objects pointed to by x and y do in fact overlap. Either way, it's UB

x and y can point into the same array so long as the elements of the array accessed through each is distinct. The array is not itself an object, in this sense, but the objects are the array elements.

> In any case: If memcpy is special, what should the correct fix to get the right answers out of -aa-eval, since the tagging noalias to its arguments doesn't seem to work as indicated previously?

I think that we need to add special logic (which is what I presume you're doing in this patch -- I'll take a look at the specifics, but I've not done so yet).


Repository:
  rL LLVM

https://reviews.llvm.org/D27034





More information about the llvm-commits mailing list