[PATCH] D9403: llvm.noalias - Clang CodeGen for local restrict-qualified pointers

Mon Aug 14 21:26:29 PDT 2017

hfinkel added inline comments.

================
Comment at: lib/CodeGen/CGDecl.cpp:1291
+// where the pointer to the local variable is the key in the map.
+void CodeGenFunction::EmitAutoVarNoAlias(const AutoVarEmission &emission) {
+  assert(emission.Variable && "emission was not valid!");
----------------
rjmccall wrote:
> This should really be a subroutine of EmitAutoVarAlloca; that will implicitly pick this logic up for a bunch of less standard places in IRGen that create local variables.  Please make it a static function (if possible) called EmitNoAliasScope and call it immediately after EmitVarAnnotations.
Done (I didn't make it static, which is certainly possible, but didn't look much cleaner to me - let me know if you want it static anyway).

================
Comment at: lib/CodeGen/CGStmt.cpp:537
+                                      llvm::LLVMContext::MD_noalias),
+                                      NewScopeList));
+
----------------
rjmccall wrote:
> rjmccall wrote:
> > hfinkel wrote:
> > > rjmccall wrote:
> > > > This is a very strange representation.  Every memory operation in the lexical block is annotated with a list of all of the scopes that were entered within the block, even if they were entered after the operation.  But for some reason, not with nested scopes?
> > > > 
> > > > What's the right patch for me to read about this representation?
> > > Perhaps unfortunately, this is an artifact of the way that restrict is defined in C. It applies to all accesses in the block in which the variable is declared, even those before the declaration of the restrict-qualified local itself.
> > > 
> > > It should work with nested scopes, in the sense that we add these things as we complete each scope. So we add things to the inner scope, and then when we complete the outer scope, we go back over the instructions (including those in the inner scope because the scope recording recurses up the scope hierarchy), and adds the outer scopes - it concatenates them to any added by the inner (nested) scopes.
> > > 
> > > The noalias intrinsic's LangRef updates are in D9375.
> > Ah, I see the recursion now.  Please add a comment explaining that expectation here.
> I would still like this comment.  It should be enough to explain how MemoryInsts actually gets filled: there's a callback from inserting an instruction which adds all memory instructions to every active lexical scope that's recording them.
Added.

================
Comment at: lib/CodeGen/CodeGenFunction.cpp:1986
+  if (I->mayReadOrWriteMemory())
+    recordMemoryInstruction(I);
 }
----------------
rjmccall wrote:
> This condition will include calls, which I assume is unnecessary (except maybe memory intrinsics?).  More seriously, it will also include loads and stores into allocas, which will sweep in all sorts of bookkeeping temporaries that IRGen uses in more complex code-generation situations.  You might want to consider ways to avoid recording this kind of instruction that are very likely to just get immediately optimized away by e.g. mem2reg.
> 
> Please also modify LexicalScope so that it records whether there's any active recording scope in the CodeGenFunction.  Lexical scopes are nested, so that should be as easy as saving the current state when you enter a scope and restoring it when you leave.  That will allow you to optimize this code by completely bypassing the recursion and even the check for whether the instruction is a memory instruction in the extremely common case of a function with no restrict variables.
> 
> In general, a lot of things about this approach seem to have worryingly poor asymptotic performance in the face of large functions or (especially) many restrict variables.  (You could, for example, have chosen to have each memory instruction link to its enclosing noalias scope, which would contain a link to its enclosing scope and a list of the variables it directly declares.  This would be a more complex representation to consume, but it would not require inherently quadratic frontend work building all these lists.)  The only saving grace is that we expect very little code to using restrict, so it becomes vital to ensure that your code is immediately short-circuited when nothing is using restrict.
> This condition will include calls, which I assume is unnecessary (except maybe memory intrinsics?). 

No, getting calls here was intentional. The AA implementation can specifically deal with calls (especially those that are known only to access memory given by their arguments).

> More seriously, it will also include loads and stores into allocas, which will sweep in all sorts of bookkeeping temporaries that IRGen uses in more complex code-generation situations. You might want to consider ways to avoid recording this kind of instruction that are very likely to just get immediately optimized away by e.g. mem2reg.

I agree, but this might be difficult to do in general. We should be able to avoid annotating accesses to allocas that don't escape. Do we have a sound way to determine at this stage whether a local variable has had its address taken?

> Please also modify LexicalScope so that it records whether there's any active recording scope in the CodeGenFunction. Lexical scopes are nested, so that should be as easy as saving the current state when you enter a scope and restoring it when you leave. That will allow you to optimize this code by completely bypassing the recursion and even the check for whether the instruction is a memory instruction in the extremely common case of a function with no restrict variables.

I added a variable to the scope to cache whether or not anything is being recorded. This avoids the quadratic recording-check behavior (unless some scope is actually recording something).

> In general, a lot of things about this approach seem to have worryingly poor asymptotic performance in the face of large functions or (especially) many restrict variables. (You could, for example, have chosen to have each memory instruction link to its enclosing noalias scope, which would contain a link to its enclosing scope and a list of the variables it directly declares. This would be a more complex representation to consume, but it would not require inherently quadratic frontend work building all these lists.) The only saving grace is that we expect very little code to using restrict, so it becomes vital to ensure that your code is immediately short-circuited when nothing is using restrict.

We're definitely on the same page here. The reasons I didn't worry about this too much is that, as you note, we expect this to be rare, and moreover, the underlying metadata representation being created is itself quadratic (#restricts x #accesses). The good news is that this new representation is actually less verbose than the existing noalias metadata. The bad news is that, if this turns out to be a problem, and it could, then we'll need to either add cutoffs or design some different representation (or both).

================
Comment at: lib/CodeGen/CodeGenFunction.h:597
+    // those blocks) so that we can later add the appropriate metadata. Record
+    // this instruction and so the same in any parent scopes.
+    void recordMemoryInstruction(llvm::Instruction *I) {
----------------
rjmccall wrote:
> I would suggest this wording:
> 
>   /// Record the given memory access instruction in this and all of its enclosing scopes.
>   /// When we close this scope, we'll add the list of all the restrict-qualified local variables
>   /// it declared to all the memory accesses which occurred within it.  Recording is only
>   /// enabled under optimization, and it's disabled in a scope unless it actually declares
>   /// any local restrict variables.
Sounds good (updated).

https://reviews.llvm.org/D9403