[PATCH] D145210: [Pipeline] Adjust PostOrderFunctionAttrs placement in simplification pipeline

Mon Jun 26 13:05:07 PDT 2023

nikic added inline comments.

================
Comment at: llvm/lib/Passes/PassBuilderPipelines.cpp:838
+  // functions.
+  MainCGPipeline.addPass(PostOrderFunctionAttrsPass(/*SkipNonRecursive*/ true));

----------------
cfang wrote:
> aeubanks wrote:
> > cfang wrote:
> > > nikic wrote:
> > > > cfang wrote:
> > > > > nikic wrote:
> > > > > > cfang wrote:
> > > > > > > nikic wrote:
> > > > > > > > cfang wrote:
> > > > > > > > > cfang wrote:
> > > > > > > > > > aeubanks wrote:
> > > > > > > > > > > cfang wrote:
> > > > > > > > > > > > aeubanks wrote:
> > > > > > > > > > > > > cfang wrote:
> > > > > > > > > > > > > > We found that "nofree" attribute will greatly affect the earlyCSE pass in function simplification. In our case, missing nofree essentially disabled earlyCSE which resulted in a huge performance regression (register pressure).   Other optimizations like GVN in function simplification may also be affected by function attributes.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > So is it possible that we just run this pass twice?   Thanks.
> > > > > > > > > > > > > Can you give an example where EarlyCSE depends on function attributes? That doesn't really make sense to me, function attributes should really be used by callers of the function, not the function itself.
> > > > > > > > > > > > EarlyCSE can make use of the attributes of the functions it calls. 
> > > > > > > > > > > > The pass manager should be running passes such that the attributes
> > > > > > > > > > > > of callees are known before running optimizations on the callers
> > > > > > > > > > > yes the pass manager already takes care of visiting callees before callers with the CGSCCPassManager. the only case where we need to run function-attrs before the function simplification pipeline is when you have recursive functions, which this patch handles.
> > > > > > > > > > > 
> > > > > > > > > > > so I'm still not understanding where you'd be seeing a regression
> > > > > > > > > > MainCGPipeline.addPass(PostOrderFunctionAttrsPass(/*SkipNonRecursive*/ false));
> > > > > > > > > > 
> > > > > > > > > > I switch SkipNonRecursive between false and true , and compare IR around the following earlyCSE pass in function simplification pipeline: 
> > > > > > > > > > 
> > > > > > > > > >   FPM.addPass(EarlyCSEPass(true /* Enable mem-ssa. */));
> > > > > > > > > > 
> > > > > > > > > > 1. Before this pass,  the only difference is that when SkipNonRecursive == false; there is a nofree function attribute, but I am not clear about the difference in the callee. Some of them seem losing the "convergent" attribute
> > > > > > > > > > 
> > > > > > > > > > 2. After this earlyCSE pass, we see a huge IR difference, which I don't know the reason yet. But the difference is due to  SkipNonRecursive flip. 
> > > > > > > > > > 
> > > > > > > > > > So something goes not what we think it should be. 
> > > > > > > > > >   
> > > > > > > > > addArgumentAttrs(Nodes.SCCNodes, Changed);
> > > > > > > > > 
> > > > > > > > > Looks to me this above function was not called to deduce function argument attributes. 
> > > > > > > > > For our case, missing WriteOnly/ReadOnly attributes affects memorySSA for earlyCSE, and we have observed may laods are not CSEd.
> > > > > > > > Please provide a PhaseOrdering test.
> > > > > > > Here is a simple test case where argpromotion is a pass between the first function attribute recognition and function simplification.
> > > > > > > These argument attributes will affect optimizations like earlyCSE.
> > > > > > > 
> > > > > > > ---------------------------------------- func-arg-attrs.ll -----------------------------------------
> > > > > > > 
> > > > > > > ; RUN: opt < %s -O3 -print-after=argpromotion -disable-output 2>&1 | FileCheck %s
> > > > > > > 
> > > > > > > define i32 @func_arg_attrs(ptr %p1) {
> > > > > > > ; CHECK-LABEL: define i32 @func_arg_attrs
> > > > > > > ; CHECK-SAME: (ptr nocapture readonly %p1) local_unnamed_addr #{{[0-9]+}} {
> > > > > > > ;
> > > > > > >   %v1 = load i32, ptr %p1
> > > > > > >   ret i32 %v1
> > > > > > > }
> > > > > > > 
> > > > > > A PhaseOrdering test is a minimal test that shows an **end-to-end** optimization difference for the whole `-O3` pipeline before and after the change.
> > > > > > 
> > > > > > It's obvious that this changes the inferred attributes at specific pipeline positions, we are interested in how that affects the overall optimization pipeline.
> > > > > It is apparent that the missing argument attribute due to this patch will affect many optimization in the function simplifications.
> > > > > Yes we can try to reduce the giant test case for a simple one, but it is not possible to cover all cases that will be affected. I hope you won't ask for all cases that will be affected by argument attrs.  Thanks.   
> > > > No, this is not apparent at all. The inferred attributes are, to the most part, not relevant to the optimization of the function itself, only its callers. That's why it is sufficient to infer them at the end of function simplification for non-recursive functions.
> > > > 
> > > > Clearly, this does not hold up for your particular case, and we would like to understand why that is. However, your description of the issue has not been helpful in the least, and we are unlikely to make progress here without a test case.
> > > Fair enough. Out case shows that the argument attributes will affect early-cse in the function simplification pipeline. 
> > > Here is a test case that show the early-cse difference:
> > > 
> > > ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-attributes --version 2
> > > ; RUN: opt < %s -O3 -print-before=speculative-execution -S -disable-output 2>&1 | FileCheck %s
> > > 
> > > define i32 @arg_readonly_early_cse(ptr noalias %addr.i, ptr noalias %otheraddr) {
> > > ; CHECK: Function Attrs: mustprogress nofree norecurse nounwind willreturn
> > > ; CHECK-LABEL: define i32 @arg_readonly_early_cse
> > > ; CHECK-SAME: (ptr noalias nocapture readonly [[ADDR_I:%.*]], ptr noalias nocapture readnone [[OTHERADDR:%.*]]) local_unnamed_addr #[[ATTR0:[0-9]+]] {
> > > ; CHECK-NEXT:    [[A:%.*]] = load i32, ptr [[ADDR_I]], align 4
> > > ; CHECK-NEXT:    fence acquire
> > > ; CHECK-NEXT:    ret i32 0
> > > ;
> > >   %a = load i32, ptr %addr.i, align 4
> > >   fence acquire
> > >   %a2 = load i32, ptr %addr.i, align 4
> > >   %res = sub i32 %a, %a2
> > >   ret i32 %res
> > > } 
> > ```
> > define i32 @arg_readonly_early_cse(ptr noalias %addr.i, ptr noalias %otheraddr) {
> >   %a = load i32, ptr %addr.i, align 4
> >   fence acquire
> >   %a2 = load i32, ptr %addr.i, align 4
> >   %res = sub i32 %a, %a2
> >   ret i32 %res
> > }
> > ```
> > `opt -p function-attrs,instcombine` optimizes most of it away but `opt -p instcombine` doesn't (same with `early-cse<memssa>`).
> > 
> > this boils down to [this](https://github.com/llvm/llvm-project/blob/1a0d23efe1573e1d5d547c80939ec2bdb7e2524a/llvm/lib/Analysis/BasicAliasAnalysis.cpp#L708) checking noalias + readonly on arguments.
> > 
> > we could unconditionally perform argument attribute inference in the first function-attrs pass
> > we could unconditionally perform argument attribute inference in the first function-attrs pass
> 
> This looks reasonable to me. Thanks! 
Thanks for the test case! The core issue here seems to be a discrepancy between FunctionAttrs inference capabilities and AA ModRef modelling for fences. The first two question I would ask are:

1. Is the inference performed by FunctionAttrs actually correct? Apparently the inference of the readonly/writeonly argument attributes ignores fences completely, which seems rather unlikely to be correct. AA models fences as ModRef-ing all memory, and FunctionAttrs memory attribute inference does the same.

2. If FunctionAttrs is actually correct, can AA be improved to return the same result? I would expect this to be along the lines of "if the memory location is an unescaped identified local object, the fence cannot modref the location".

If the answer to one of those questions is "yes", we should fix the corresponding pass. If the answer to both is "no", we can look into recovering this attribute inference.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145210/new/

https://reviews.llvm.org/D145210