[PATCH] D125485: [ArgPromotion] Unify byval promotion with non-byval

Fri Jun 24 07:37:44 PDT 2022

psamolysov marked an inline comment as done.
psamolysov added inline comments.

================
Comment at: llvm/lib/Transforms/IPO/ArgumentPromotion.cpp:443-444
+    // succeed.
+    DominatorTree DT(*NF);
+    PromoteMemToReg(Allocas, DT, &AC);
   }
----------------
nikic wrote:
> psamolysov wrote:
> > nikic wrote:
> > > psamolysov wrote:
> > > > psamolysov wrote:
> > > > > aeubanks wrote:
> > > > > > creating the DominatorTree should go through something like
> > > > > > `function_ref<DominatorTree &(Function &F)> DTGetter`, see `AARGetter`.
> > > > > > 
> > > > > > but we run SROA (including mem2reg) right after argpromotion, is there a reason to run mem2reg here rather than leave it for the function simplification pipeline? then we wouldn't need to worry about AssumptionCache/DominatorTree here since they're purely used for mem2reg
> > > > > About running `mem2reg` in the pass. I tried to remove the `mem2reg` and run a few tests with `opt -O3` (and adding the `noinline` attribute for the callee before), and see that the almost of the code of the callees is optimized out, so the idea just to remove the `mem2reg` call from the pass looks suitable. 
> > > > > 
> > > > > @nikic what is your opinion? Should we call the `mem2reg` inside the argument promotion pass or let the compilation pipeline call it?
> > > > > 
> > > > > P.S. I'm also planning to add the `Dead Argument Elimination` pass into the pipeline after landing this patch because `mem2reg` after new argument promotion leaves unused arguments.
> > > > @aeubanks Thank you for the suggestion about `DTGetter`. 
> > > >  Unfortunately, I cannot get why the `DTGetter` should be used. As I get, `AARGetter` is a wrapper over the `FAM.getResult<AAManager>(F)` for the new PM and is just an instance of the `LegacyAARGetter` class for the legacy PM. It's goal is to give the correct AAR depending on the used pass manager.
> > > > 
> > > > In our case, the `DominatorTree` is just built again for a new function after the argument promotion and doesn't depend on the used pass manager.
> > > I'm okay with not doing the promotion here. We should probably add `mem2reg` to the tests though, to make sure we generate trivially promotable code.
> > I've started to remove the promotion (mem2reg) from the pass and found the following problem: the `chained.ll` test is not worked as it is designed now, so to run the `argpromotion` pass and then `mem2reg` is not the same as run the `mem2reg` in the `argpromotion` pass because the pass runs `mem2reg` after every promotion attempt while the pass manager run passes one by one.
> > 
> > If we consider the `chained.ll` case, after the `argpromotion` pass without `mem2reg` call inside the pass, the generated code is the following:
> > 
> > ```
> > define internal i32 @test(i32* %x.0.val) {
> > entry:
> >   %x.0.allc = alloca i32*, align 8
> >   store i32* %x.0.val, i32** %x.0.allc, align 8
> >   %y = load i32*, i32** %x.0.allc, align 8
> >   %z = load i32, i32* %y, align 4
> >   ret i32 %z
> > }
> > 
> > define i32 @caller() {
> > entry:
> >   %G2.val = load i32*, i32** @G2, align 8
> >   %x = call i32 @test(i32* %G2.val)
> >   ret i32 %x
> > }
> > ```
> > 
> > And the pass cannot promote the pointer `%x.0.val` again because there is a `store`:
> > 
> > ```
> > ArgPromotion of i32* %x.0.val failed: unknown user   store i32* %x.0.val, i32** %x.0.allc, align 8
> > ```
> > 
> > and the `byval` attribute is not present so stores aren't allowed.
> > 
> > If we run `mem2reg` from the pass, the `alloca` and corresponding `store`s will be promoted and the `argpromotion` pass gets a chance to do another iteration.
> > 
> > So that to run `mem2reg` from the pass makes sense.
> Ah, good point. So we do need to run promotion here.
> 
> > Unfortunately, I cannot get why the DTGetter should be used. As I get, AARGetter is a wrapper over the FAM.getResult<AAManager>(F) for the new PM and is just an instance of the LegacyAARGetter class for the legacy PM. It's goal is to give the correct AAR depending on the used pass manager.
> 
> The general motivation is to cache the DT, so the next pass using it won't have to recompute it. But I don't know what complications the legacy pass manager introduces here. I didn't see any compile-time impact from your current code (though I expect this is mainly because argument promotion triggers rarely.)
When I tried to reuse the DominatorTree from the legacy pass manager, I got problems on some tests from polly. 

It looks like my problem was in that I built the Dominator Tree for the old function and tried to reuse it to promote the allocas in the new one. If the Dominator Tree is lazy built for the new function using the `DTGetter`, everything works for the new pass manager (but, as I remember, it worked even when I tried to pass the DT for the old function, I believe it worked because we actually don't change CFG). For the legacy PM, the situation is more dramatic because the `ArgPromotion` is a `CallGraphSCCPass` pass and not a `FunctionPass` one and this code: `getAnalysis<DominatorTreeWrapperPass>(F).getDomTree()` doesn't work as it is expected: the following error occurs on the Legacy PM:

```
Unable to schedule 'Dominator Tree Construction' required by 'Promote 'by reference' arguments to scalars'
Unable to schedule pass
```

As a workaround, I just create a new `DominatorTree` instance from the `DTGetter` created from the legacy pass. The idea is stolen from the `lib/Analysis/MustExecute.cpp` file. Unfortunately, this requires to store all the created `DominatorTree` instances while the pass is running.

================
Comment at: llvm/lib/Transforms/IPO/ArgumentPromotion.cpp:635
+              HandleEndUser(SI, SI->getValueOperand()->getType(),
+                            /* GuaranteedToExecute */ true)) {
+        if (!*Res)
----------------
nikic wrote:
> psamolysov wrote:
> > nikic wrote:
> > > psamolysov wrote:
> > > > nikic wrote:
> > > > > psamolysov wrote:
> > > > > > nikic wrote:
> > > > > > > psamolysov wrote:
> > > > > > > > I'm not aware about speculative stores. Should we handle the stores in the exactly same way as loads here, so should the parameter be set to `false`?
> > > > > > > I think the parameter should be false here, because the store is not guaranteed to executed, and it would be confusing otherwise.
> > > > > > > 
> > > > > > > The relevant part here is that we're not going to speculate any stores, so we don't care about the alignment at all -- we can explicitly skip the alignment code in HandleEndUser for stores to make this clear.
> > > > > >  Thank you for the suggestion. I've changed the argument's value to `true` and fix the alignment check in the `HandleEndUser` in order to let it work for loads only. Also, I've edited the comment before the check to make it clear that only loads are checked.
> > > > > > 
> > > > > > Two questions, please. Currently `Part` can be either a load or a store previously saved in the `ArgParts` map by the same offset. Should the condition that the `MustExecInstr` in the seen before `ArgPart` is a load (or at least no a store) also be added? Also, when the `Part.Alignment` field is recalculated - `Part.Alignment = std::max(Part.Alignment, I->getAlign());`, should we do this whenever `I` is a load only or in any case?
> > > > > > 
> > > > > > Thank you.
> > > > > Hm, so I guess it's not quite true that we don't speculate stores, in the sense that we do insert a store to store the argument into the alloca. Of course, as this store gets promoted later, the actual alignment doesn't really matter. It's probably still best to produce correct alignment for it.
> > > > > 
> > > > > So I think the safest thing to do here is just not special case the store case at all -- or would that cause any regressions in tests?
> > > > No, it adds no regressions, so I've just remove the check that the instruction is a `load` and use the same check for `load` as well as `store` instructions and reformulated the comment a bit to cover both cases.
> > > Though we could also explicitly set the align of the alloca to the same value, in which case those would always match. Probably a good idea to do that anyway.
> > >> set the align of the alloca to the same value...
> > 
> > Should the value be `Pair.second.Alignment` as for the `store` into the alloca instruction on line 365?
> Yeah, that's what I had in mind.
Done.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125485/new/

https://reviews.llvm.org/D125485