[PATCH] D125485: [ArgPromotion] Unify byval promotion with non-byval

Fri Jun 24 03:15:57 PDT 2022

psamolysov marked 14 inline comments as done.
psamolysov added a comment.

@nikic Thank you very much for the review and comments, I've addressed almost all of them and asked a pair of questions about alignment check for loads in the `HandleEndUser` lambda. Could you have a look and give your answers?

================
Comment at: llvm/lib/Transforms/IPO/ArgumentPromotion.cpp:326
+                                    return Res + ArgParts.second.size();
+                                  }));
+
----------------
nikic wrote:
> I doubt this code is performance-critical enough to need this accumulate + reserve :)
I agree, removed this `accumulate` and `reserve`.

================
Comment at: llvm/lib/Transforms/IPO/ArgumentPromotion.cpp:376

-    // Otherwise, if we promoted this argument, then all users are load
-    // instructions (with possible casts and GEPs in between).
+    // Cleanup the code from the dead instructions: GEPs and BitCasts in between
+    // the original argument and its users: loads and stores. Retarget every
----------------
nikic wrote:
> This comment looks a bit out of place, as this closure doesn't remove any dead instructions itself.
Thank you. I've moved the comment just before the dead code elimination loop.

================
Comment at: llvm/lib/Transforms/IPO/ArgumentPromotion.cpp:401
+        Value *TheAlloca = GetAlloca(Ptr);
+        IRBuilder<NoFolder> LIRB(LI);
+        Instruction *NewLI = LIRB.CreateAlignedLoad(LI->getType(), TheAlloca,
----------------
nikic wrote:
> It's possible to reuse one IRBuilder instance using `IRB.SetInsertPoint()`.
I wasn't aware about this method, thanks!

================
Comment at: llvm/lib/Transforms/IPO/ArgumentPromotion.cpp:431
+    for (const auto &Pair : OffsetToAlloca) {
+      if (isAllocaPromotable(Pair.second))
+        Allocas.push_back(Pair.second);
----------------
nikic wrote:
> Can we `assert(isAllocaPromotable())` here? I think that by design, we should only produce promotable allocas.
I've added the assertion for `isAllocaPromotable`.

================
Comment at: llvm/lib/Transforms/IPO/ArgumentPromotion.cpp:443-444
+    // succeed.
+    DominatorTree DT(*NF);
+    PromoteMemToReg(Allocas, DT, &AC);
   }
----------------
aeubanks wrote:
> creating the DominatorTree should go through something like
> `function_ref<DominatorTree &(Function &F)> DTGetter`, see `AARGetter`.
> 
> but we run SROA (including mem2reg) right after argpromotion, is there a reason to run mem2reg here rather than leave it for the function simplification pipeline? then we wouldn't need to worry about AssumptionCache/DominatorTree here since they're purely used for mem2reg
About running `mem2reg` in the pass. I tried to remove the `mem2reg` and run a few tests with `opt -O3` (and adding the `noinline` attribute for the callee before), and see that the almost of the code of the callees is optimized out, so the idea just to remove the `mem2reg` call from the pass looks suitable. 

@nikic what is your opinion? Should we call the `mem2reg` inside the argument promotion pass or let the compilation pipeline call it?

P.S. I'm also planning to add the `Dead Argument Elimination` pass into the pipeline after landing this patch because `mem2reg` after new argument promotion leaves unused arguments.

================
Comment at: llvm/lib/Transforms/IPO/ArgumentPromotion.cpp:443-444
+    // succeed.
+    DominatorTree DT(*NF);
+    PromoteMemToReg(Allocas, DT, &AC);
   }
----------------
psamolysov wrote:
> aeubanks wrote:
> > creating the DominatorTree should go through something like
> > `function_ref<DominatorTree &(Function &F)> DTGetter`, see `AARGetter`.
> > 
> > but we run SROA (including mem2reg) right after argpromotion, is there a reason to run mem2reg here rather than leave it for the function simplification pipeline? then we wouldn't need to worry about AssumptionCache/DominatorTree here since they're purely used for mem2reg
> About running `mem2reg` in the pass. I tried to remove the `mem2reg` and run a few tests with `opt -O3` (and adding the `noinline` attribute for the callee before), and see that the almost of the code of the callees is optimized out, so the idea just to remove the `mem2reg` call from the pass looks suitable. 
> 
> @nikic what is your opinion? Should we call the `mem2reg` inside the argument promotion pass or let the compilation pipeline call it?
> 
> P.S. I'm also planning to add the `Dead Argument Elimination` pass into the pipeline after landing this patch because `mem2reg` after new argument promotion leaves unused arguments.
@aeubanks Thank you for the suggestion about `DTGetter`. 
 Unfortunately, I cannot get why the `DTGetter` should be used. As I get, `AARGetter` is a wrapper over the `FAM.getResult<AAManager>(F)` for the new PM and is just an instance of the `LegacyAARGetter` class for the legacy PM. It's goal is to give the correct AAR depending on the used pass manager.

In our case, the `DominatorTree` is just built again for a new function after the argument promotion and doesn't depend on the used pass manager.

================
Comment at: llvm/lib/Transforms/IPO/ArgumentPromotion.cpp:551
       LLVM_DEBUG(dbgs() << "ArgPromotion of " << *Arg << " failed: "
-                        << "loaded via both " << *Part.Ty << " and " << *Ty
+                        << (isa<LoadInst>(I) ? "loaded via" : "stored to")
+                        << " both " << *Part.Ty << " and " << *Ty
----------------
nikic wrote:
> I would replaced this with "accessed as" to cover both. Otherwise it sounds like it's either both loads or both stores with conflicting types, but it can also be a mix.
Ups... I've forgotten that `ArgParts` collected all the promotable parts for loads as well as for stores and a mix is possible.

================
Comment at: llvm/lib/Transforms/IPO/ArgumentPromotion.cpp:641
+      // Only stores TO the argument is allowed, all the other stores are
+      // unknown users
+    }
----------------
nikic wrote:
> What happens if the we're storing the argument into itself? `store ptr %arg, ptr %arg` or similar. I don't think your code handles this correct right now?
> 
> It might be best to walk over `Use`s rather than `User`s, so we can check which operand the use is on.
Good catch! This is correct: when I tried `store ptr %arg, ptr %arg`, an access violation error in the `llvm::Instruction::eraseFromParent()` function occurred because in the dead instruction elimination loop in the `doPromotion` function, the `store` instruction handles twice.

Thank you for the idea! I've rewritten the walk to use `Use`s instead of `User`s and to check that the value the `store` is the user of is not the `store`'s value operand. If so, this `store` is an "unknown user" because we can store something into the pointer but not the pointer's value into somewhere. A LIT test was also added.

I tried to add a similar check for the `load` instructions (if the `load`'s pointer operand is exactly the value the `load` is the user of) and remove the check in `HandleEndUser` that `Ptr` is equal to `Arg`. Unfortunately, this makes no sense because before the walking, there is a loop where we try to handle *every* `load` instruction in the entry BB for *every* argument and this check actually checks whether the current `load` instruction is related to the current argument. So, the check is still required.

================
Comment at: llvm/lib/Transforms/IPO/ArgumentPromotion.cpp:671
   // a correctness perspective. Profitability is less obvious though.
+  // TODO: What about store promotion?
   int64_t Offset = ArgPartsVec[0].first;
----------------
nikic wrote:
> I think it's safe to drop these TODOs -- this is not really compatible with the new promotion approach that uses separate allocas (we'd have to reconstruct the result from multiple allocas, which makes little sense).
I believe this comment was added because it wasn't clear for me whether stores can be speculative. Thank you for the answer, the situation is clear for now and I've removed my TODO comment as well as the previous one.

================
Comment at: llvm/lib/Transforms/IPO/ArgumentPromotion.cpp:635
+              HandleEndUser(SI, SI->getValueOperand()->getType(),
+                            /* GuaranteedToExecute */ true)) {
+        if (!*Res)
----------------
nikic wrote:
> psamolysov wrote:
> > I'm not aware about speculative stores. Should we handle the stores in the exactly same way as loads here, so should the parameter be set to `false`?
> I think the parameter should be false here, because the store is not guaranteed to executed, and it would be confusing otherwise.
> 
> The relevant part here is that we're not going to speculate any stores, so we don't care about the alignment at all -- we can explicitly skip the alignment code in HandleEndUser for stores to make this clear.
 Thank you for the suggestion. I've changed the argument's value to `true` and fix the alignment check in the `HandleEndUser` in order to let it work for loads only. Also, I've edited the comment before the check to make it clear that only loads are checked.

Two questions, please. Currently `Part` can be either a load or a store previously saved in the `ArgParts` map by the same offset. Should the condition that the `MustExecInstr` in the seen before `ArgPart` is a load (or at least no a store) also be added? Also, when the `Part.Alignment` field is recalculated - `Part.Alignment = std::max(Part.Alignment, I->getAlign());`, should we do this whenever `I` is a load only or in any case?

Thank you.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125485/new/

https://reviews.llvm.org/D125485