[PATCH] D91398: [LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE)

Mon Nov 16 13:31:22 PST 2020

anna added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2667
+    if (!Operand || !OrigLoop->contains(Operand) ||
+        (Cost->isUniformAfterVectorization(Operand, State.VF)))
+      InputInstance.Lane = 0;
----------------
fhahn wrote:
> reames wrote:
> > fhahn wrote:
> > > The VPRecplicateRecipe contains a `IsUniform` flag. I think it should be possible to pass the flag through from the recipe to `scalarizeInstruction`. Ideally the recipes should contain all information required for code-generation to avoid having to tie code generation directly to the cost-model.
> > I think you're misreading the code slightly.  This isn't checking whether the recipe for the instruction being scalarized is uniform.  It's checking whether the *input* to the instruction is uniform.  
> > 
> > It seems like overkill to record a per operand uniform flag in the recipe?  
> > I think you're misreading the code slightly. This isn't checking whether the recipe for the instruction being scalarized is uniform. It's checking whether the *input* to the instruction is uniform.
> 
> Indeed, I was thinking about something slightly differently. The underlying point about ideally not having this code depend on the cost-model, but the information in the corresponding VPlan still applies.
> 
> > It seems like overkill to record a per operand uniform flag in the recipe?
> 
> Yes, but fortunately I don't think that is necessary, because I think all required information is already encoded in the operands in VPlan: they should be uniform VPReplicateRecipes. Making this information accessible here is currently work-in-progress. With D91500 which I just put up, you should be able to  check if `User.getOperand(op)` is a `VPReplicateRecipe` and if so, `IsUniform` also needs to be true.
> 
> I think it would make sense to take it one step further, and use this logic directly in `VPTransformState::get`: when requesting a particular lane for a uniform VPValue, we should always be able to just return lane 0 and other callers potentially could also benefit. With D91501, the changes to `scalarizeInstruction` should not be needed.
Frankly, I'm finding the code written here in review easier to read :) Mainly because my context with VPlan is limited. Do we need to block this change on two reviews landing? I'm okay either way - but we can also land this and then "clean this up" once D91500 and D91501 lands?

> I think it would make sense to take it one step further, and use this logic directly in VPTransformState::get: when requesting a particular lane for a uniform VPValue, we should always be able to just return lane 0 and other callers potentially could also benefit. 
@fhahn Could you pls clarify further what you mean by "other callers potentially could also benefit" ?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5039
   auto addToWorklistIfAllowed = [&](Instruction *I) -> void {
+    if (isOutOfScope(I)) {
+      LLVM_DEBUG(dbgs() << "LV: Found not uniform due to scope: "
----------------
Could this be turned into an assert and landed separately? I see all callers of `addToWorklistIfAllowed` either already checks for outOfScope or the instructions is already checked to be in loop.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91398/new/

https://reviews.llvm.org/D91398