[PATCH] D91398: [LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE)

Sun Nov 22 16:38:45 PST 2020

reames added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2667
+    if (!Operand || !OrigLoop->contains(Operand) ||
+        (Cost->isUniformAfterVectorization(Operand, State.VF)))
+      InputInstance.Lane = 0;
----------------
fhahn wrote:
> anna wrote:
> > fhahn wrote:
> > > reames wrote:
> > > > fhahn wrote:
> > > > > The VPRecplicateRecipe contains a `IsUniform` flag. I think it should be possible to pass the flag through from the recipe to `scalarizeInstruction`. Ideally the recipes should contain all information required for code-generation to avoid having to tie code generation directly to the cost-model.
> > > > I think you're misreading the code slightly.  This isn't checking whether the recipe for the instruction being scalarized is uniform.  It's checking whether the *input* to the instruction is uniform.  
> > > > 
> > > > It seems like overkill to record a per operand uniform flag in the recipe?  
> > > > I think you're misreading the code slightly. This isn't checking whether the recipe for the instruction being scalarized is uniform. It's checking whether the *input* to the instruction is uniform.
> > > 
> > > Indeed, I was thinking about something slightly differently. The underlying point about ideally not having this code depend on the cost-model, but the information in the corresponding VPlan still applies.
> > > 
> > > > It seems like overkill to record a per operand uniform flag in the recipe?
> > > 
> > > Yes, but fortunately I don't think that is necessary, because I think all required information is already encoded in the operands in VPlan: they should be uniform VPReplicateRecipes. Making this information accessible here is currently work-in-progress. With D91500 which I just put up, you should be able to  check if `User.getOperand(op)` is a `VPReplicateRecipe` and if so, `IsUniform` also needs to be true.
> > > 
> > > I think it would make sense to take it one step further, and use this logic directly in `VPTransformState::get`: when requesting a particular lane for a uniform VPValue, we should always be able to just return lane 0 and other callers potentially could also benefit. With D91501, the changes to `scalarizeInstruction` should not be needed.
> > Frankly, I'm finding the code written here in review easier to read :) Mainly because my context with VPlan is limited. Do we need to block this change on two reviews landing? I'm okay either way - but we can also land this and then "clean this up" once D91500 and D91501 lands?
> >  
> > > I think it would make sense to take it one step further, and use this logic directly in VPTransformState::get: when requesting a particular lane for a uniform VPValue, we should always be able to just return lane 0 and other callers potentially could also benefit. 
> > @fhahn Could you pls clarify further what you mean by "other callers potentially could also benefit" ?
> > 
> > Frankly, I'm finding the code written here in review easier to read :) Mainly because my context with VPlan is limited. Do we need to block this change on two reviews landing? I'm okay either way - but we can also land this and then "clean this up" once D91500 and D91501 lands?
> 
> The linked patches are 'more' code yes, but most of the code is part of current in-progress improvements and the uniform handling is mostly a nice side benefit of VPlanization.
> 
> We don't necessarily need to block this change, I just wanted to note that this is a step backwards in terms of the general direction (quite a bit of work was spent on moving any cost-model references out of code generation). It's just a small step though and we have a clear path forward  to resolve this.
> 
> > @fhahn Could you pls clarify further what you mean by "other callers potentially could also benefit" ?
> 
> There might be other places that request individual lanes for uniform values. Those would also benefit from only getting the first lane.
I've gone ahead and approved your D91501.  I don't think it fully solves the problem, but I see little harm in it.

Your proposed approach creates a coupling between the lowering strategy chosen and the semantic property of uniformity.  To some extend that might be unavoidable, but can we do better?

As an example case, consider an udiv(add X, Y), Z) where all inputs are loop invariant.  If X and Y already have vector uses, we might reasonable decide that a widened vector add is the best lowering for this uniform expression.  (Not sure if we do today, but it's a reasonable choice.)  When considering the udiv, if we decide to replicate it (because we know it's uniform and expensive), we want to be able to extra lane 0 regardless of the lowering strategy we chose for the add.

Another approach is to directly skip the cost model and ask the legality question.  That works for the example I just gave, but *doesn't* work with the way we currently handle GEPs feeding wideable loads - which is entirely handled in costing.

I suspect we need to move the GEP detection out of the cost model, but I don't fully understand the broader implications of that.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91398/new/

https://reviews.llvm.org/D91398