[PATCH] D91398: [LoopVectorizer] Lower uniform loads as a single load (instead of relying on CSE)

Mon Nov 23 13:22:03 PST 2020

fhahn accepted this revision.
fhahn added a reviewer: Ayal.
fhahn added a subscriber: Ayal.
fhahn added a comment.
This revision is now accepted and ready to land.

LGTM, thanks. I also added @Ayal, who might also have some thoughts. It would probably be best to wait a few days with committing, in case there are additional comments.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:2667
+    if (!Operand || !OrigLoop->contains(Operand) ||
+        (Cost->isUniformAfterVectorization(Operand, State.VF)))
+      InputInstance.Lane = 0;
----------------
reames wrote:
> reames wrote:
> > fhahn wrote:
> > > anna wrote:
> > > > fhahn wrote:
> > > > > reames wrote:
> > > > > > fhahn wrote:
> > > > > > > The VPRecplicateRecipe contains a `IsUniform` flag. I think it should be possible to pass the flag through from the recipe to `scalarizeInstruction`. Ideally the recipes should contain all information required for code-generation to avoid having to tie code generation directly to the cost-model.
> > > > > > I think you're misreading the code slightly.  This isn't checking whether the recipe for the instruction being scalarized is uniform.  It's checking whether the *input* to the instruction is uniform.  
> > > > > > 
> > > > > > It seems like overkill to record a per operand uniform flag in the recipe?  
> > > > > > I think you're misreading the code slightly. This isn't checking whether the recipe for the instruction being scalarized is uniform. It's checking whether the *input* to the instruction is uniform.
> > > > > 
> > > > > Indeed, I was thinking about something slightly differently. The underlying point about ideally not having this code depend on the cost-model, but the information in the corresponding VPlan still applies.
> > > > > 
> > > > > > It seems like overkill to record a per operand uniform flag in the recipe?
> > > > > 
> > > > > Yes, but fortunately I don't think that is necessary, because I think all required information is already encoded in the operands in VPlan: they should be uniform VPReplicateRecipes. Making this information accessible here is currently work-in-progress. With D91500 which I just put up, you should be able to  check if `User.getOperand(op)` is a `VPReplicateRecipe` and if so, `IsUniform` also needs to be true.
> > > > > 
> > > > > I think it would make sense to take it one step further, and use this logic directly in `VPTransformState::get`: when requesting a particular lane for a uniform VPValue, we should always be able to just return lane 0 and other callers potentially could also benefit. With D91501, the changes to `scalarizeInstruction` should not be needed.
> > > > Frankly, I'm finding the code written here in review easier to read :) Mainly because my context with VPlan is limited. Do we need to block this change on two reviews landing? I'm okay either way - but we can also land this and then "clean this up" once D91500 and D91501 lands?
> > > >  
> > > > > I think it would make sense to take it one step further, and use this logic directly in VPTransformState::get: when requesting a particular lane for a uniform VPValue, we should always be able to just return lane 0 and other callers potentially could also benefit. 
> > > > @fhahn Could you pls clarify further what you mean by "other callers potentially could also benefit" ?
> > > > 
> > > > Frankly, I'm finding the code written here in review easier to read :) Mainly because my context with VPlan is limited. Do we need to block this change on two reviews landing? I'm okay either way - but we can also land this and then "clean this up" once D91500 and D91501 lands?
> > > 
> > > The linked patches are 'more' code yes, but most of the code is part of current in-progress improvements and the uniform handling is mostly a nice side benefit of VPlanization.
> > > 
> > > We don't necessarily need to block this change, I just wanted to note that this is a step backwards in terms of the general direction (quite a bit of work was spent on moving any cost-model references out of code generation). It's just a small step though and we have a clear path forward  to resolve this.
> > > 
> > > > @fhahn Could you pls clarify further what you mean by "other callers potentially could also benefit" ?
> > > 
> > > There might be other places that request individual lanes for uniform values. Those would also benefit from only getting the first lane.
> > I've gone ahead and approved your D91501.  I don't think it fully solves the problem, but I see little harm in it.
> > 
> > Your proposed approach creates a coupling between the lowering strategy chosen and the semantic property of uniformity.  To some extend that might be unavoidable, but can we do better?
> > 
> > As an example case, consider an udiv(add X, Y), Z) where all inputs are loop invariant.  If X and Y already have vector uses, we might reasonable decide that a widened vector add is the best lowering for this uniform expression.  (Not sure if we do today, but it's a reasonable choice.)  When considering the udiv, if we decide to replicate it (because we know it's uniform and expensive), we want to be able to extra lane 0 regardless of the lowering strategy we chose for the add.
> > 
> > Another approach is to directly skip the cost model and ask the legality question.  That works for the example I just gave, but *doesn't* work with the way we currently handle GEPs feeding wideable loads - which is entirely handled in costing.
> > 
> > I suspect we need to move the GEP detection out of the cost model, but I don't fully understand the broader implications of that.
> > We don't necessarily need to block this change, I just wanted to note that this is a step backwards in terms of the general direction (quite a bit of work was spent on moving any cost-model references out of code generation). It's just a small step though and we have a clear path forward  to resolve this.
> Could I get an LGTM then?
> 
> I'm happy to help evolve towards a good overall design, but it's really hard to make progress when the first patch which has a functional effect gets blocked in review.
> 
> Your proposed approach creates a coupling between the lowering strategy chosen and the semantic property of uniformity. To some extend that might be unavoidable, but can we do better?

Yes, the VPlan changes I put up just ask the same question in a different way. I think we should be able to improve the code generated for your example, but that should probably happen before codegen, e.g. by widening.

> Another approach is to directly skip the cost model and ask the legality question. That works for the example I just gave, but *doesn't* work with the way we currently handle GEPs feeding wideable loads - which is entirely handled in costing.

Asking legality here comes with the same problematic coupling which hinders modularizing transforms via VPlan I think.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91398/new/

https://reviews.llvm.org/D91398