[llvm] [VPlan] Delay adding canonical IV increment and exit branches. (PR #82270)

Wed Feb 28 10:45:34 PST 2024

nikolaypanchenko wrote:

> I don't think the form without the increment or branch-on-cond should be seen as ill-formed, just more abstract. The main idea is to gradually lower the from a more abstract VPlan representation to a more concrete one that can be code-gen'd easily (like also done for replicate regions, which only get introduced later). One advantage of VPlan should be that it allows us to model more complex constructs directly , especially early on.

Can you please share a VPlan design doc which has been discussed with a community ? Perhaps I'm missing something and as right now my view of a VPlan is it has self-contained i.e. it should be possible to get all required information to make xform without accessing underlying IR (except for live-ins and live-outs) and at any point be able to generate code out of it without any extra manipulations on a VPlan.

> At the early stages, various IV increments and how exactly the loop is exited do not really add any additional info (e.g. the canonical IV may or may not control exiting the loop). IIRC when adding recipes to model the canonical IV and its increment, the reason why the increment was modeled was to simplify code-gen, similar to how #82021 will simplify code-gen. At that time, VPlan as only added for code-gen, so it made sense at that time.
> 
> Since then the infrastructure has evolved quite a bit, and relatively recently we started to use gradual lowering to introduce complexity as needed at later stages.

Ignoring cost model part, for VPlan that represents just one loop it's certainly simplifies xforms. But if we consider future work with peel/remainder (prolog/epilog/tail/cleanup) loops represented that is going to complicate xforms (unrolling, interleaving), codegen and cost model, am I wrong ? That technically can be mitigated by https://reviews.llvm.org/D148581, but if we consider SEME loops, from my perspective, that becomes complicated too. 

> IIUC the main benefit from #82021 is that it simplifies codegen, so I think it would make sense to consistently introduce those in preparation for codegen; modeling the increments for all indications early on would unnecessarily make the plans more verbose, while only being needed for codegen. Just before codegen, if the increments are modeled explicitly, then we could also lower to a generic scalar or vector phi nodes, instead of having multiple recipes to generate phis.

Eventually yes, to simplify codegen. But most importantly, for [EVL-based vectorization](https://github.com/llvm/llvm-project/pull/76172) that change helps to replace `WidenVFxUF` that is used with `WidenEVL`. That will be true for the remaining `VPWidenPointerInductionRecipe` that also better to be decomposed.
Also want to point out #82021 moves `trunc` optimization to xform so that VPlan became more explicit, which IMO makes more sense. However, if VPlan is going to allow some implicitness that's not so clear if it's good.

> It depends, if we see CanonincalIVPHIRecipe and VPWidenIntOrFpRecipe as complex recipes that take care of their increment as well then computing the cost for both is natural. If the more verbose form is only introduced just before executing, there should be no need to account for costing the increment separately.

Right, and that's my point: now all places that need to be aware of the update will have extra logic to deal with them. Also, for the `VPWidenIntOrFpRecipe` heuristics becomes a bit more complicated / unclear to estimate broadcast of VF. For the `VPWidenPointerInductionRecipe` it would also have to accommodate `onlyScalarsGenerated` case.
For these vectorized IVs, is there a plan to represent `broadcast(VF)` in the preheader ?

https://github.com/llvm/llvm-project/pull/82270