[llvm] [RFC][LV] VPlan-based cost model (PR #67647)

Florian Hahn via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 7 09:06:30 PDT 2024


fhahn wrote:

> Not sure I follow the TTI reasoning. `llvm::Instruction` does not have `getCost` or `execute` methods, unlike `VPRecipe`, so I do see TTI/BasicTTI as a dedicated object to compute the cost of instruction(s). Unlike the vectorizer's cost model, TTI was designed to be as caller-agnostic as possible, so it's the caller's responsibility to estimate other context-related information. Regarding TTI's downstream customization, it is first important to note that X86, ARM, RISC-V etc. have their own TTI implementations. At least for RISC-V we didn't find a reason to have our own TTI; however, it is possible to have a vendor-specific TTI. My points about a separate object for the cost model are:
> 
> * separation of concerns: VPlan represents a possible vectorization, the cost model estimates the cost of that vectorization.
> * a clear single place to estimate the cost. Unless context does not matter, `VPRecipe::getCost` has to accept some "state", so I don't see how it's going to be beneficial from an encapsulation point of view.
>   Regarding vendor-specific heuristics, as I mentioned earlier, a separate object to evaluate the cost does not mean the absence of a default cost model that a vendor can use as a base class: BasicTTI -> RISCVTTI

With regards to TTI, recipes are defined in terms of the IR instructions they generate, so defining the cost of a recipe next to the place where its code is generated seems natural (and I think this also matches the originally intended design). We need some state, but that should eventually be limited to TTI and a few analyses such as type inference.
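To make that shape concrete, here is a deliberately simplified, self-contained sketch. The names are hypothetical and this is not the actual `VPRecipe`/`TargetTransformInfo` interface; it only illustrates "cost defined next to the recipe, with a TTI-like hook plus inferred types as the only shared state":

```cpp
#include <cstdint>
#include <map>

// Stand-in for the target hook; the real code would query TargetTransformInfo.
struct TargetCostInfo {
  virtual ~TargetCostInfo() = default;
  virtual uint64_t getArithmeticCost(unsigned Opcode, unsigned VF) const = 0;
};

// Minimal shared "state": the target hook plus inferred element widths,
// mirroring the idea that TTI + type inference should be sufficient.
struct CostContext {
  const TargetCostInfo &TCI;
  std::map<const void *, unsigned> InferredTypeBits;
};

// Each recipe knows which IR it will generate, so it also owns its cost.
struct Recipe {
  virtual ~Recipe() = default;
  virtual uint64_t getCost(unsigned VF, const CostContext &Ctx) const = 0;
};

struct WidenBinOpRecipe : Recipe {
  unsigned Opcode;
  explicit WidenBinOpRecipe(unsigned Opcode) : Opcode(Opcode) {}
  uint64_t getCost(unsigned VF, const CostContext &Ctx) const override {
    // Cost is computed next to codegen, in terms of the instruction the
    // recipe will emit, with target specifics delegated to the hook.
    return Ctx.TCI.getArithmeticCost(Opcode, VF);
  }
};
```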

> Yeah, I can understand why "downstream has to deal with it" is beneficial for upstream. My biggest worry is that, with a variety of different HW, that might become a big concern for everyone in the long run.

I also mention this in the response below, but I think taking this first step doesn't mean that everything needs to fit the initial structure in the long run. Once there's a concrete use case that benefits from a different structure, we can adjust, while making sure there's a clear, testable improvement over the current state.

> > > At the moment, computing costs is framed in terms of the IR instructions we generate (i.e. what `::execute` will generate) and target-specific info is pulled in via TTI only. In that sense, I think it makes sense to keep them defined in the same place. Do you by any chance have some examples where TTI would not be sufficient for downstream customizations?
> 
> I'm inclined to proceed with your approach outlined in #67934. However, I believe it's important to address the ongoing discussion in this thread regarding the potential separation of the cost model from VPlan.
> 
> @nikolaypanchenko's observation on TTI being caller-agnostic highlights the need for callers to estimate additional context-related information. Expanding on this, while TTI does offer target-specific information at the instruction level, the consideration of cost could extend beyond the recipe level to cover basic-block and cross-basic-block levels. By separating the cost model from VPlan, we can integrate heuristics at both levels, including estimates of register pressure, spills, and fills. Such an approach aligns with our overall objectives by providing flexibility and granularity in our cost modeling. Would you agree that this separation could enhance our cost modeling capabilities?

The initial VPlan-based cost model should IMO be solely focused on evaluating the cost per recipe, as this directly maps to the generated IR for which we can query the cost. One of the design goals is to model concepts explicitly in VPlan (e.g. modeling interleave groups explicitly), to avoid having to look across recipes, regions or blocks to compute an accurate cost for any given recipe. At the moment, the only concrete example I know of where that is not possible is arithmetic vector instructions that can also sign/zero-extend some of their operands.

Currently TTI tries to solve this by looking at the users of a given instruction to determine whether a cast is free or not. This won’t work with VPlan, as TTI cannot traverse VPlan’s def-use chains (nor should it, IMO). To address this, the pattern could instead be modeled by a transform that replaces zext/add recipe pairs with a widening-add recipe for targets that have that capability.
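A rough, self-contained sketch of such a VPlan-to-VPlan peephole (hypothetical recipe names, not the actual VPlan classes; the rewiring itself is elided) might look like this:

```cpp
#include <vector>

// Hypothetical, simplified recipes for illustration only.
struct Recipe {
  std::vector<Recipe *> Operands;
  std::vector<Recipe *> Users;
  virtual ~Recipe() = default;
};
struct ZExtRecipe : Recipe {};
struct AddRecipe : Recipe {};
// An add that zero-extends its narrow operand itself, for targets that
// provide a widening-add instruction.
struct WideningAddRecipe : Recipe {};

// If an add's operand is a zext with no other users, fold the extension
// into a widening add so the cost model can price the pair as a single
// instruction instead of looking across def-use chains.
bool tryToFoldZExtIntoAdd(AddRecipe &Add, bool TargetHasWideningAdd) {
  if (!TargetHasWideningAdd)
    return false;
  for (Recipe *Op : Add.Operands) {
    auto *Ext = dynamic_cast<ZExtRecipe *>(Op);
    if (!Ext || Ext->Users.size() != 1)
      continue;
    // Real code would create the WideningAddRecipe, rewire the def-use
    // chains, and erase the now-dead zext and add; elided here.
    return true;
  }
  return false;
}
```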

There will be other analyses that are not interested in the cost of a single recipe but in other metrics, e.g. resource usage or register pressure when deciding on the interleave count. But those seem orthogonal to assigning a cost per recipe and could be done separately.
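For illustration only, such a separate plan-level analysis could be as simple as the following sketch (hypothetical names, very crude liveness model, no relation to the existing LV register-pressure code):

```cpp
#include <algorithm>
#include <vector>

// A value's lifetime as indices into the recipe order of a plan.
struct LiveRange { unsigned DefIdx, LastUseIdx; };

// Crude register-pressure estimate: the maximum number of values live at
// any definition point. Quadratic, but fine for a sketch.
unsigned estimateMaxLiveValues(const std::vector<LiveRange> &Ranges) {
  unsigned Max = 0;
  for (const LiveRange &R : Ranges) {
    unsigned Live = 0;
    for (const LiveRange &Other : Ranges)
      if (Other.DefIdx <= R.DefIdx && R.DefIdx <= Other.LastUseIdx)
        ++Live;
    Max = std::max(Max, Live);
  }
  return Max;
}

// Pick an interleave count that keeps the estimated pressure within the
// register file; entirely separate from per-recipe costs.
unsigned pickInterleaveCount(unsigned MaxLive, unsigned NumRegisters) {
  return MaxLive ? std::max(1u, NumRegisters / MaxLive) : 1u;
}
```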

Switching to a VPlan-based cost model is likely to cause some disruption if we aren’t careful, so I think we should start by moving forward in targeted steps, with everything we bring up being used and tested by default. If in the future there are concrete cases where a different structure for where and how the cost is computed would be beneficial, we should re-evaluate at that point.



https://github.com/llvm/llvm-project/pull/67647

