[PATCH] D28975: [LV] Introducing VPlan to model the vectorized code and drive its transformation

Mon Feb 13 16:50:21 PST 2017

Ayal added inline comments.

================
Comment at: docs/VectorizationPlan.rst:244
+   model and the code generation phases, where the cost estimation must
+   evaluate the to-be-generated code accurately.
+
----------------
rengolin wrote:
> Ayal wrote:
> > mkuper wrote:
> > > rengolin wrote:
> > > > Aligning cost and codegen will always have to rely on heuristics, even if we codegen multiple variations, as this is modelling IR, not assembly.
> > > > 
> > > > We should respect the cost analysis and we should strive to do the best we can, yes, but we shouldn't try to estimate the generated code **accurately** over other improvements.
> > > I think "codegen" here means "the IR generated by the vectorizer", not "backend code generation". See line 123.
> > > Actually getting the cost model to accurately estimate the cost of complex IR constructs is an orthogonal problem.
> > Right, sorry for the confusion. "CodeGen" indeed means here "the IR code generated by the vectorizer".
> Right, but costs are associated with what assembly instructions will be generated by those IR instructions. So we associate zero cost to a lot of shuffles because we know they'll be merged into the load or add. But the same shuffles, in a different pattern, will demand a sequence of insert/extract instructions that will cost a lot more than zero.
> 
> This is my point, that **accuracy** is not possible at this level and that, as you say, it's an orthogonal problem to solve.
> 
> Maybe the word we're looking for is "reliably"?
Yes, "reliably" or "consistently" better describe the intent here. The "accuracy" still depends on TTI. Fix will appear in next upload.

================
Comment at: docs/VectorizationPlan.rst:313
+  interleave group of loads or stores with additional information for
+  calculating their cost and performing code-gen, as a group.
+
----------------
rengolin wrote:
> This is less clear. I understand how one VRecipe that knows about interleave access to worry about this, but I don't understand how we're going to tell the others that have no such concept to not ignore it and calculate the costs correctly.
> 
> Our current interleave cost calculations may not be enough to bar unrelated changes.
Another way to view this: each instruction that will eventually appear in the vectorized loop will be generated by a VPRecipe. This recipe also takes care of estimating its cost. It is possible for several such instructions to be generated by one common VPRecipe, as in the case of an interleave group recipe.

================
Comment at: docs/VectorizationPlan.rst:318
+  e.g., to facilitate changing the order of Instructions between original
+  and vectorized versions.
+
----------------
rengolin wrote:
> I'd advise against doing code modifications in an already vectorised block. You may be able to get an extra oomph, but legality analysis might have to be done all over again, which can significantly add costs to an already expensive process. 
> 
> Not to mention tht the complexity of the task will likely leak illegal transformations through.
The loop vectorizer already moves code effectively, when it forms interleave groups and sinks scalar operands. There may be additional opportunities during vectorization that involve code modification. These admittedly must be done carefully, but potentially impact performance significantly (by more than an extra oomph) and so ought to be represented and estimated reliably.

================
Comment at: docs/VectorizationPlan.rst:387
+:VPTransformState:
+  Stores information used for code generation, passed from the Planner to its
+  selected VPlan for execution, and used to pass additional information down
----------------
mkuper wrote:
> What kind of information?
For instances: where to generate the next instruction(s), by passing an IRBuilder. When generating scalarized predicated instructions, passing which lane to generate scalar instance for.

================
Comment at: docs/VectorizationPlan.rst:414
+
+.. image:: VPlanRecipesUML.png
+
----------------
rengolin wrote:
> What do you mean by `vectorize()`? Is that `legal+cost+transform`?
> 
> To avoid paying cost analysis and transformation costs up-front, wouldn't it make more sense to do each step for all plans as a group?
> 
> Assuming VPlans may expose different legality issues, you should:
> 
>     vec<VPlan> TODO;
>     for (auto P : Plans) {
>       if (P->isLegal())
>         TODO.push_back(P);
>     }
> 
> Then, compute the costs first, and only `vectorize()` the assumed cheapest:
> 
>     unsigned MinCost = ScalarCost;
>     VPlan Best;
>     for (auto P : TODO) {
>       unsigned Cost = P->cost();
>       if(MinCost > Cost) {
>         Best = P;
>         MinCost = Cost;
>       }
>     }
>     Best->transform();
> 
By "vectorize()" we mean "transform". The latter verb can be used instead if clearer?

Yes, we only call "vectorize()" / "transform()" on Best VPlan, after finding it according to relative costs, as suggested above.

Currently VPlans are built only after passing all "isLegal()" checks. So every VPlan is legal by construction.

================
Comment at: docs/VectorizationPlan.rst:425
+
+   a. including those that take place after Legal, which is kept intact;
+   b. including those that use the Cost Model - refactor it slightly to expose
----------------
rengolin wrote:
> mkuper wrote:
> > rengolin wrote:
> > > Couldn't different VPlans expose different legality issues? For example, for different combination of UFs and VFs?
> > I think this is why the design says that Legal would "encode constraints". I guess the LVP would have to consult Legal per-potential-VPlan to see whether it's feasible or not?
> Right, this is what I didn't understand. We can do it both ways: one legal consulting all plans, or each plan consulting legal.
> 
> I'd prefer we act on plans for everything, as it would be a cleaner concept and an easier move.
> 
> As I wrote in another comment: first we iterate through all plans and discard all illegals, then we calculate the costs, sort and pick the best.
> 
> We could even (in the far future) have multiple costs per plan (VF, UF, code-size, hazards, etc) and sort by a formula that takes all of them into account.
The current design performs all Legal checks before starting to work with VPlans, retaining the separation between correctness and performance  considerations. This could be revisited if desired.

If Legal determines that a loop cannot be vectorized at all, no VPlans are built.

If Legal determines that a loop can be vectorized but only under certain restrictions, all VPlans built will comply with all these restrictions, and offer distinct alternatives how to vectorize. Current restriction may be: VF small enough, pass runtime guards regarding potential aliasing and non-unit strides, handle cross-iteration dependencies such as reductions, 1st order recurrences, live-outs.

How to calculate the cost(s) of each VPlan and find the best one is subject to a separate, follow-up patch.

================
Comment at: docs/VectorizationPlan.rst:472
+  LoopVectorizationCostModel::VectorizationFactor VF =
+      LVP.plan(OptForSize, UserVF, MaxVF);
+  bool VectorizeLoop = (VF.Width > 1);
----------------
rengolin wrote:
> Right, so this is my "cost loop" above.
Right, LVP.plan takes care of building the relevant VPlans and running the "cost loop", returning the best VF. This "cost loop" currently still uses the existing CostModel, but is planned to transition to use VPlans in the aforementioned follow-up patch.

================
Comment at: docs/VectorizationPlan.rst:485
+  // Select the interleave count.
+  unsigned IC = CM.selectInterleaveCount(OptForSize, VF.Width, VF.Cost);
+
----------------
mkuper wrote:
> The long-term plan is to have LVP produce a VPlan that contains both UF and VF, and this is just a transitional state, right?
VPlan already addresses both UF and VF in this patch, in the sense that when instructed to do so, by calling "vectorize()", it will generate vector code based on both. Determining the best VF and the best UF is still done based on existing CostModel in this patch, by first finding the best VF and then finding the best UF. Follow-up patch is devoted to having both these decisions be based on VPlans as well.

================
Comment at: docs/VectorizationPlan.rst:515
+
+    LVP.executeBestPlan(LB);
+
----------------
rengolin wrote:
> And this is my `Best.transform()` above. Looks good.
Right, this will perform the actual transformation. :-)

================
Comment at: docs/VectorizationPlan.rst:556
+      DEBUG(printCurrentPlans("Initial VPlans", dbgs()));
+      optimizePredicatedInstructions();
+      DEBUG(printCurrentPlans("After optimize predicated instructions",dbgs()));
----------------
rengolin wrote:
> This is just analysis, right?
Right, no IR has been transformed nor constructed yet.

https://reviews.llvm.org/D28975