[PATCH] D28975: [LV] Introducing VPlan to model the vectorized code and drive its transformation

Fri Jan 27 04:01:46 PST 2017

rengolin added inline comments.

================
Comment at: docs/VectorizationPlan.rst:140
+OpenMP [6]_. This design to extend the Loop Vectorizer to outer loops supports
+and raises the importance of explicit vectorization beyond the current
+capabilities of Clang and LLVM. Namely, from currently forcing the
----------------
mkuper wrote:
> I don't believe anything in this design raises the importance of explicit outer loop vectorization.
> I think what you're trying to say is that if we want to support explicit outer loop vectorization (which I believe we do), the best way to do it would be using this design. Am I right?
>From the original RFC (linked in the description), the idea is that outer loops will add a lot more complexity to the vectoriser, which is already getting side-tracked by the multiple strategies. This effort is the first step in joining the strategies, which will be the sanest way forward to adding more functionality to the loop vectoriser, including outer loops and better interactions with Polly.

================
Comment at: docs/VectorizationPlan.rst:240
+   In particular, the transition can start with an NFC patch.
+   In particular, VPlan must support efficient selection of VF and/or UF.
+
----------------
I believe this is not possible, but that's ok.

What I think you mean is that we can start with a single VPlan that does **exactly** what the current cost model does, in which case, it will have no noticeable impact to the users (this is not the same as NFC, as Michael said). But just doing that is just replacing six of one by half-a-dozen of the other.

This is only worth doing if we can add more (different) VPlans, in which case, there will be a noticeable impact in higher optimisation levels.

But again, that's ok. We currently change the behaviour of the vectoriser depending on the -O flag and we can continue doing it. -O2 means one VPlan, -O3 means more. We could even break the invisible 3 barrier and add an actual 4 which is exhaust all options.

My point is: this cannot be a design guideline.

================
Comment at: docs/VectorizationPlan.rst:244
+   model and the code generation phases, where the cost estimation must
+   evaluate the to-be-generated code accurately.
+
----------------
Aligning cost and codegen will always have to rely on heuristics, even if we codegen multiple variations, as this is modelling IR, not assembly.

We should respect the cost analysis and we should strive to do the best we can, yes, but we shouldn't try to estimate the generated code **accurately** over other improvements.

================
Comment at: docs/VectorizationPlan.rst:260
+   UF's must be represented efficiently.
+   In particular support potential versionings efficiently.
+
----------------
This is the holly grail, where a lot of attention to detail and benchmarking will have to be done.

================
Comment at: docs/VectorizationPlan.rst:295
+  may happen, e.g., to support disjoint Regions and to ensure Regions have a
+  single exit, possibly an empty one.
+
----------------
mkuper wrote:
> What do you mean by supporting disjoint Regions?
If this is related to detecting in-loop disjoint regions, wouldn't it be easier to do that transformation before the vectoriser and generate two loops, so that they're always SESE or not worth looking?

================
Comment at: docs/VectorizationPlan.rst:307
+  **Design principle:** in order to reason about how to vectorize an Instruction
+  or how much it would cost, one has to consult the VPRecipe holding it.
+
----------------
This makes sense. And VRecipes consult the target information based on instructions and sizes.

It should be straightforward to have cross-target VRecipes just based on isLegal() and friends, which would prune all unsupported functionality really early.

================
Comment at: docs/VectorizationPlan.rst:313
+  interleave group of loads or stores with additional information for
+  calculating their cost and performing code-gen, as a group.
+
----------------
This is less clear. I understand how one VRecipe that knows about interleave access to worry about this, but I don't understand how we're going to tell the others that have no such concept to not ignore it and calculate the costs correctly.

Our current interleave cost calculations may not be enough to bar unrelated changes.

================
Comment at: docs/VectorizationPlan.rst:318
+  e.g., to facilitate changing the order of Instructions between original
+  and vectorized versions.
+
----------------
I'd advise against doing code modifications in an already vectorised block. You may be able to get an extra oomph, but legality analysis might have to be done all over again, which can significantly add costs to an already expensive process. 

Not to mention tht the complexity of the task will likely leak illegal transformations through.

================
Comment at: docs/VectorizationPlan.rst:409
+
+.. image:: VPlanUML.png
+
----------------
This looks mostly ok...

================
Comment at: docs/VectorizationPlan.rst:414
+
+.. image:: VPlanRecipesUML.png
+
----------------
What do you mean by `vectorize()`? Is that `legal+cost+transform`?

To avoid paying cost analysis and transformation costs up-front, wouldn't it make more sense to do each step for all plans as a group?

Assuming VPlans may expose different legality issues, you should:

    vec<VPlan> TODO;
    for (auto P : Plans) {
      if (P->isLegal())
        TODO.push_back(P);
    }

Then, compute the costs first, and only `vectorize()` the assumed cheapest:

    unsigned MinCost = ScalarCost;
    VPlan Best;
    for (auto P : TODO) {
      unsigned Cost = P->cost();
      if(MinCost > Cost) {
        Best = P;
        MinCost = Cost;
      }
    }
    Best->transform();

================
Comment at: docs/VectorizationPlan.rst:425
+
+   a. including those that take place after Legal, which is kept intact;
+   b. including those that use the Cost Model - refactor it slightly to expose
----------------
Couldn't different VPlans expose different legality issues? For example, for different combination of UFs and VFs?

================
Comment at: docs/VectorizationPlan.rst:472
+  LoopVectorizationCostModel::VectorizationFactor VF =
+      LVP.plan(OptForSize, UserVF, MaxVF);
+  bool VectorizeLoop = (VF.Width > 1);
----------------
Right, so this is my "cost loop" above.

================
Comment at: docs/VectorizationPlan.rst:515
+
+    LVP.executeBestPlan(LB);
+
----------------
And this is my `Best.transform()` above. Looks good.

================
Comment at: docs/VectorizationPlan.rst:556
+      DEBUG(printCurrentPlans("Initial VPlans", dbgs()));
+      optimizePredicatedInstructions();
+      DEBUG(printCurrentPlans("After optimize predicated instructions",dbgs()));
----------------
This is just analysis, right?

https://reviews.llvm.org/D28975