[PATCH] D28975: [LV] Introducing VPlan to model the vectorized code and drive its transformation

Thu Jan 26 16:50:10 PST 2017

mkuper added a subscriber: chandlerc.
mkuper added a comment.

Thanks, Ayal, Gil!

Two meta-comments about the review:

1. I suggest we use this review for higher-level points and issues with the design. If there are no significant objections, we can then review the code piece-by-piece, split into the chunks the review comment suggests. This review can then be used as a reference to how those hunks fit together, if the purpose of some of them is unclear, as a stand-alone patch. Does this make sense to everyone?
2. I think this is major enough to give a heads-up to llvm-dev, and give more people opportunity to take a look at the design.

Having said that, I think this is generally the right way forward. The way legal/cost/transformation interact in the current LV wasn't as much designed, as evolved, and not necessarily in the right direction...

Some preliminary comments from a first pass inline. I still need to understand how the recipes are going to work.

================
Comment at: docs/VectorizationPlan.rst:10
+
+- be a *lightweight* NFC patch;
+- show key aspects of VPlan's Hierarchical CFG concept;
----------------
As @chandlerc pointed out to me, this isn't actually NFC. It's expected not to change the output of the vectorizer, but it's a huge change in how the vectorizer *functions*. 

================
Comment at: docs/VectorizationPlan.rst:27
+
+  * aligning Cost step with Transformation step,
+  * representing entire code being transformed,
----------------
By this do you mean that making a cost decision will "record" the transformation we expect to make?

================
Comment at: docs/VectorizationPlan.rst:36
+
+Out of scope for initial patch:
+
----------------
This sounds reasonable to me.
I assume this is in sync with what Elena is doing in, say, D27919, right?

================
Comment at: docs/VectorizationPlan.rst:60
+
+2. A LoopVectorizationPlanner can construct, optimize and discard one or more
+   VPlans, providing different ways to vectorize the loop or the loop nest.
----------------
How is "can construct, optimize and discard one or more VPlans" different from just "generates VPlans"?
(I'm trying to understand if this adds information I'm missing. :-) )

================
Comment at: docs/VectorizationPlan.rst:111
+
+1. Legal Step: check if loop can be legally vectorized; encode contraints and
+   artifacts if so. Initiate Vectorization Plan showing how the loop can be
----------------
contraints -> constraints.
Also,  I understand what constraints are in this context (e.g. dependence distance), but what do you mean by artifacts?

================
Comment at: docs/VectorizationPlan.rst:131
+
+One can compare with LLVM's existing SLP vectorizer, where TSLP [3]_ adds
+Step 2.b.
----------------
Could you give a more concrete example for what optimizing an LV VPlan would mean?

I haven't read the TSLP paper (although I should), but skimming it, it looks like it's basically constructing a lot of subtrees (each of which could be considered, in our context, its own VPlan) and choose the best among those. So this isn't exactly "optimizing" a plan, it's just recursively generating plans and choosing the cheapest one.
Or should I just go and read the paper properly?

================
Comment at: docs/VectorizationPlan.rst:140
+OpenMP [6]_. This design to extend the Loop Vectorizer to outer loops supports
+and raises the importance of explicit vectorization beyond the current
+capabilities of Clang and LLVM. Namely, from currently forcing the
----------------
I don't believe anything in this design raises the importance of explicit outer loop vectorization.
I think what you're trying to say is that if we want to support explicit outer loop vectorization (which I believe we do), the best way to do it would be using this design. Am I right?

================
Comment at: docs/VectorizationPlan.rst:251
+      vectorized loop which may include multiple basic-blocks and nested loops.
+   b. SLP vectorization.
+   c. Combinations of the above, including nested vectorization: vectorizing
----------------
If this happens, it would be fantastic. 
How would this design help support this?

================
Comment at: docs/VectorizationPlan.rst:271
+:VPlan:
+  A recipe for generating a vectorized version from a given IR code.
+  Takes a "scenario-based approach" to vectorization planning.
----------------
So, given the text below, by "a given IR code" you mean "a given IR SESE region", right?

================
Comment at: docs/VectorizationPlan.rst:281
+  basic-blocks as in Sharir [9]_, promoting the Tile encapsulation. We use the
+  terms Region and Block rather than Tile [8]_ to avoid confusion with loop
+  tiling.
----------------
Each Region is still required to be SESE, right?

================
Comment at: docs/VectorizationPlan.rst:287
+  instructions that will appear consecutively in a basic block of the vectorized
+  version. The instructions of such a basic block originate from one or more
+  VPBasicBlocks.
----------------
How can they originate from more than one basic block? Can you be more explicit about the relationship between IR basic blocks and VP basic blocks?

================
Comment at: docs/VectorizationPlan.rst:289
+  VPBasicBlocks.
+  The VPBasicBlock takes care of the control-flow
+  relations with other VPBasicBlock's and Regions.
----------------
Something here is garbled.

================
Comment at: docs/VectorizationPlan.rst:295
+  may happen, e.g., to support disjoint Regions and to ensure Regions have a
+  single exit, possibly an empty one.
+
----------------
What do you mean by supporting disjoint Regions?

================
Comment at: docs/VectorizationPlan.rst:387
+:VPTransformState:
+  Stores information used for code generation, passed from the Planner to its
+  selected VPlan for execution, and used to pass additional information down
----------------
What kind of information?

================
Comment at: docs/VectorizationPlan.rst:485
+  // Select the interleave count.
+  unsigned IC = CM.selectInterleaveCount(OptForSize, VF.Width, VF.Cost);
+
----------------
The long-term plan is to have LVP produce a VPlan that contains both UF and VF, and this is just a transitional state, right?

https://reviews.llvm.org/D28975