[llvm-dev] [LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us

Saito, Hideki via llvm-dev llvm-dev at lists.llvm.org
Thu Dec 14 15:08:47 PST 2017


We are working with Univ. of Saarland folks on this aspect. What you wrote is true (and you know I know that) ---- I just didn't go into
much detail in that one-liner explanation on why we need to work in that area, as I expect Simon Moll (U. Saarland) to send his
RFC on this topic in the not too distant future. We think the Divergence Analysis (DA) code from the Region Vectorizer (RV) project has good
potential for reuse in the Outer Loop Vectorization project (RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html), and good
divergence analysis should also help innermost loop vectorization (e.g., gather/scatter versus unit-stride).

If you are interested in this aspect of vectorization, I suggest first getting in touch with Simon to see what DA in RV already has. Let us
know if you are also interested in outer loop vectorization. There is plenty of work for everyone interested.

Thanks,
Hideki

From: Serge Preis [mailto:spreis at yandex-team.ru]
Sent: Thursday, December 14, 2017 3:15 AM
To: Saito, Hideki <hideki.saito at intel.com>; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [LV][VPlan] Status Update on VPlan ----- where we are currently, and what's ahead of us

Hello,

Just a minor comment.

* Improve uniformity/divergence analysis  ----- Uniformity in innermost loop vectorization is
   invariance. For outer loop vectorization, there are uniform values that are not invariant.

I believe that uniformity/divergence analysis is one of the key technologies for efficient vectorization, so I appreciate you bringing this up and I am looking forward to an extensive and comprehensive framework here.

In fact, there is uniformity in inner loop vectorization that is not invariance. Expressions like a[i/16] are uniform under certain conditions (namely, i starts at 0 mod min(VL, 16), and 16 % VL == 0) while not being invariant. It is unfortunate for many media codes operating on blocks that the loop vectorizer (at least in my experience) cannot detect and harness this uniformity. I may even try to look into improving this if someone gives me pointers on where to start.
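
To make the condition concrete, here is a minimal Python sketch (not LLVM code; the helper names `lane_indices`, `is_uniform` and `uniform_for_whole_loop` are made up for illustration). It assumes each vector iteration covers VL consecutive values of i, and checks whether i // 16 takes a single value across all lanes of every vector iteration.

```python
BLOCK = 16  # block size from the a[i/16] example


def lane_indices(start, vl):
    """Scalar induction values covered by one vector iteration."""
    return [start + lane for lane in range(vl)]


def is_uniform(start, vl, block=BLOCK):
    """True if i // block is the same in every lane of this vector iteration."""
    return len({i // block for i in lane_indices(start, vl)}) == 1


def uniform_for_whole_loop(first, trip_count, vl, block=BLOCK):
    """Check every vector iteration of a loop over i = first .. first + trip_count - 1."""
    starts = range(first, first + trip_count, vl)
    return all(is_uniform(start, vl, block) for start in starts)


# VL divides 16 and i starts at a multiple of VL: the index is uniform per vector.
print(uniform_for_whole_loop(first=0, trip_count=64, vl=4))
# A misaligned start or a VL larger than the block breaks uniformity.
print(uniform_for_whole_loop(first=2, trip_count=64, vl=4))
print(uniform_for_whole_loop(first=0, trip_count=64, vl=32))
```

The index is still not loop invariant: it changes from one block of 16 iterations to the next, so it cannot simply be hoisted out of the loop.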

Regards,
Serge Preis



06.12.2017, 07:22, "Saito, Hideki via llvm-dev" <llvm-dev at lists.llvm.org>:

Status Update on VPlan ---- where we are currently, and what's ahead of us
==========================================================

Goal:
-----
Extending Loop Vectorizer (LV) such that it can handle outer loops, via uplifting its infrastructure with VPlan.
The goal of this status update is to summarize the progress and the future steps needed.

Background:
-----------
This is related to the VPlan infrastructure project we started a while back, a project to extend the (inner loop vectorization focused) Loop Vectorizer to support outer loop vectorization. VPlan is the vectorization planner that records the decisions and candidate directions to pursue in order to drive cost modeling and vector code generation. When it is fully integrated into LV (i.e., at the end of this big project), VPlan will use a Hierarchical-CFG (HCFG) and transform it starting from the abstraction of the input IR to reflect current vectorization decisions being made. The HCFG eventually becomes the abstraction of the output IR, and the vector code generation is driven by this abstract representation.

Please refer to the following for more detailed background:

RFCs
       http://lists.llvm.org/pipermail/llvm-dev/2016-September/105057.html (Extending LV to vectorize outerloops)
       http://lists.llvm.org/pipermail/llvm-dev/2017-February/110159.html  (Introducing VPlan to model the vectorized code and drive its transformation)

"Extending LoopVectorizer: OpenMP4.5 SIMD and Outer Loop Auto-Vectorization"  (Saito, et al.)
2016 LLVM Developers' Meeting
https://www.youtube.com/watch?v=XXAvdUwO7kQ

"Introducing VPlan to the LoopVectorizer"     (Rapaport and Zaks)
2017 EuroLLVM Developers' Meeting
https://www.youtube.com/watch?v=IqzJRs6tb7Y

"Vectorizing Loops with VPlan - Current State and Next Steps"   (Zaks and Rapaport)
2017 LLVM Developers' Meeting
https://www.youtube.com/watch?v=BjBSJFzYDVk

Patches Committed:
------------------
Two big patches have been submitted and committed.
https://reviews.llvm.org/D28975 by Gil Rapaport. (Introducing VPlan to model the vectorized code and drive its transformation)
     This was broken down into a series of smaller patches, all of which have gone in. The last (re)commit of the series is
     https://reviews.llvm.org/rL311849
https://reviews.llvm.org/D38676 by Gil Rapaport. (Modeling masking in VPlan, introducing VPInstructions)
     This is also being broken down into a series of smaller patches to facilitate the review.
     Committed as https://reviews.llvm.org/rL318645

Where We Are:
-------------
With the first patch, we introduced the concept of VPlan to LV and started explicitly recording decisions like interleaved memory access optimization and serialization. In the first patch, we resisted introducing VPInstructions ----- and introduced VPRecipes instead, in an attempt to avoid duplicating Instructions in the abstract HCFG representation (i.e., abstract Instructions in the HCFG that are separate from incoming IR Instructions). As we moved on, it became more and more apparent that we need to introduce new abstract Instructions (see https://reviews.llvm.org/D38676 for more details), which also requires representing new use-def relations that do not exist among the incoming IR Instructions. As a result, with the second patch, as part of explicitly modeling masking in VPlan, we introduced VPInstruction, which is an abstraction of an IR Instruction.

All of this, so far, is refactoring of the (still innermost loop vectorization centric) Loop Vectorizer's existing functionality to explicitly model what was implicitly handled before.

Future Refactoring Needed:
--------------------------
The following aspects of LV still need to be refactored into the VPlan-based representation. This list is non-exhaustive, but should give you a ballpark estimate of the amount of work left here.
* Predication
* Cost model
* Remainder Loop
* Runtime Guards
* External Users
* Reduction Epilog
* Interleave Grouping
* Sink Scalar Operands

Work Needed for Simple Outer Loop Vectorization:
------------------------------------------------
* Improve uniformity/divergence analysis  ----- Uniformity in innermost loop vectorization is
   invariance. For outer loop vectorization, there are uniform values that are not invariant.
* Better predication ---- Retaining uniform backedge is a must-have. Retaining uniform forward
   branch is good for inner loop vectorization as well.
* Masking on HCFG
* Code Generation driven by VPlan/HCFG

Additional Work Needed to Handle Higher Complexity:
---------------------------------------------------
* Construct VPlan near the beginning of LV (right after Legal or Must-Vectorize directive check)
* VPlan to VPlan transform of divergent inner loop control flow into uniform loop control
   flow + divergent acyclic control flow (all vector elements have to iterate the same number of times)
* Predication on the transformed VPlan.
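
The transform in the second item can be pictured with a minimal Python sketch (a model only, not a VPlan-to-VPlan transform; the halving kernel and function names are hypothetical). Each lane has its own inner trip count, so the inner loop is divergent. In the transformed version all lanes iterate the same number of times, as long as any lane is still active, and the body runs under a per-lane mask.

```python
def scalar_halving_steps(values):
    """Reference: inner loop trip count depends on the outer iteration."""
    steps = []
    for x in values:
        count = 0
        while x > 1:            # divergent once the outer loop is vectorized
            x //= 2
            count += 1
        steps.append(count)
    return steps


def vector_halving_steps(values, vl):
    """Outer loop in chunks of vl lanes; inner loop made uniform with a mask."""
    assert len(values) % vl == 0, "sketch assumes no remainder iterations"
    steps = []
    for i0 in range(0, len(values), vl):
        x = list(values[i0:i0 + vl])
        count = [0] * vl
        mask = [xi > 1 for xi in x]          # lanes that still want to iterate
        while any(mask):                     # uniform loop control
            for lane in range(vl):           # models masked vector operations
                if mask[lane]:               # divergent acyclic control flow
                    x[lane] //= 2
                    count[lane] += 1
            mask = [m and xi > 1 for m, xi in zip(mask, x)]
        steps.extend(count)
    return steps


values = [1, 2, 8, 5, 16, 3, 1, 100]
print(vector_halving_steps(values, vl=4) == scalar_halving_steps(values))
```

The `if mask[lane]` body is what the subsequent predication step would turn into masked or select-based vector code.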

Additional Work Needed for Outer Loop Auto-Vectorization:
---------------------------------------------------------
* Legality check
* Cost modeling (compare it to the inner loop vectorization strategy in an apples-to-apples manner).

Other Enhancements (out of the scope of this doc):
--------------------------------------------------
* Remainder Loop Vectorization
* SLP and LV in one Vectorizer
* Nested Vectorization
* ...

Related Work:
-------------
In the previous RFC, we chose to convert Function Vectorization into Loop Vectorization. When such a function has a loop inside,
the loop vectorization needed in that scenario is "outer loop vectorization".
http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html (X. Tian, RFC for vectorizing a call --- caller side and callee side)
https://reviews.llvm.org/D22792 (M. Masten, Converting Function Vectorization to Loop Vectorization)
https://reviews.llvm.org/D40575 (M. Masten, Caller side support for invoking vector function from vector loop)

Related work of the related work: math lib vectorization using SVML.
http://lists.llvm.org/pipermail/llvm-dev/2016-March/097862.html (M. Masten, RFC for vector math lib call using Intel SVML)
https://reviews.llvm.org/D19544 (M. Masten, vector math lib call using Intel SVML)

Summary:
--------
A summary of the current state of the VPlan infrastructure project is presented, and the remaining steps towards outer loop vectorization are listed. We are currently at a point where we can slow down the refactoring effort for the purpose of expediting the big functionality boost: outer loop vectorization ----- and by doing so encourage more participation from the wider LLVM community in the refactoring effort to expedite the overall transition to the VPlan framework.
Shortly, we will send out an RFC to solicit community feedback on our plan to trade off between 1) making concurrent progress on refactoring and outer loop vectorization and 2) finishing refactoring and then adding outer loop vectorization.
Please stay tuned.

Thanks,
Hideki Saito

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev