[llvm-dev] RFC: Extending LV to vectorize outerloops

Wed Sep 28 10:55:45 PDT 2016

On 24 September 2016 at 00:08, Zaks, Ayal <ayal.zaks at intel.com> wrote:
> Our first step is to make LV fully conscious of the control-flow that will result from vectorizing an (innermost) loop, allowing this control-flow to comprise of more than a single basic-block. A next step is to consider retaining branches rather than fully if-converting them. This could indeed entail a large solution space, but simple heuristics are usually employed to prune it effectively. When we reach the ability to handle outerloops the solution space grows further, and we fully recognize that early pruning of candidates will be important. As Hal noted, some heuristics can indeed be employed here too.

Sounds like a plan.

> This step is designed to generate a straw-man vectorized loop in "shadow" form, whose cost can be directly computed (3), and which facilitates generating the actual vector IR instructions (4).

I'm worried about how different will this be to actual IR. Semantics
is a hard thing to keep when representing data in different formats,
and an important task will be to make sure this internal format isn't
breaking any assumptions, or it'll be hard to fix bugs int the
vectorised code that don't exist in the scalar IR, if you go (scalar
IR -> internal rep -> vector IR).

Early pruning will indeed help the cases where we're not sure of the
semantics, but there will be a period that this internal format won't
be able to represent all the cases we have today, meaning there could
be some regressions on the kind of loops we can vectorise just because
the internal representation is not rich enough yet.

Unless, of course, we use this internal representation *in addition*
to the current vectoriser if all else fails, which means we'll run the
vectoriser twice in those cases.

> ("cheap" - in terms of compile-time I take it)
> Care must be taken therefore when deciding how many plans to draft. Step 2.b can consult the cost of partial plans being considered, for early plan pruning.

Yes, compile time. Indeed, pruning will need to be harsh initially,
then relaxed as we improve the code / representation. That's why I
fear regressions.

> ("very costly" - in terms of compile-time I take it; "cost" is overloaded here)

Sorry, yes, compile time.

> Computing cost before having an explicit vectorization plan is indeed more compile-time efficient than first constructing the plan, but inherently less accurate and harder to keep consistent.

Yup. :)

> Such a "divergent" predictive cost could be used to prune candidates, when considered decisive, to save compile-time. Having both a "-faggressive-vectorizer" path that builds a plan and weighs it before implementing it, while keeping the current path that vectorizes directly w/o a plan, may be quite a maintenance burden. A "-faggressive-vectorizer" option could be used to control early pruning of candidates. Let's first see how much compile-time, and improved accuracy, are actually involved.

Agree, we're getting a bit ahead of ourselves... :)

> OTOH, computing the "non-divergent" costs of explicit plans could lead to the development of pruning techniques, or creative improvements to the cost-model. That is, when these costs turn out to be very high or very low.

I see what you mean. This could indeed work faster than I'm expecting,
if the pruning happens at the same time as building the internal
representation, we can have many early exits and fall backs.

cheers,
--renato