[llvm-dev] Vectorizing multiple exit loops

Renato Golin via llvm-dev llvm-dev at lists.llvm.org
Mon Sep 30 14:00:53 PDT 2019


On Mon, 30 Sep 2019 at 20:59, Zaks, Ayal (Mobileye) <ayal.zaks at intel.com> wrote:
> > AFAIK this isn't strictly in the immediate plans, but loop splitting was one of
> > the aims for doing outer-loop vectorisation.
>
> The aim being to vectorize an outer-loop when an inner-loop is difficult / requires loop splitting to vectorize?

Not specifically, but I guess I wasn't specific, sorry about that. :)

I meant that loop splitting is one particular case that can expose
outer-loop vectorisation opportunities in VPlan, as you describe
below.

> VPlan is designed to support the tentative decisions and alternatives when vectorizing. E.g., when different types of instructions are to be generated for different VFs. Loop index splitting, and turning a multiple exit loop into a (countable) single exit loop, seem like preparatory transformations that can enable vectorization and/or interleaving for any VF/UF, similar to loop distribution.

Right, for probing the search space along more than one path and
picking the best overall plan, not just the first profitable one. With
some conservative heuristics we can restrict the combinations and
reach a reasonable solution without considerably more compile time.
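
To make that concrete, here's a rough sketch (made-up functions, not
LV's actual output) of the "turn the extra exit into the bound"
rewrite, for the easy case where the second exit is index-based:

  /* Original: two exits, so the trip count is not computable as written. */
  void saxpy_capped(float *a, const float *b, int n, int m) {
    for (int i = 0; i < n; ++i) {
      if (i >= m)               /* early, index-based exit */
        break;
      a[i] += 2.0f * b[i];
    }
  }

  /* After folding the second exit into the bound, there is a single
   * countable exit and the loop is a plain vectorisation candidate. */
  void saxpy_capped_split(float *a, const float *b, int n, int m) {
    int bound = n < m ? n : m;
    for (int i = 0; i < bound; ++i)
      a[i] += 2.0f * b[i];
  }

A data-dependent exit is the harder case, and needs the speculation
and masking machinery you describe below.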

> VPlan currently models the instructions inside the vector loop. If a "while" or multiple exit loop has stores above the "break point", it may be possible to sink them in order to avoid issuing speculative stores. LV applies similar "SinkAfter" code motion when needed to facilitate first-order recurrences. Loads may indeed be handled speculatively by dereferenceability, which may in turn require peeling the first iterations to reach an aligned address. Such peeling may be folded into the vector loop by masking, if desired, analogous to LV's foldTailByMasking.

There was also other work, for platforms without masking, to fold
tail loops into smaller VFs, but I think that didn't go in because it
increased complexity a lot while improving performance by almost
nothing. With most large vector platforms now supporting masks, I
think peeling with masks may be a generic and sufficient approach on
both ends, leaving head/tail scalar loops for the older platforms.
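
As a rough scalar model of the fold-tail-by-masking idea (assuming
VF=4 and a made-up elementwise loop, not what LV actually generates),
the scalar remainder loop disappears and the last vector iteration
simply masks off the lanes past n:

  void add_one(float *a, int n) {
    for (int i = 0; i < n; i += 4) {          /* ceil(n/4) vector trips */
      for (int lane = 0; lane < 4; ++lane) {  /* one "vector" iteration */
        int idx = i + lane;
        if (idx < n)                          /* the per-lane mask */
          a[idx] += 1.0f;
      }
    }
  }

The same predicate is what would let an alignment peel, or the
data-dependent exit itself, be folded into the vector body on targets
with cheap masked memory operations; without those, separate scalar
head/tail loops remain the safer option.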

