[PATCH] Loop Rerolling Pass

Wed Oct 23 03:40:02 PDT 2013

On 22 October 2013 17:31, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> If you wanted to tackle the general problem of vectorization of
> interleaved data I am not sure that much of Hal’s code can be immediately
> reused for the problem.

Hi Arnold,

I agree with you, but in some of the cases where a stride transform would
work, Hal's patch would re-roll the loop, and if I base my analysis on what
was there before, I won't detect it.

Since Hal hinted at leaving that as an API for the vectorizer, I thought
that maybe it'd be worth running that first, and delegating the
vectorization to other, simpler, parts of the vectorizer.

As an example, I created two loops:

One, unrolled, with stride access:
    for (i=0; i<MAX*3; i+=3) {
      a[i] = OFFSET - b[i] - DELTA;
      a[i+1] = OFFSET - b[i+1] - DELTA;
      a[i+2] = OFFSET - b[i+2] - DELTA;
    }

One re-rolled, as it would be with Hal's patch:
    for (i=0; i<MAX*3; i++) {
      a[i] = OFFSET - b[i] - DELTA;
    }

The first loop doesn't get vectorized, because there are too many PHIs, but
the second does.

So, Hal's patch will deal with the most basic strided access, so that we
can focus on the more complex patterns.
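
To make the equivalence of the two loops concrete, here is a minimal sketch
in plain C. It assumes the unrolled loop advances i by three per iteration
and that the trip count is a multiple of three. The values of OFFSET and
DELTA, and the function names, are made up for illustration; none of this
comes from Hal's patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values; the message above does not give any. */
enum { OFFSET = 100, DELTA = 7 };

/* Unrolled form: three statements per iteration, i steps by 3.
 * Assumes n is a multiple of 3. */
void unrolled(int *a, const int *b, size_t n)
{
    assert(n % 3 == 0);
    for (size_t i = 0; i < n; i += 3) {
        a[i]     = OFFSET - b[i]     - DELTA;
        a[i + 1] = OFFSET - b[i + 1] - DELTA;
        a[i + 2] = OFFSET - b[i + 2] - DELTA;
    }
}

/* Re-rolled form: one statement per iteration, unit stride. */
void rerolled(int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = OFFSET - b[i] - DELTA;
}

/* Returns 1 if both forms write the same values for a small input. */
int same_result(void)
{
    enum { N = 12 };
    int b[N], x[N], y[N];

    for (size_t i = 0; i < N; i++)
        b[i] = (int)i;
    unrolled(x, b, N);
    rerolled(y, b, N);
    for (size_t i = 0; i < N; i++)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

Both functions touch the same elements in the same order, so the re-rolled
form is the one a unit-stride vectorizer can handle without extra work.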

There is one slight issue with all this (APIs and retries): there is
already too much analysis going on in the vectorizer, and you don't want to
run canVectorize() too many times. Calling re-roll on failure and trying
again, then calling stride-transform and trying again, on loops that would
never benefit from those treatments might not be a good idea.

To deal with this, I thought we could have a table with a list of problems
and how to solve them, ordered from cheaper to more expensive analysis. So,
say I found a loop with strided access, and the general error from a more
basic analysis says that we found an "unidentified PHI". We then mark the
loop with some metadata to that effect, and on the next iteration, it'll
try to transform the loop before vectorizing in a way that will increase
the chances of it getting vectorized.

Moreover, such a table (and associated metadata) could also be an easier
way to annotate vectorization failures, and could even be used by tools to
help developers fix their code.
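
A rough sketch of what such a table could look like, in plain C rather than
against the real LLVM classes. Every name here (FailureKind, Remedy,
remedy_table, next_remedy) is hypothetical, and so is the mapping from
failures to transforms. It only illustrates the lookup: given the failure
recorded on the loop and the remedies already tried, pick the cheapest one
left.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; none of them exist in LLVM. */
typedef enum {
    FAIL_NONE,
    FAIL_UNIDENTIFIED_PHI,
    FAIL_NON_UNIT_STRIDE,
    FAIL_UNKNOWN
} FailureKind;

typedef enum {
    REMEDY_NONE,
    REMEDY_REROLL,
    REMEDY_STRIDE_TRANSFORM
} Remedy;

typedef struct {
    FailureKind failure; /* what the earlier analysis recorded on the loop */
    Remedy      remedy;  /* transform to try before vectorizing again */
    int         cost;    /* relative analysis cost, lower is cheaper */
    const char *note;    /* text a tool could show to the developer */
} RemedyEntry;

/* Kept sorted from cheaper to more expensive. */
static const RemedyEntry remedy_table[] = {
    { FAIL_UNIDENTIFIED_PHI, REMEDY_REROLL,           1,
      "unidentified PHI: try re-rolling the loop" },
    { FAIL_UNIDENTIFIED_PHI, REMEDY_STRIDE_TRANSFORM, 2,
      "unidentified PHI: try the stride transform" },
    { FAIL_NON_UNIT_STRIDE,  REMEDY_STRIDE_TRANSFORM, 2,
      "non-unit stride: try the stride transform" },
};

/* Returns the cheapest remedy for `failure` that is not yet in the
 * `tried` bitmask (bit 1u << remedy), or NULL if nothing is left. */
const RemedyEntry *next_remedy(FailureKind failure, unsigned tried)
{
    size_t n = sizeof remedy_table / sizeof remedy_table[0];

    for (size_t i = 0; i < n; i++) {
        const RemedyEntry *e = &remedy_table[i];

        /* The first match is only the cheapest if the table is sorted. */
        assert(i == 0 || remedy_table[i - 1].cost <= e->cost);
        if (e->failure == failure && !(tried & (1u << e->remedy)))
            return e;
    }
    return NULL;
}
```

A loop marked with an unknown failure gets NULL back, so it is never
re-analysed, which is the point of keeping the retries cheap.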

> You want to split the set of load and store instructions with non-unit
> stride accesses into groups where each group contains useful locality (the
> members of a group are adjacent).
>

Thanks again for the how-to; it's helping me get up to speed with the
vectorizer again.

cheers,
--renato