Vectorization of pointer PHI nodes

Mon Oct 14 11:28:17 PDT 2013

On 14 October 2013 18:15, Nadav Rotem <nrotem at apple.com> wrote:

> 1. We have 4 stores to consecutive locations, but the last element is the
> constant zero, and not an additional SUB.   At the moment we don’t have
> support for idempotence operations, but this is something that we should
> add.
>

The fourth write is not necessary for GCC to vectorize it (nor was it in the
original code); it is a result of CReduce's attempt to converge while
running ARM's GCC and checking for the right sequence of vector instructions.
(btw, CReduce is great!).

In this case, shouldn't the vector operation just put an undef in the
fourth lane? Would back-ends recognize it as an AVX/NEON/AltiVec
instruction, or just try to re-linearise?
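
To make point 1 concrete, here is a minimal C sketch of the shape being
described. It is an illustration only, not the CReduce output from this
thread, and the names kernel_mixed and kernel_uniform are made up. The
second function writes the constant zero as 0 - 0 so that all four lanes
perform the same operation, which is one possible reading of the missing
support mentioned above.

```c
/* Hypothetical reduced kernel: three SUBs feeding consecutive stores,
 * plus a fourth store of the constant zero. */
void kernel_mixed(int *out, const int *a, const int *b) {
    out[0] = a[0] - b[0];
    out[1] = a[1] - b[1];
    out[2] = a[2] - b[2];
    out[3] = 0;                 /* constant, not a SUB */
}

/* Same four stores, with the fourth lane expressed as a SUB of
 * constants (0 - 0) so that every lane is the same operation. */
void kernel_uniform(int *out, const int *a, const int *b) {
    const int lhs[4] = { a[0], a[1], a[2], 0 };
    const int rhs[4] = { b[0], b[1], b[2], 0 };
    for (int i = 0; i < 4; ++i)
        out[i] = lhs[i] - rhs[i];
}
```

Both functions store the same four values; only the second presents four
isomorphic lanes to a vectorizer.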

> 2. The values that we are subtracting come from 3 loads.  We usually load 4
> elements from memory, or scalarize the inputs (we don’t support masked
> loads on AVX512).
>

That is a more complicated issue, but we can get away with it if, in a
first implementation, we only allow the same number of reads and writes in
each loop iteration. In that case, if the operations on the independent
variables are identical, then the loop can be simplified by multiplying the
induction range by N and reducing the number of load/sub/store lanes to
one, at which point loop vectorization becomes trivial.
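
The simplification described above can be sketched in C as follows. This
is a minimal example under assumptions: N is taken to be 3, the accesses
are assumed contiguous with stride 3, and the function names
sub3_unrolled and sub3_rerolled are hypothetical, not taken from the test
case.

```c
#include <stddef.h>

/* Hypothetical original shape: three identical load/sub/store lanes
 * per iteration, with n iterations. */
void sub3_unrolled(int *out, const int *a, const int *b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[3 * i + 0] = a[3 * i + 0] - b[3 * i + 0];
        out[3 * i + 1] = a[3 * i + 1] - b[3 * i + 1];
        out[3 * i + 2] = a[3 * i + 2] - b[3 * i + 2];
    }
}

/* Re-rolled form: the induction range is multiplied by 3 and a single
 * load/sub/store lane remains per iteration. */
void sub3_rerolled(int *out, const int *a, const int *b, size_t n) {
    for (size_t i = 0; i < 3 * n; ++i)
        out[i] = a[i] - b[i];
}
```

The second loop has unit stride and one operation per iteration, which is
the form a loop vectorizer handles directly.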

> Do you know if the GCC SLP Vectorizer vectorizes this, or is it their Loop
> Vectorizer ?
>

Good question. Which vectorizer does "-ftree-vectorize" turn on?
I ask because if I use "-fno-tree-vectorize", the code remains scalar.

cheers,
--renato