[PATCH] D15812: [SLP] Vectorize gather-like idioms ending at non-consecutive loads.

Tue Dec 29 12:37:47 PST 2015

mssimpso created this revision.
mssimpso added reviewers: nadav, jmolloy, hfinkel, anemet.
mssimpso added subscribers: mcrosier, llvm-commits.

This patch tries to vectorize gather-like expression trees ending at
non-consecutive loads, such as the one shown in the example below.

```
... = g[a[0] - b[0]] + g[a[1] - b[1]] + ... + g[a[n] - b[n]];
```

Here, the index calculations for the "g" accesses can be vectorized. The loads
of the "a" and "b" array elements and the subtractions can all be replaced by
their vector equivalents. Our bottom-up vectorizer currently misses cases like
this because the expression trees don't end in stores or reductions.
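
As a rough illustration of the split described above, the sketch below writes
the idiom twice in plain C++: once as the scalar loop and once as a hand-blocked
loop that models a four-lane vectorization. The function names, the lane count,
and the use of `int` elements are assumptions made for the example; none of
them come from the patch. The index computation is grouped per block of lanes
(the part that maps onto vector loads and a vector subtract), while the loads
from "g" stay scalar because their addresses are not consecutive.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar form of the idiom: load a[i] and b[i], subtract, and use the
// difference as an index into g. Indices are assumed to be in range.
inline int gatherSumScalar(const std::vector<int> &g,
                           const std::vector<int> &a,
                           const std::vector<int> &b) {
  assert(a.size() == b.size());
  int sum = 0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += g[static_cast<std::size_t>(a[i] - b[i])];
  return sum;
}

// Illustrative lane count; a real target picks this from its vector width.
constexpr std::size_t kLanes = 4;

// Hand-blocked model of the vectorized form (hypothetical, for illustration).
inline int gatherSumBlocked(const std::vector<int> &g,
                            const std::vector<int> &a,
                            const std::vector<int> &b) {
  assert(a.size() == b.size());
  int sum = 0;
  std::size_t i = 0;
  for (; i + kLanes <= a.size(); i += kLanes) {
    // Consecutive loads of a and b plus the subtraction: this loop stands in
    // for two vector loads and one vector subtract.
    std::array<int, kLanes> idx;
    for (std::size_t lane = 0; lane < kLanes; ++lane)
      idx[lane] = a[i + lane] - b[i + lane];
    // The accesses to g are non-consecutive, so each lane is extracted and
    // loaded individually.
    for (std::size_t lane = 0; lane < kLanes; ++lane)
      sum += g[static_cast<std::size_t>(idx[lane])];
  }
  // Scalar remainder for element counts that are not a multiple of kLanes.
  for (; i < a.size(); ++i)
    sum += g[static_cast<std::size_t>(a[i] - b[i])];
  return sum;
}
```

Both functions return the same sum for any in-range input; only the grouping of
the index arithmetic differs.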

It's possible to vectorize these cases in a top-down phase beginning at the
consecutive loads. However, I've chosen here to detect the specific pattern of
interest and proceed bottom-up, as we do with other interesting cases. The
advantage of this approach is that it avoids the complexity, compile-time cost,
and phase-ordering issues of a full-blown top-down pass. The disadvantage is
that it's probably less general than a top-down pass would be.

The primary changes included in the patch allow us to (1) vectorize the
gather-like pattern shown above and (2) set vectorization factors based on the
width of memory accesses in the expression trees. Your feedback is welcome.
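
To make point (2) concrete, here is a minimal sketch of one plausible reading:
the lane count is the vector register width divided by the width of the
elements being loaded, rather than the width of the arithmetic performed on
them. The helper name and the 128-bit register width used in the note below
are assumptions for illustration; the patch's actual policy may differ.

```cpp
#include <cassert>

// Hypothetical helper, not an LLVM API: number of lanes that fit in a vector
// register when the element width is taken from the memory accesses.
inline unsigned lanesForAccessWidth(unsigned registerBits,
                                    unsigned accessBits) {
  assert(accessBits != 0 && registerBits % accessBits == 0);
  return registerBits / accessBits;
}
```

Under these assumptions, a 128-bit register holds 8 lanes when the loads are
16 bits wide, but only 4 lanes if the factor were chosen from 32-bit
arithmetic on the loaded values.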

[SLP] Truncate expressions to minimum required bit width

This change attempts to produce vectorized integer expressions in bit widths
that are narrower than their scalar counterparts. By reducing the bit width
where possible, we can pack more isomorphic expressions into a single vector
and increase parallelism. The need for demotion arises especially on
architectures in which the small integer types (e.g., i8 and i16) are not legal
for scalar operations but can still be used in vectors.

Like similar work done within the loop vectorizer, we rely on InstCombine to
perform the actual type-shrinking. Here, we only insert the truncations that
are needed to seed InstCombine's type demotion. This introduces the limitation
that we can only rewrite single-use chains (every instruction in the expression
can have at most one use). We further limit ourselves to chains that are rooted
at instructions other than stores, since we cannot change the width of vector
memory operations. With these restrictions, only expression roots can be used
externally, and we sign-extend them back to their original type after we
extract them from the vectors.
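
The truncate-then-sign-extend round trip can be modeled on a single lane in
plain C++. The sketch below is only an illustration with made-up function
names and a fixed 32-bit to 16-bit demotion; it does not reproduce the IR the
patch emits. It shows that the narrowed computation matches the wide one when
the result fits in the narrow type, and differs when it does not.

```cpp
#include <cassert>
#include <cstdint>

// Reference: the subtraction as originally written, in 32 bits.
inline std::int32_t subWide(std::int32_t a, std::int32_t b) { return a - b; }

// Narrowed model: truncate the operands to 16 bits, subtract with wraparound,
// then sign-extend the result back to 32 bits.
inline std::int32_t subNarrowThenExtend(std::int32_t a, std::int32_t b) {
  const auto na = static_cast<std::uint16_t>(a);  // models trunc i32 -> i16
  const auto nb = static_cast<std::uint16_t>(b);  // models trunc i32 -> i16
  const auto diff = static_cast<std::uint16_t>(na - nb);  // wraps mod 2^16
  // Models sext i16 -> i32. The unsigned-to-signed conversion is modular on
  // two's-complement targets (and guaranteed so from C++20 onward).
  return static_cast<std::int16_t>(diff);
}
```

For example, 100 - 300 gives -200 on both paths, while 40000 - 0 does not
survive the 16-bit round trip. This is why the minimum required bit width has
to be established before demoting.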

We use ComputeNumSignBits from ValueTracking to determine the minimum required
bit width of an expression. We update cost estimates to account for the
narrower types and sign extensions we add to the vectorized code.
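
The relationship between sign bits and minimum width can be sketched for a
known 32-bit constant. The helpers below are hypothetical stand-ins written
for this illustration: LLVM's ComputeNumSignBits reasons about IR values whose
contents are not known at compile time, whereas this version just counts the
leading copies of the sign bit in a concrete integer. The minimum signed width
is then the total width minus the number of sign bits, plus one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in: count how many of the top bits of v are copies of
// the sign bit, including the sign bit itself. The result is in [1, 32].
inline unsigned numSignBits(std::int32_t v) {
  const auto bits = static_cast<std::uint32_t>(v);
  const std::uint32_t sign = bits >> 31;
  unsigned count = 1;
  for (int pos = 30; pos >= 0 && ((bits >> pos) & 1u) == sign; --pos)
    ++count;
  return count;
}

// Smallest signed bit width that still represents v exactly.
inline unsigned minSignedBitWidth(std::int32_t v) {
  const unsigned n = numSignBits(v);
  assert(n >= 1 && n <= 32);
  return 32 - n + 1;
}
```

With this model, 127 has 25 sign bits and fits in 8 bits, while 128 and -200
each need 9 bits.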

http://reviews.llvm.org/D15812

Files:
  lib/Transforms/Vectorize/SLPVectorizer.cpp
  test/Transforms/SLPVectorizer/AArch64/gather-reduce.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D15812.43751.patch
Type: text/x-patch
Size: 32800 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20151229/bdfc3fa1/attachment.bin>