[PATCH] D67510: [LV] Support gaps, overlaps, and inexact sizes in speculation logic

Sat Sep 14 09:53:04 PDT 2019

Ayal added inline comments.

================
Comment at: lib/Analysis/Loads.cpp:242
+  //   load i32, i32* <0,+,2>, align 2 (every 2 bytes, load a 4 byte chunk)
+  const APInt AccessSize = (TC-1) * StepC + EltSize;

----------------
reames wrote:
> Ayal wrote:
> > Note that interleave groups (of loads with positive step currently) with gaps at the end may benefit from checking dereferenceability across AccessSize of TC * StepC. But that warrants a separate patch.
> I don't understand the case you're describing here.  What do you mean by a "gap at the end"?  
> 
> p.s. Is there a definition somewhere in code of what an interleave group is?  At the moment, I'm assuming it's an access pattern with periodic gaps, but is there something more specific?

> What do you mean by a "gap at the end"?

Vectorizing with "gap at the end" was originally described in D19487, and extended under fold-tail in D53668. Simple examples are

```
for (int i = 0; i < N; i++)
  B[i] = A[2*i];
```
and
```
for (int i = 0; i < N; i++)
  C[i] = D[4*i] + D[4*i+1];
```
These currently trigger requiresScalarEpilogue(), which is better avoided if dereferenceability can be proven for the missing elements (`A[2*(N-1)+1]` in the first loop; `D[4*(N-1)+2]` and `D[4*(N-1)+3]` in the second).
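
To make the byte counts concrete, here is a rough sketch of the two extents being compared: the `(TC-1) * StepC + EltSize` bytes that the strided accesses actually touch, versus the `TC * StepC` bytes a wide load of whole groups would touch. The helper names are hypothetical (they are not LLVM APIs), and the worked numbers assume a 4-byte `int` and `N = 8`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; all sizes are in bytes.

// Bytes spanned by the accesses the scalar loop really performs:
// (TC - 1) full steps plus the bytes accessed in the last iteration.
constexpr uint64_t stridedAccessSize(uint64_t TC, uint64_t Step,
                                     uint64_t MemberSpan) {
  return TC == 0 ? 0 : (TC - 1) * Step + MemberSpan;
}

// Bytes spanned if every iteration reads its whole group, gaps included.
constexpr uint64_t fullGroupSize(uint64_t TC, uint64_t Step) {
  return TC * Step;
}

// Extra bytes past the last real access that a wide load would touch.
// Assumes MemberSpan <= Step.
constexpr uint64_t trailingGapBytes(uint64_t TC, uint64_t Step,
                                    uint64_t MemberSpan) {
  return fullGroupSize(TC, Step) - stridedAccessSize(TC, Step, MemberSpan);
}

// Worked numbers for the two loops above, assuming sizeof(int) == 4, N == 8.
inline void checkGapExamples() {
  // B[i] = A[2*i]: step is 8 bytes, each iteration reads 4 bytes.
  assert(stridedAccessSize(8, 8, 4) == 60);
  assert(fullGroupSize(8, 8) == 64);
  assert(trailingGapBytes(8, 8, 4) == 4); // A[2*(N-1)+1]

  // C[i] = D[4*i] + D[4*i+1]: step is 16 bytes, each iteration reads 8 bytes.
  assert(stridedAccessSize(8, 16, 8) == 120);
  assert(fullGroupSize(8, 16) == 128);
  assert(trailingGapBytes(8, 16, 8) == 8); // D[4*(N-1)+2], D[4*(N-1)+3]
}
```

The trailing gap is exactly the set of missing elements listed above, so proving dereferenceability over the larger extent is what would let the scalar epilogue be dropped.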

> Is there a definition somewhere in code of what an interleave group is?

Yes, an interleave group is defined in Analysis/VectorUtils.h, above the definition of `class InterleaveGroup`.
Descriptions of how interleave groups are vectorized appear above
`InterleavedAccessInfo::analyzeInterleaving()` in Analysis/VectorUtils.cpp and above
`InnerLoopVectorizer::vectorizeInterleaveGroup()` in LoopVectorize.cpp.

> At the moment, I'm assuming it's an access pattern with periodic gaps, but is there something more specific?

It's usually several access patterns with the same periodic step that can be optimized together using one common wider unit-stride load or store, followed by shuffles (for loads) or preceded by shuffles (for stores), instead of independent gathers/scatters. Plenty of more specific information related to vectorizing interleave groups is available, e.g., "Automatic Vectorization of Interleaved Data Revisited" TACO 2015 and "Exploiting mixed SIMD parallelism by reducing data reorganization overhead" CGO 2016.
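
As a scalar illustration of the "wide load plus shuffle" idea, here is a minimal sketch of one vector iteration of `B[i] = A[2*i]`. It only models the access pattern and is not what the vectorizer emits; `VF`, `Factor`, and `loadEvenLanes` are names assumed for this example.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t VF = 4;     // assumed vectorization factor
constexpr std::size_t Factor = 2; // interleave factor (step in elements)

// Models one vector iteration of B[i] = A[2*i]:
// a single unit-stride load of VF * Factor elements, followed by a
// shuffle that keeps lanes 0, 2, 4, 6.
inline std::array<int, VF> loadEvenLanes(const int *A) {
  // Wide load: reads A[0] .. A[VF*Factor - 1], including the odd
  // elements that the scalar loop never touches.
  std::array<int, VF * Factor> Wide{};
  for (std::size_t Lane = 0; Lane < VF * Factor; ++Lane)
    Wide[Lane] = A[Lane];

  // Shuffle: pick every Factor-th lane.
  std::array<int, VF> Out{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    Out[Lane] = Wide[Lane * Factor];
  return Out;
}
```

Note that the wide load reads `A[VF*Factor - 1]`, which in the last vector iteration is the "gap at the end" element; that read is why a scalar epilogue is needed unless the element is known to be dereferenceable.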

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67510/new/

https://reviews.llvm.org/D67510