[PATCH] D71919: [LoopVectorize] Disable single stride access predicates when gather loads are available.

Thu Jan 9 08:40:21 PST 2020

Ayal added a comment.

> The LoopVectorizer/LAA has the ability to add runtime checks for memory accesses that look like they may be single stride accesses, in an attempt to still run vectorized code. This can happen in a boring matrix multiply kernel, for example:

  for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
      int sum = 0;
      for (int k = 0; k < l; k++)
        sum += A[i*l + k] * B[k*m + j];
      C[i*m + j] = sum;
    }
  }

Note that a (more boring?) matrix multiply kernel where B is a square matrix, i.e., where stride m is equal to trip count l, will not be specialized for m=1 (a stride of 1 would then also mean a trip count of 1). But the general case above may multiply matrix A by a single-column matrix B, whose stride m is 1.
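
To make the specialization concrete, here is a hand-written sketch of what versioning the inner loop on m == 1 amounts to. It is only a source-level illustration, not the code the vectorizer actually emits; the helper name dot_row_col is hypothetical.

```c
/* Hypothetical helper: dot product of row i of A with column j of B.
   A has row length l, B has row length (stride) m, both row-major. */
static int dot_row_col(const int *A, const int *B, int i, int j, int l, int m)
{
    int sum = 0;
    if (m == 1) {
        /* Runtime stride check passed: B[k*m + j] becomes B[k + j],
           a consecutive access that plain vector loads can serve. */
        for (int k = 0; k < l; k++)
            sum += A[i * l + k] * B[k + j];
    } else {
        /* General path: the stride is not known to be 1, so the access
           to B needs scalar code or gather loads. */
        for (int k = 0; k < l; k++)
            sum += A[i * l + k] * B[k * m + j];
    }
    return sum;
}
```

The m == 1 branch is the single-column case mentioned above: B is a column vector and the kernel degenerates to a matrix-vector product.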

Another possible way to prevent such undesired specialization may be a __builtin_expect(m>1, 1) in the source, which reaches the optimizer as an llvm.expect call.
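
A minimal sketch of how such a hint could look in the source, assuming GCC or Clang for __builtin_expect. The function name matmul_hinted and the separate m == 1 path are my own illustration, and whether the vectorizer would actually use the hint to skip the stride check is not something this example establishes.

```c
/* Hypothetical example: C = A * B with A of size n x l and B of size l x m,
   all row-major. The hint states that m > 1 is the expected case. */
void matmul_hinted(const int *A, const int *B, int *C, int n, int m, int l)
{
    if (__builtin_expect(m > 1, 1)) {
        /* Expected case: B has more than one column, so B[k*m + j]
           is a strided access. */
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                int sum = 0;
                for (int k = 0; k < l; k++)
                    sum += A[i * l + k] * B[k * m + j];
                C[i * m + j] = sum;
            }
        }
        return;
    }
    /* Unlikely case: m <= 1. Only m == 1 has work to do, and there B is
       a single column, so the kernel is a matrix-vector product. */
    if (m == 1) {
        for (int i = 0; i < n; i++) {
            int sum = 0;
            for (int k = 0; k < l; k++)
                sum += A[i * l + k] * B[k];
            C[i] = sum;
        }
    }
}
```

Here the single-column case is written out by the programmer rather than left to a compiler-generated runtime check.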

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71919/new/

https://reviews.llvm.org/D71919