[PATCH] D109368: [LV] Vectorize cases with larger number of RT checks, execute only if profitable.

Thu Jul 7 14:16:21 PDT 2022

fhahn added a comment.

In D109368#3636313 <https://reviews.llvm.org/D109368#3636313>, @alexfh wrote:

> In D109368#3636302 <https://reviews.llvm.org/D109368#3636302>, @alexfh wrote:
>
>> When compiled with `--target=x86_64--linux-gnu -O2`, before and after this commit, the resulting assembly differs in a way that seems wrong to me:
>
> After reading the description of the commit I'm not sure about this being wrong, but this sort of a change has definitely caused a difference in the behavior of some numpy C code used by the tensorflow test @asmok-g mentioned.

I had a look at the example, but I don't think the patch is directly at fault here. The only difference for the example is that the vector loop is now executed only if the loop runs 16 or more iterations, versus 8 or more before. This shouldn't impact correctness unless the code path for the scalar loop is miscompiled.

Here's the only IR change:

  <   %min.iters.check = icmp ult i32 %0, 7
  ---
  >   %min.iters.check = icmp ult i32 %0, 15
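
To illustrate why a higher threshold alone should not change results, here is a rough C sketch of the control flow around a vectorized loop. It is not taken from the reproducer: `sum_guarded`, `CHUNK`, and `took_vector_path` are hypothetical names, and the inner lane loop only stands in for real vector instructions. The guard plays the role of `%min.iters.check`: short loops skip the vector body and run entirely in the scalar loop, which also handles the remainder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: stands in for vectorization factor times interleave count. */
enum { CHUNK = 16 };

/* Illustrative only: mimics the guard + vector body + scalar remainder layout. */
uint32_t sum_guarded(const uint32_t *a, size_t n, int *took_vector_path) {
    assert(n == 0 || a != NULL);
    uint32_t total = 0;
    size_t i = 0;
    *took_vector_path = 0;

    /* Analogue of the min.iters.check branch: too few iterations means
       the vector body is skipped entirely. */
    if (n >= CHUNK) {
        *took_vector_path = 1;
        size_t vec_end = n - (n % CHUNK);
        for (; i < vec_end; i += CHUNK)
            for (size_t lane = 0; lane < CHUNK; ++lane)
                total += a[i + lane];   /* stand-in for one vector add */
    }

    /* Scalar loop: handles the remainder, or everything if the guard failed. */
    for (; i < n; ++i)
        total += a[i];
    return total;
}
```

Changing `CHUNK` from 8 to 16 only moves inputs between the two paths. If both paths are compiled correctly, the result is the same, so a behavior change points at one of the paths rather than at the threshold itself.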

Is it possible that the reproducer has been reduced too far? Does it work as expected if vectorization is disabled for the loop via `#pragma clang loop vectorize(disable)` and `#pragma clang loop interleave(disable)`?
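
For reference, a minimal sketch of turning off vectorization and interleaving for a single loop with Clang's loop pragma. The function `scale_no_vectorize` is a hypothetical stand-in, not code from the reproducer; the pragma has to sit immediately before the loop it applies to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example loop; the pragma below applies only to this loop. */
void scale_no_vectorize(float *dst, const float *src, size_t n, float k) {
    assert(n == 0 || (dst != NULL && src != NULL));
    /* Ask Clang not to vectorize or interleave the following loop.
       Other compilers may ignore this pragma, possibly with a warning. */
#pragma clang loop vectorize(disable) interleave(disable)
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}
```

If the failure persists with the loop left scalar, the vectorizer is unlikely to be the cause. If it goes away, comparing the scalar and vector paths for that loop would be the next step.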

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109368/new/

https://reviews.llvm.org/D109368