[PATCH] D109368: [LV] Vectorize cases with larger number of RT checks, execute only if profitable.
Florian Hahn via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 7 14:16:21 PDT 2022
fhahn added a comment.
In D109368#3636313 <https://reviews.llvm.org/D109368#3636313>, @alexfh wrote:
> In D109368#3636302 <https://reviews.llvm.org/D109368#3636302>, @alexfh wrote:
>
>> When compiled with `--target=x86_64--linux-gnu -O2`, before and after this commit, the resulting assembly differs in a way that seems wrong to me:
>
> After reading the description of the commit I'm not sure about this being wrong, but this sort of a change has definitely caused a difference in the behavior of some numpy C code used by the tensorflow test @asmok-g mentioned.
I had a look at the example, but I don't think the patch is directly at fault here. The only difference for the example is that the vector loop now runs only if the loop executes 16 or more iterations, instead of 8 or more before. This shouldn't impact correctness unless the code path for the scalar loop is miscompiled.
Here's the only IR change:
< %min.iters.check = icmp ult i32 %0, 7
---
> %min.iters.check = icmp ult i32 %0, 15
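To make the argument concrete, here is a minimal C sketch of a loop guarded by a minimum-iterations check. It assumes `%0` holds the trip count minus one, so `ult 7` sends fewer than 8 iterations to the scalar path and `ult 15` sends fewer than 16, which matches the figures above. The vector width `VF`, the function names, and the inner lane loop that stands in for a single vector instruction are all hypothetical. Whichever threshold is used, every element is computed either by the vector body or by the scalar loop, so the results must be identical; only the split point moves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vector width; not taken from the patch. */
#define VF 8

/* Scalar loop: handles [begin, end) one element at a time. */
static void add_scalar(int32_t *dst, const int32_t *a, const int32_t *b,
                       size_t begin, size_t end) {
  for (size_t i = begin; i < end; ++i)
    dst[i] = a[i] + b[i];
}

/* Hypothetical model of a vectorized loop behind a minimum-iterations
   check. min_iters plays the role of the threshold in the IR above:
   8 for `icmp ult i32 %0, 7`, 16 for `icmp ult i32 %0, 15`. */
static void add_guarded(int32_t *dst, const int32_t *a, const int32_t *b,
                        size_t n, size_t min_iters) {
  assert(min_iters >= VF);
  size_t i = 0;
  if (n >= min_iters) {
    /* "Vector" body: whole chunks of VF elements. */
    size_t vec_end = n - n % VF;
    for (; i < vec_end; i += VF)
      for (size_t lane = 0; lane < VF; ++lane) /* stands in for one vector op */
        dst[i + lane] = a[i + lane] + b[i + lane];
  }
  /* Scalar loop: the whole range if the check failed, else the remainder. */
  add_scalar(dst, a, b, i, n);
}
```

Raising `min_iters` from 8 to 16 only changes which path handles trip counts 8 to 15 (now entirely scalar), so a behavior change for such inputs would suggest a problem in the scalar path or in the reduced code rather than in the threshold itself.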
Is it possible that the reproducer has been reduced too far? Does it work as expected if vectorization is disabled for the loop via `#pragma clang loop vectorize(disable)` / `#pragma clang loop interleave(disable)`?
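A minimal sketch of that experiment: clang's loop hint pragmas placed directly before a loop disable vectorization and interleaving for that loop only. The function below is a hypothetical stand-in for the loop in the reproducer; compilers other than clang generally ignore `#pragma clang` (possibly with an unknown-pragma warning).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the loop under suspicion. The two hints
   below apply only to the loop that immediately follows them. */
static int64_t sum_no_vectorize(const int32_t *a, size_t n) {
  assert(a != NULL || n == 0);
  int64_t sum = 0;
#pragma clang loop vectorize(disable)
#pragma clang loop interleave(disable)
  for (size_t i = 0; i < n; ++i)
    sum += a[i];
  return sum;
}
```

If the reproducer still misbehaves with both hints applied, the vector loop can likely be ruled out and the scalar path (or the reduction itself) becomes the more likely culprit.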
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D109368