[PATCH] D93967: [SLP]Need shrink the load vector after reordering.

Fri Jan 8 05:18:24 PST 2021

ABataev added a comment.

In D93967#2486129 <https://reviews.llvm.org/D93967#2486129>, @mstorsjo wrote:

> This caused misoptimizations for armv7, where code that previously worked correctly now produce different results. (The code is clean under ubsan, so it shouldn't be relying on anything undefined.)
>
> The issue appears with https://martin.st/temp/interplayvideo-preproc.c, compiled with `clang -target armv7-linux-gnueabihf -O2`.
>
> The diff in generated code, before/after, looks like this:
>
>           vmov.32 d16[0], lr
>           vmov.32 d16[1], r2
>    .LBB27_3:                               @ %if.end
>                                            @   in Loop: Header=BB27_4 Depth=1
>           vmov.32 r2, d16[1]
>           add     r3, r3, #1
>           vmov.32 r5, d16[0]
>           cmp     r3, #8
>   +       vdup.32 d16, d16[0]
>           vmov.16 d17[1], r2
>           vmov.16 d18[0], r5
>           vdup.16 d21, d17[1]
>           vdup.16 d20, d18[0]
>           vst1.16 {d20, d21}, [r1], r12
>           beq     .LBB27_9
>    .LBB27_4:                               @ %for.body
>                                            @ =>This Inner Loop Header: Depth=1
>           tst     r3, #3
>           bne     .LBB27_3
>
> If it loops back to `.LBB27_3`, the vector element `d16[1]` no longer has the value it was expected to have.

Hi, thanks for the report, the patch is correct, the bug is in the existing code. Will fix it soon.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93967/new/

https://reviews.llvm.org/D93967