[llvm-bugs] [Bug 38916] New: vector-compare code does unnecessary widening/narrowing
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Sep 12 10:58:24 PDT 2018
https://bugs.llvm.org/show_bug.cgi?id=38916
Bug ID: 38916
Summary: vector-compare code does unnecessary
widening/narrowing
Product: new-bugs
Version: trunk
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: srj at google.com
CC: llvm-bugs at lists.llvm.org
The change in https://reviews.llvm.org/rL339875 seems to have regressed the
quality of some vector-compare code generation (for x86, at least) in Halide.
Halide is attempting to generate code that compares two <8 x i8> vectors:
```
vec<8 x i8> ones = {1,1,1,1,1,1,1,1};
vec<8 x i8> twos = {2,2,2,2,2,2,2,2};
vec<8 x i8> a = load_vec_a();
vec<8 x i8> b = load_vec_b();
// result should contain 1 for each byte that matches, 2 for each that does not
vec<8 x i8> result = (a == b) ? ones : twos;
```
The unoptimized LLVM IR we generate for the above is:
```
%9 = bitcast i8* %load_vec_a to <8 x i8>*
%10 = load <8 x i8>, <8 x i8>* %9
%11 = bitcast i8* %load_vec_b to <8 x i8>*
%12 = load <8 x i8>, <8 x i8>* %11
%13 = shufflevector <8 x i8> %10, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%14 = shufflevector <8 x i8> %12, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%15 = icmp eq <16 x i8> %13, %14
%16 = shufflevector <16 x i1> %15, <16 x i1> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%17 = shufflevector <8 x i1> %16, <8 x i1> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%18 = select <16 x i1> %17, <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef>, <16 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef>
%19 = shufflevector <16 x i8> %18, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%20 = bitcast i8* %result to <8 x i8>*
store <8 x i8> %19, <8 x i8>* %20
```
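For reference, once the redundant widen/narrow shuffles are folded away, the
whole thing should boil down to IR like the following (a hand-written,
self-contained sketch with made-up names, not compiler output):
```
define void @cmp_select_8xi8(<8 x i8>* %pa, <8 x i8>* %pb, <8 x i8>* %pr) {
  %a = load <8 x i8>, <8 x i8>* %pa
  %b = load <8 x i8>, <8 x i8>* %pb
  ; the compare and select stay entirely in 8-bit lanes; no widening is needed
  %cmp = icmp eq <8 x i8> %a, %b
  %res = select <8 x i1> %cmp, <8 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>, <8 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>
  store <8 x i8> %res, <8 x i8>* %pr
  ret void
}
```
Feeding something like this to llc with SSE4 enabled should make it easy to
check whether the 16-bit widening shows up even without the shuffles.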
The expectation here is that for x86 (w/SSE4), we'll end up with x86 code
something like this:
```
vmovq load_vec_a, %xmm0
vmovq load_vec_b, %xmm1
vpcmpeqb %xmm1, %xmm0, %xmm0
vpaddb .LCPI0_0(%rip), %xmm0, %xmm0    ## LCPI0_0 = <2,2,2,2,2,2,2,2,u,u,u,u,u,u,u,u>
vmovq %xmm0, result
```
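The vpcmpeqb/vpaddb pairing works because a byte compare leaves 0xFF (i.e. -1)
in each matching lane and 0 elsewhere, so adding 2 per byte yields 1 for
matches and 2 for mismatches. In IR terms that corresponds roughly to the
following (a hand-written sketch, with %a and %b standing for the loaded
<8 x i8> values):
```
; sketch only: the select above expressed as mask arithmetic
%cmp  = icmp eq <8 x i8> %a, %b                                                ; i1 per lane
%mask = sext <8 x i1> %cmp to <8 x i8>                                         ; -1 where equal, 0 where not
%res  = add <8 x i8> %mask, <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>   ; -1+2 = 1, 0+2 = 2
```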
After https://reviews.llvm.org/rL339875, however, the IR above compiles to
something more like:
```
vpmovzxbw load_vec_a, %xmm0
vpmovzxbw load_vec_b, %xmm1
vpcmpeqw %xmm1, %xmm0, %xmm0
vpacksswb %xmm0, %xmm0, %xmm0
vpsllw $7, %xmm0, %xmm0
vpand .LCPI0_0(%rip), %xmm0, %xmm0     ## LCPI0_0 = <0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0>
vpxor %xmm1, %xmm1, %xmm1
vpcmpgtb %xmm0, %xmm1, %xmm0
vpaddb .LCPI0_1(%rip), %xmm0, %xmm0    ## LCPI0_1 = <2,2,2,2,2,2,2,2,u,u,u,u,u,u,u,u>
vmovq %xmm0, result
```
Besides being twice as long, it's just odd: why are we widening the operands
to 16 bits when the source, intermediates, and result are all 8 bits? (Note
that current top-of-tree produces slightly different output from this second
example, but the fundamental pathology of unnecessary widening and narrowing
is still in place.)