[llvm-bugs] [Bug 38916] New: vector-compare code does unnecessary widening/narrowing
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Sep 12 10:58:24 PDT 2018
https://bugs.llvm.org/show_bug.cgi?id=38916
Bug ID: 38916
Summary: vector-compare code does unnecessary
widening/narrowing
Product: new-bugs
Version: trunk
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: srj at google.com
CC: llvm-bugs at lists.llvm.org
The change in https://reviews.llvm.org/rL339875 seems to have regressed the
quality of some vector-compare code generation (for x86, at least) in Halide.
Halide is attempting to generate code that compares two <8 x i8> vectors:
```
vec<8 x i8> ones = {1,1,1,1,1,1,1,1};
vec<8 x i8> twos = {2,2,2,2,2,2,2,2};
vec<8 x i8> a = load_vec_a();
vec<8 x i8> b = load_vec_b();
// result should contain 1 for each byte that matches, 2 for each that does not
vec<8 x i8> result = (a == b) ? ones : twos;
```
The unoptimized LLVM IR we generate for the above is:
```
%9 = bitcast i8* %load_vec_a to <8 x i8>*
%10 = load <8 x i8>, <8 x i8>* %9
%11 = bitcast i8* %load_vec_b to <8 x i8>*
%12 = load <8 x i8>, <8 x i8>* %11
%13 = shufflevector <8 x i8> %10, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%14 = shufflevector <8 x i8> %12, <8 x i8> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%15 = icmp eq <16 x i8> %13, %14
%16 = shufflevector <16 x i1> %15, <16 x i1> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%17 = shufflevector <8 x i1> %16, <8 x i1> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%18 = select <16 x i1> %17, <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef>, <16 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef>
%19 = shufflevector <16 x i8> %18, <16 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%20 = bitcast i8* %result to <8 x i8>*
store <8 x i8> %19, <8 x i8>* %20
```
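For reference, once the redundant widen/narrow shuffles are folded away, the
whole thing should boil down to IR like the following (a hand-written,
self-contained sketch with made-up names, not compiler output):
```
define void @cmp_select_8xi8(<8 x i8>* %pa, <8 x i8>* %pb, <8 x i8>* %pr) {
  %a = load <8 x i8>, <8 x i8>* %pa
  %b = load <8 x i8>, <8 x i8>* %pb
  ; the compare and select stay entirely in 8-bit lanes; no widening is needed
  %cmp = icmp eq <8 x i8> %a, %b
  %res = select <8 x i1> %cmp, <8 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>, <8 x i8> <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>
  store <8 x i8> %res, <8 x i8>* %pr
  ret void
}
```
Feeding something like this to llc with SSE4 enabled should make it easy to
check whether the 16-bit widening shows up even without the shuffles.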
The expectation here is that for x86 (w/SSE4), we'll end up with x86 code
something like this:
```
vmovq load_vec_a, %xmm0
vmovq load_vec_b, %xmm1
vpcmpeqb %xmm1, %xmm0, %xmm0
vpaddb .LCPI0_0(%rip), %xmm0, %xmm0    ## LCPI0_0 = <2,2,2,2,2,2,2,2,u,u,u,u,u,u,u,u>
vmovq %xmm0, result
```
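The vpcmpeqb/vpaddb pairing works because a byte compare leaves 0xFF (i.e. -1)
in each matching lane and 0 elsewhere, so adding 2 per byte yields 1 for
matches and 2 for mismatches. In IR terms that corresponds roughly to the
following (a hand-written sketch, with %a and %b standing for the loaded
<8 x i8> values):
```
; sketch only: the select above expressed as mask arithmetic
%cmp  = icmp eq <8 x i8> %a, %b                                                ; i1 per lane
%mask = sext <8 x i1> %cmp to <8 x i8>                                         ; -1 where equal, 0 where not
%res  = add <8 x i8> %mask, <i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2>   ; -1+2 = 1, 0+2 = 2
```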
After https://reviews.llvm.org/rL339875, however, the IR above compiles to
something more like:
```
vpmovzxbw load_vec_a, %xmm0
vpmovzxbw load_vec_b, %xmm1
vpcmpeqw %xmm1, %xmm0, %xmm0
vpacksswb %xmm0, %xmm0, %xmm0
vpsllw $7, %xmm0, %xmm0
vpand .LCPI0_0(%rip), %xmm0, %xmm0     ## LCPI0_0 = <0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0>
vpxor %xmm1, %xmm1, %xmm1
vpcmpgtb %xmm0, %xmm1, %xmm0
vpaddb .LCPI0_1(%rip), %xmm0, %xmm0    ## LCPI0_1 = <2,2,2,2,2,2,2,2,u,u,u,u,u,u,u,u>
vmovq %xmm0, result
```
Besides being twice as long, it's just odd: why are we widening the operands
to 16 bits when the source, intermediates, and result are all 8 bits? (Note
that current top-of-tree produces slightly different output from this second
example, but the fundamental pathology of unnecessary widening and narrowing
is still in place.)