<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - vector-compare code does unnecessary widening/narrowing"
href="https://bugs.llvm.org/show_bug.cgi?id=38916">38916</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>vector-compare code does unnecessary widening/narrowing
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>All
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>srj@google.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>The change in <a href="https://reviews.llvm.org/rL339875">https://reviews.llvm.org/rL339875</a> seems to have regressed the
quality of some vector-compare code generation (for x86 at least) in Halide.
Halide is generating code that compares two <8 x i8> vectors:
```
vec<8 x i8> ones = {1,1,1,1,1,1,1,1};
vec<8 x i8> twos = {2,2,2,2,2,2,2,2};
vec<8 x i8> a = load_vec_a();
vec<8 x i8> b = load_vec_b();
// result should contain 1 for each byte that matches, 2 for each that does not
vec<8 x i8> result = (a == b) ? ones : twos;
```
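For reference, the intended per-lane semantics fit in a tiny Python model. The
sample byte values stand in for load_vec_a() / load_vec_b() and are made up for
illustration; select_eq is a hypothetical helper, not part of Halide.

```python
def select_eq(a, b, if_true=1, if_false=2):
    """Lane-wise (a == b) ? if_true : if_false over two 8-lane byte vectors."""
    assert len(a) == len(b) == 8
    return [if_true if x == y else if_false for x, y in zip(a, b)]


# Made-up sample inputs standing in for load_vec_a() / load_vec_b().
a = [10, 20, 30, 40, 50, 60, 70, 80]
b = [10, 21, 30, 41, 50, 61, 70, 81]
print(select_eq(a, b))  # [1, 2, 1, 2, 1, 2, 1, 2]
```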
The unoptimized LLVM IR we generate for the pseudocode above is:
```
%9 = bitcast i8* %load_vec_a to &lt;8 x i8&gt;*
%10 = load &lt;8 x i8&gt;, &lt;8 x i8&gt;* %9
%11 = bitcast i8* %load_vec_b to &lt;8 x i8&gt;*
%12 = load &lt;8 x i8&gt;, &lt;8 x i8&gt;* %11
%13 = shufflevector &lt;8 x i8&gt; %10, &lt;8 x i8&gt; undef, &lt;16 x i32&gt; &lt;i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef&gt;
%14 = shufflevector &lt;8 x i8&gt; %12, &lt;8 x i8&gt; undef, &lt;16 x i32&gt; &lt;i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef&gt;
%15 = icmp eq &lt;16 x i8&gt; %13, %14
%16 = shufflevector &lt;16 x i1&gt; %15, &lt;16 x i1&gt; undef, &lt;8 x i32&gt; &lt;i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7&gt;
%17 = shufflevector &lt;8 x i1&gt; %16, &lt;8 x i1&gt; undef, &lt;16 x i32&gt; &lt;i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef&gt;
%18 = select &lt;16 x i1&gt; %17, &lt;16 x i8&gt; &lt;i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef&gt;, &lt;16 x i8&gt; &lt;i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 2, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef&gt;
%19 = shufflevector &lt;16 x i8&gt; %18, &lt;16 x i8&gt; undef, &lt;8 x i32&gt; &lt;i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7&gt;
%20 = bitcast i8* %result to &lt;8 x i8&gt;*
store &lt;8 x i8&gt; %19, &lt;8 x i8&gt;* %20
```
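Structurally, this IR pads each 8-lane value to 16 lanes (upper 8 lanes undef),
compares and selects at 16 lanes, and then extracts the low 8 lanes, with an
extra narrow/widen round trip on the <8 x i1> mask in between. The sketch below
models shufflevector in Python and follows %13 through %19. It represents undef
as None and treats any operation on an undef lane as undef; that is a
simplification of LLVM's actual undef rules, assumed only for this model.

```python
UNDEF = None  # models an undef lane or an undef shuffle-mask element


def shufflevector(v1, v2, mask):
    """Model of LLVM shufflevector: mask index m selects lane m of the
    concatenation v1 ++ v2; an undef mask element yields an undef lane."""
    both = list(v1) + list(v2)
    return [UNDEF if m is UNDEF else both[m] for m in mask]


WIDEN = list(range(8)) + [UNDEF] * 8  # the <16 x i32> masks in %13, %14, %17
NARROW = list(range(8))               # the <8 x i32> masks in %16, %19


def ir_model(a, b):
    """Follow %13..%19 from the unoptimized IR, lane by lane."""
    wa = shufflevector(a, [UNDEF] * 8, WIDEN)   # %13
    wb = shufflevector(b, [UNDEF] * 8, WIDEN)   # %14
    # %15: icmp eq at 16 lanes (undef in, undef out in this model)
    cmp16 = [UNDEF if x is UNDEF or y is UNDEF else x == y
             for x, y in zip(wa, wb)]
    cmp8 = shufflevector(cmp16, [UNDEF] * 16, NARROW)  # %16: narrow the mask
    mask16 = shufflevector(cmp8, [UNDEF] * 8, WIDEN)   # %17: widen it again
    ones = [1] * 8 + [UNDEF] * 8
    twos = [2] * 8 + [UNDEF] * 8
    # %18: select at 16 lanes
    sel = [UNDEF if c is UNDEF else (t if c else f)
           for c, t, f in zip(mask16, ones, twos)]
    return shufflevector(sel, [UNDEF] * 16, NARROW)    # %19


a = [10, 20, 30, 40, 50, 60, 70, 80]
b = [10, 21, 30, 41, 50, 61, 70, 81]
print(ir_model(a, b))  # [1, 2, 1, 2, 1, 2, 1, 2]
```

Every widen is undone by a narrow, so only lanes 0..7 ever carry defined
values, and the whole computation still fits in 8 byte lanes.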
The expectation here is that for x86 (with SSE4) we'll end up with code
something like this (the listings use the VEX-encoded AVX forms of the
instructions):
```
vmovq load_vec_a, %xmm0
vmovq load_vec_b, %xmm1
vpcmpeqb %xmm1, %xmm0, %xmm0
vpaddb .LCPI0_0(%rip), %xmm0, %xmm0 ## LCPI0_0 = <2,2,2,2,2,2,2,2,u,u,u,u,u,u,u,u>
vmovq %xmm0, result
```
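The two-instruction core works because vpcmpeqb writes 0xFF (that is, -1) to
each byte lane that compares equal and 0x00 otherwise, and vpaddb adds with
8-bit wraparound: 0xFF + 2 wraps to 1 and 0x00 + 2 stays 2, which is exactly
the ones/twos select. A byte-level Python sketch of that arithmetic follows; the
helper names are hypothetical, and only the 8 meaningful lanes are modeled.

```python
def pcmpeqb(x, y):
    """Per-byte equality like vpcmpeqb: 0xFF where equal, 0x00 otherwise."""
    return [0xFF if p == q else 0x00 for p, q in zip(x, y)]


def paddb(x, y):
    """Per-byte add like vpaddb, wrapping modulo 256."""
    return [(p + q) & 0xFF for p, q in zip(x, y)]


def expected_sequence(a, b):
    mask = pcmpeqb(a, b)                  # vpcmpeqb %xmm1, %xmm0, %xmm0
    return paddb(mask, [2] * len(mask))   # vpaddb with LCPI0_0 = 2 per lane


a = [10, 20, 30, 40, 50, 60, 70, 80]
b = [10, 21, 30, 41, 50, 61, 70, 81]
print(expected_sequence(a, b))  # [1, 2, 1, 2, 1, 2, 1, 2]
```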
After <a href="https://reviews.llvm.org/rL339875">https://reviews.llvm.org/rL339875</a>, however, the same IR
instead produces something more like:
```
vpmovzxbw load_vec_a, %xmm0
vpmovzxbw load_vec_b, %xmm1
vpcmpeqw %xmm1, %xmm0, %xmm0
vpacksswb %xmm0, %xmm0, %xmm0
vpsllw $7, %xmm0, %xmm0
vpand .LCPI0_0(%rip), %xmm0, %xmm0 ## LCPI0_0 = <0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0>
vpxor %xmm1, %xmm1, %xmm1
vpcmpgtb %xmm0, %xmm1, %xmm0
vpaddb .LCPI0_1(%rip), %xmm0, %xmm0 ## LCPI0_1 = <2,2,2,2,2,2,2,2,u,u,u,u,u,u,u,u>
vmovq %xmm0, result
```
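A similar byte-level model can trace the longer sequence to check that it still
computes the select, just by way of 16-bit lanes. Two assumptions, both labeled
in the code: operands follow the listing's order (sources first, destination
last), so vpcmpgtb %xmm0, %xmm1, %xmm0 computes 0 > x per signed byte; and the
vpand constant is taken to be 0x80 in every lane. The listing shows LCPI0_0 as
all zeros, which would clear every lane; 0x80 is the mask that isolates bit 7
after vpsllw $7, which is needed because vpsllw shifts whole 16-bit words and
spills bits across byte boundaries.

```python
def to_signed8(v):
    """Reinterpret an unsigned byte as a signed 8-bit value."""
    return v - 0x100 if v & 0x80 else v


def pmovzxbw(bs):
    return list(bs)  # vpmovzxbw: each byte zero-extended to a 16-bit word


def pcmpeqw(x, y):
    return [0xFFFF if p == q else 0 for p, q in zip(x, y)]


def packsswb(x, y):
    """vpacksswb: signed-saturate words to bytes; x fills the low lanes."""
    def sat(w):
        s = w - 0x10000 if w & 0x8000 else w
        return max(-128, min(127, s)) & 0xFF
    return [sat(w) for w in x] + [sat(w) for w in y]


def psllw(bs, n):
    """vpsllw: shift each little-endian 16-bit word of a byte vector left."""
    out = []
    for i in range(0, len(bs), 2):
        w = ((bs[i] | (bs[i + 1] << 8)) << n) & 0xFFFF
        out += [w & 0xFF, w >> 8]
    return out


def pand(x, y):
    return [p & q for p, q in zip(x, y)]


def pcmpgtb(x, y):
    """vpcmpgtb: 0xFF where signed x > signed y, else 0x00."""
    return [0xFF if to_signed8(p) > to_signed8(q) else 0 for p, q in zip(x, y)]


def paddb(x, y):
    return [(p + q) & 0xFF for p, q in zip(x, y)]


def regressed_sequence(a, b):
    x = pcmpeqw(pmovzxbw(a), pmovzxbw(b))  # compare as 8 x i16
    x = packsswb(x, x)                     # back to bytes: 0xFF or 0x00
    x = psllw(x, 7)                        # bit 0 of each byte -> bit 7, plus spill
    x = pand(x, [0x80] * 16)               # ASSUMED mask (listing shows zeros)
    x = pcmpgtb([0] * 16, x)               # 0 > x: sign bit set -> 0xFF
    x = paddb(x, [2] * 16)                 # LCPI0_1: +2 per lane
    return x[:8]                           # vmovq stores the low 8 bytes


a = [10, 20, 30, 40, 50, 60, 70, 80]
b = [10, 21, 30, 41, 50, 61, 70, 81]
print(regressed_sequence(a, b))  # [1, 2, 1, 2, 1, 2, 1, 2]
print(psllw([0xFF, 0x00], 7))    # [128, 127]
```

The second print shows the spill: shifting the word holding a 0xFF/0x00 byte
pair leaves 0x7F in the high byte, which is why a vpand has to run before the
sign test.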
Besides being twice as long, the second sequence is just odd: why are we
widening the values to 16 bits when the source, intermediate, and result are
all 8 bits? (Note that current top-of-tree produces slightly different output
from this second example, but the fundamental pathology of unnecessary
widening and narrowing is still in place.)</pre>
</div>
</p>
</body>
</html>