<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [x86] wrong codegen during vselect/pshufb optimisation involving zero literals"
href="https://bugs.llvm.org/show_bug.cgi?id=52122">52122</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[x86] wrong codegen during vselect/pshufb optimisation involving zero literals
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>benjsith@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, pengfei.wang@intel.com, spatel+llvm@rotateright.com
</td>
</tr></table>
<p>
<div>
<pre>LLVM incorrectly handles an optimisation around changing a vselect/pshufb combo
that has constant zeros in one of the inputs.
Here is a minimal repro that triggers the issue:
__m256i do_stuff(__m256i I1, __m256i I2) {
__m256i X = _mm256_set1_epi64x(0);
__m256i A = _mm256_unpacklo_epi8(I1, X);
__m256i B = _mm256_unpacklo_epi8(A, I2);
__m256i C = _mm256_unpackhi_epi8(B, A);
return C;
}
compiled with -mavx2 and -O1 or -O2.

Here is the repro on Godbolt, showing the difference in output:
<a href="https://godbolt.org/z/r5acr545K">https://godbolt.org/z/r5acr545K</a>

Output bytes that should have been 00 instead contain one of the input bytes.
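
For reference, here is a small harness (not part of the original repro; the
input byte values are arbitrary) that makes the bad bytes easy to spot,
assuming the do_stuff above. With correct codegen, the positions that come
from the zeroed input X print as 00; with the miscompile, some of them print
an input byte instead. Which positions are affected depends on the build.

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t a[32], b[32], out[32];
    for (int i = 0; i < 32; ++i) {
        a[i] = (uint8_t)(0x10 + i);    // recognisable bytes for input 1
        b[i] = (uint8_t)(0x60 + i);    // recognisable bytes for input 2
    }
    __m256i I1 = _mm256_loadu_si256((const __m256i *)a);
    __m256i I2 = _mm256_loadu_si256((const __m256i *)b);
    __m256i C  = do_stuff(I1, I2);
    _mm256_storeu_si256((__m256i *)out, C);
    for (int i = 0; i < 32; ++i)
        printf("%02x%c", out[i], i == 31 ? '\n' : ' ');
    return 0;
}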

The problematic parts of the assembly are:

.LCPI0_1:
        .byte   2                       # 0x2
        .byte   4                       # 0x4
        .byte   128                     # 0x80
        .zero   1
        .zero   1
...
and then:
vpshufb ymm0, ymm0, ymmword ptr [rip + .LCPI0_1] # ymm0 =
ymm0[2,4],zero,ymm0[u,u,5],zero,ymm0[u,3,6],zero,ymm0[u,u,7],zero,ymm0[u,18,20],zero,ymm0[u,u,21],zero,ymm0[u,19,22],zero,ymm0[u,u,23],zero,ymm0[u]

The mask generated for that pshufb should contain 0x80 bytes to put a constant
zero in the destination instead of one of the input bytes. Instead, the mask
contains null bytes, which select byte 0 of the source operand rather than
producing zero.
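
For reference, this is the vpshufb control-byte rule the report relies on (my
own scalar model of one 16-byte lane, not LLVM code): if bit 7 of the control
byte is set, the result byte is forced to zero; otherwise the low four bits
index into the lane. So 0x80 yields zero while 0x00 selects byte 0.

#include <stdint.h>

// Scalar model of a single 16-byte lane of vpshufb (illustration only).
static void pshufb_lane(const uint8_t src[16], const uint8_t ctrl[16],
                        uint8_t dst[16]) {
    for (int i = 0; i < 16; ++i) {
        if (ctrl[i] & 0x80)
            dst[i] = 0;                    // high bit set -> constant zero
        else
            dst[i] = src[ctrl[i] & 0x0f];  // 0x00 selects byte 0, not zero
    }
}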

The following fold (in X86ISelLowering.cpp) seems to be the cause:

// fold vselect(cond, pshufb(x), pshufb(y)) -> or (pshufb(x), pshufb(y))

It calls getTargetShuffleMask(), and indirectly DecodePSHUFBMask(), to pull out
the mask of one of the operands. That decode uses SM_SentinelZero (-2) to mark
indices that should produce a constant zero. However, the mask is then passed
to getConstVector(), which interprets any negative number as undef, so those
indices are marked undef instead of being encoded as a constant zero (0x80).
The undef indices end up as null bytes in the final mask, which causes the
observed behaviour.

I'm not sure what the correct patch would be. I found that converting
SM_SentinelZero (-2) to 0x80 at this stage, before calling getConstVector(),
fixed the issue, but there is probably a better way.
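
As a rough standalone sketch of that conversion (my own illustration, not the
actual X86ISelLowering.cpp code; the SM_SentinelZero value of -2 is as
described above, and maskToPSHUFBBytes is a hypothetical helper name):

#include <cstdint>
#include <vector>

static const int SM_SentinelZero = -2;  // sentinel value described above

// Turn a decoded shuffle mask into PSHUFB control bytes, encoding zeroing
// lanes as 0x80 instead of leaving them as negative sentinels.
std::vector<uint8_t> maskToPSHUFBBytes(const std::vector<int> &Mask) {
    std::vector<uint8_t> Bytes;
    Bytes.reserve(Mask.size());
    for (int M : Mask) {
        if (M == SM_SentinelZero)
            Bytes.push_back(0x80);        // constant zero in the destination
        else if (M < 0)
            Bytes.push_back(0x80);        // undef lane: any value is legal, zero is safe
        else
            Bytes.push_back((uint8_t)M);  // in-range byte index
    }
    return Bytes;
}
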
I verified the bug still repros on the latest trunk,
a1f0f847ff7d3944c992158226026024ccc67207.

PS: I don't know if this changes the priority, but this bug was found via a
fuzzer I wrote that is meant to test intrinsics compilation; it was not in
code I wrote manually.</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>