[llvm-bugs] [Bug 52122] New: [x86] wrong codegen during vselect/pshufb optimisation involving zero literals
via llvm-bugs
llvm-bugs at lists.llvm.org
Sat Oct 9 07:52:39 PDT 2021
https://bugs.llvm.org/show_bug.cgi?id=52122
Bug ID: 52122
Summary: [x86] wrong codegen during vselect/pshufb optimisation
involving zero literals
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: benjsith at gmail.com
CC: craig.topper at gmail.com, llvm-bugs at lists.llvm.org,
llvm-dev at redking.me.uk, pengfei.wang at intel.com,
spatel+llvm at rotateright.com
LLVM miscompiles an optimisation that rewrites a vselect/pshufb combination
when one of the inputs contains constant zeros.
Here is a minimal repro that triggers the issue:
#include <immintrin.h>

__m256i do_stuff(__m256i I1, __m256i I2) {
    __m256i X = _mm256_set1_epi64x(0);       // all-zero vector
    __m256i A = _mm256_unpacklo_epi8(I1, X); // interleave I1 bytes with zeros
    __m256i B = _mm256_unpacklo_epi8(A, I2);
    __m256i C = _mm256_unpackhi_epi8(B, A);
    return C;
}
Compile with -mavx2 at -O1 or -O2 to reproduce.
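A rough harness along these lines makes the wrong bytes visible at runtime
(sketch only; the input byte pattern is arbitrary). Keep do_stuff in a
separate translation unit so the inputs can't be constant-folded, build that
unit once at -O0 and once at -O2, and diff the printed output; both files
need -mavx2:

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

__m256i do_stuff(__m256i I1, __m256i I2);   /* compiled in a separate file */

int main(void) {
    uint8_t in1[32], in2[32], out[32];
    for (int i = 0; i < 32; i++) {
        in1[i] = (uint8_t)(0x10 + i);       /* arbitrary, recognisable bytes */
        in2[i] = (uint8_t)(0xA0 + i);
    }
    __m256i r = do_stuff(_mm256_loadu_si256((const __m256i *)in1),
                         _mm256_loadu_si256((const __m256i *)in2));
    _mm256_storeu_si256((__m256i *)out, r);
    for (int i = 0; i < 32; i++)
        printf("%02x ", out[i]);            /* bytes that should be 00 differ at -O2 */
    printf("\n");
    return 0;
}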
Here is an example snippet of code on Godbolt, showing the difference in
output:
https://godbolt.org/z/r5acr545K
Output bytes that should be 00 instead contain one of the input bytes.
The problematic parts of the assembly are:
.LCPI0_1:
.byte 2 # 0x2
.byte 4 # 0x4
.byte 128 # 0x80
.zero 1
.zero 1
...
and then:
vpshufb ymm0, ymm0, ymmword ptr [rip + .LCPI0_1] # ymm0 =
ymm0[2,4],zero,ymm0[u,u,5],zero,ymm0[u,3,6],zero,ymm0[u,u,7],zero,ymm0[u,18,20],zero,ymm0[u,u,21],zero,ymm0[u,19,22],zero,ymm0[u,u,23],zero,ymm0[u]
The mask generated for that pshufb should contain 0x80 bytes wherever the
destination should receive a constant zero. Instead, the mask contains 0x00
bytes, each of which selects the first byte of the source operand.
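For reference, pshufb's control-byte encoding: if the high bit of a control
byte is set, the output byte is written as zero; otherwise the low four bits
select a byte within the 128-bit lane. A small standalone demonstration of
the 0x80-versus-0x00 difference (SSSE3 version for brevity; compile with
-mssse3 or higher):

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t src[16], out[16];
    for (int i = 0; i < 16; i++)
        src[i] = (uint8_t)(0x41 + i);      /* 0x41, 0x42, 0x43, ... */

    /* 0x80 (high bit set) zeroes the output byte; 0x00 selects src[0].
       This is exactly the difference between the intended mask and the
       one being emitted. */
    const uint8_t ctl[16] = {
        0x02, 0x80, 0x00, 0x80, 0x00, 0x80, 0x00, 0x80,
        0x00, 0x80, 0x00, 0x80, 0x00, 0x80, 0x00, 0x80,
    };

    __m128i r = _mm_shuffle_epi8(_mm_loadu_si128((const __m128i *)src),
                                 _mm_loadu_si128((const __m128i *)ctl));
    _mm_storeu_si128((__m128i *)out, r);
    for (int i = 0; i < 16; i++)
        printf("%02x ", out[i]);   /* 0x80 slots print 00, 0x00 slots print 41 */
    printf("\n");
    return 0;
}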
The following combine (in X86ISelLowering.cpp) appears to be the cause:
// fold vselect(cond, pshufb(x), pshufb(y)) -> or (pshufb(x), pshufb(y))
It calls getTargetShuffleMask(), and indirectly DecodePSHUFBMask(), to pull
out the mask of one of the operands. DecodePSHUFBMask() uses SM_SentinelZero
(-2) to mark indices that should produce a constant zero. That mask is then
passed to getConstVector(), which interprets any negative value as undef, so
these indices are marked undef instead of becoming the constant-zero control
byte (0x80). The undef entries end up compiled as 0x00 bytes in the final
mask, which causes the observed behaviour.
I'm not sure what the correct patch would be. I found that converting
SM_SentinelZero (-2) to 0x80 at this stage, before the call to
getConstVector(), fixed the issue, but there may well be a better approach.
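To spell out the conversion I mean, here is a standalone sketch of the idea
(not the actual change to X86ISelLowering.cpp; the sentinel values match the
shuffle-decode code, and the index handling is simplified):

#include <stdint.h>

#define SM_SentinelUndef (-1)   /* "don't care" */
#define SM_SentinelZero  (-2)   /* lane must be constant zero */

/* Translate a decoded shuffle mask into PSHUFB control bytes.
   SM_SentinelZero has to become 0x80 so pshufb writes a zero;
   only SM_SentinelUndef is genuinely free to be anything. */
void mask_to_pshufb_bytes(const int *mask, uint8_t *bytes, int n) {
    for (int i = 0; i < n; i++) {
        if (mask[i] == SM_SentinelZero)
            bytes[i] = 0x80;                       /* high bit set -> output 0x00 */
        else if (mask[i] == SM_SentinelUndef)
            bytes[i] = 0x80;                       /* any value works; zero is safe */
        else
            bytes[i] = (uint8_t)(mask[i] & 0x0F);  /* in-lane source byte index */
    }
}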
I verified the bug still repros on the latest trunk,
a1f0f847ff7d3944c992158226026024ccc67207.
PS: I don't know if this affects the priority, but this bug was found by a
fuzzer I wrote for testing intrinsics compilation, not in hand-written code.