[llvm-bugs] [Bug 45808] New: Suboptimal code for _mm256_zextsi128_si256(_mm_set1_epi8(-1))
via llvm-bugs
llvm-bugs at lists.llvm.org
Tue May 5 13:03:37 PDT 2020
https://bugs.llvm.org/show_bug.cgi?id=45808
Bug ID: 45808
Summary: Suboptimal code for _mm256_zextsi128_si256(_mm_set1_epi8(-1))
Product: new-bugs
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: nemo at self-evident.org
CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org
Related: Bug #45806 and https://stackoverflow.com/q/61601902/
I am trying to produce an AVX2 mask with all-ones in the lower lane and
all-zeroes in the upper lane of a YMM register. The code I am using is:
    __m256i mask = _mm256_zextsi128_si256(_mm_set1_epi8(-1));
This should produce a single instruction like `vpcmpeqd %xmm0,%xmm0,%xmm0`, but
Clang insists on putting the value into memory and loading it.
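For reference, the byte layout the mask is expected to have can be checked with a small sketch. The helper name `store_low_lane_mask` is hypothetical and not part of the report; the intrinsic path is only compiled when AVX2 is enabled (for example with `-mavx2`), and the fallback simply writes the same 32 bytes by hand:

```c
#include <string.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper (not from the report): writes the 32 bytes of the
   intended mask to out, lower 128-bit lane first. */
void store_low_lane_mask(unsigned char out[32])
{
#if defined(__AVX2__)
    /* All-ones in the low 128 bits, zero-extended to 256 bits. */
    __m256i mask = _mm256_zextsi128_si256(_mm_set1_epi8(-1));
    _mm256_storeu_si256((__m256i *)out, mask);
#else
    /* Portable fallback producing the same byte pattern. */
    memset(out, 0xFF, 16);      /* lower lane: all-ones  */
    memset(out + 16, 0x00, 16); /* upper lane: all-zeros */
#endif
}
```

Bytes 0 through 15 come out as 0xFF and bytes 16 through 31 as 0x00, which is the constant the compiler is free to materialize however it sees fit.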
The behavior in context is even more odd:
    __m256i minmax(__m256i v1, __m256i v2)
    {
        __m256i comp = _mm256_cmpgt_epi64(v1, v2);
        __m256i mask = _mm256_zextsi128_si256(_mm_set1_epi8(-1));
        return _mm256_blendv_epi8(v2, v1, _mm256_xor_si256(comp, mask));
    }
The generated code goes through a bunch of contortions (extracting, shifting,
and expanding 128-bit registers), when I feel like the result I want is pretty
straightforward.
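To make the intended result concrete, here is a scalar sketch of what the function above computes per 64-bit element. The names `minmax_lane_model` and `minmax_model` are hypothetical, and this is only a reference model of the intrinsics' documented semantics, not the code Clang emits:

```c
#include <stdint.h>

/* Scalar model of one 64-bit element of minmax().
   Elements 0-1 form the lower 128-bit lane, elements 2-3 the upper lane. */
static int64_t minmax_lane_model(int64_t a, int64_t b, int element)
{
    /* _mm256_cmpgt_epi64: all-ones if a > b (signed compare), else zero. */
    uint64_t comp = (a > b) ? UINT64_MAX : 0;
    /* The mask: all-ones in the lower lane, zero in the upper lane. */
    uint64_t mask = (element < 2) ? UINT64_MAX : 0;
    uint64_t select = comp ^ mask;
    /* _mm256_blendv_epi8(v2, v1, select) takes v1 where the selector byte's
       top bit is set. select is all-ones or all-zeros for the whole element,
       so the choice applies to the full 64 bits. */
    return select ? a : b;
}

/* Applies the model to all four elements of a 256-bit vector. */
void minmax_model(const int64_t v1[4], const int64_t v2[4], int64_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = minmax_lane_model(v1[i], v2[i], i);
}
```

In the lower lane the XOR inverts the comparison, so each element ends up as the signed minimum of the two inputs; in the upper lane the comparison is used as-is, giving the signed maximum.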
Godbolt example: https://gcc.godbolt.org/z/GPhJ6s