[llvm-bugs] [Bug 45808] New: Suboptimal code for _mm256_zextsi128_si256(_mm_set1_epi8(-1))

via llvm-bugs llvm-bugs at lists.llvm.org
Tue May 5 13:03:37 PDT 2020


https://bugs.llvm.org/show_bug.cgi?id=45808

            Bug ID: 45808
           Summary: Suboptimal code for
                    _mm256_zextsi128_si256(_mm_set1_epi8(-1))
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: nemo at self-evident.org
                CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org

Related: Bug #45806 and https://stackoverflow.com/q/61601902/

I am trying to produce an AVX2 mask with all-ones in the lower lane and
all-zeroes in the upper lane of a YMM register. The code I am using is:

    __m256i mask = _mm256_zextsi128_si256(_mm_set1_epi8(-1));

This should produce a single instruction like `vpcmpeqd %xmm0,%xmm0,%xmm0`,
since a VEX-encoded 128-bit instruction zeroes the upper half of the YMM
register. However, Clang insists on putting the value into memory and loading
it.

The behavior in context is even more odd:

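    #include <immintrin.h>

    __m256i minmax(__m256i v1, __m256i v2)
    {
        /* All-ones in each 64-bit element where v1 > v2 (signed). */
        __m256i comp = _mm256_cmpgt_epi64(v1, v2);
        /* All-ones in the low 128 bits, zero in the high 128 bits. */
        __m256i mask = _mm256_zextsi128_si256(_mm_set1_epi8(-1));
        /* Inverting comp in the low lane yields min there, max in the high lane. */
        return _mm256_blendv_epi8(v2, v1, _mm256_xor_si256(comp, mask));
    }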

This goes through a bunch of contortions with extracting, shifting, and
expanding 128-bit registers when I feel like the result I want is pretty
straightforward.
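
For reference, the result the function above is meant to compute can be
modeled in scalar C. This is only a sketch of the intended semantics, not
code from the report: the name `minmax_ref` and the array-based signature are
hypothetical, and each `__m256i` is treated as four signed 64-bit elements
with elements 0 and 1 forming the low 128-bit lane.

```c
#include <stdint.h>

/* Hypothetical scalar model of minmax(): elements 0-1 (low 128 bits)
   receive the minimum, elements 2-3 (high 128 bits) the maximum. */
void minmax_ref(const int64_t v1[4], const int64_t v2[4], int64_t out[4])
{
    for (int i = 0; i < 4; i++) {
        int gt = v1[i] > v2[i];       /* models _mm256_cmpgt_epi64 */
        int low_lane = i < 2;         /* the mask is all-ones here */
        int take_v1 = gt ^ low_lane;  /* models comp XOR mask */
        out[i] = take_v1 ? v1[i] : v2[i];  /* models _mm256_blendv_epi8 */
    }
}
```

With v1 = {5, -3, 5, -3} and v2 = {2, 7, 2, 7}, the model yields
{2, -3, 5, 7}: the minimum of each pair in the low lane and the maximum in
the high lane.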

Godbolt example: https://gcc.godbolt.org/z/GPhJ6s
