[llvm-bugs] [Bug 39984] New: Recognize and optimize movemask equivalents for 16/32/64 bits

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Dec 12 15:13:58 PST 2018


https://bugs.llvm.org/show_bug.cgi?id=39984

            Bug ID: 39984
           Summary: Recognize and optimize movemask equivalents for
                    16/32/64 bits
           Product: clang
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: -New Bugs
          Assignee: unassignedclangbugs at nondot.org
          Reporter: bugzilla at poradnik-webmastera.com
                CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org,
                    neeilans at live.com, richard-llvm at metafoo.co.uk

I found that a recent clang (clang version 8.0.0 (trunk 348905)) is able to
optimize expressions which use both vectors and masks. However, there is one
case which is not optimized: movemask equivalents for 16/32/64-bit data. As
"true" in vectors is stored as -1 (all bits set), it is enough to extract any
one bit from every vector element to get the same result a movemask would
give. Two possible solutions for 16-bit data are shown below in the test()
function. The first one uses packs+movemask instructions, the second uses
pext. In the latter case it is also possible to use a different mask as the
2nd argument (0x5555 instead of 0xaaaa), as both give the same results.
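
To illustrate why both pext masks are interchangeable, here is a minimal
scalar sketch in C. It does not use the real intrinsics: movemask_epi8_model(),
pext_model() and the other helper names are hypothetical stand-ins written for
this example, and they only model the bit-level behaviour described above for
a vector whose 16-bit lanes are all-ones or all-zeros.

```c
#include <stdint.h>

/* Scalar model of _mm_movemask_epi8: collect the top bit of each of 16 bytes. */
unsigned movemask_epi8_model(const uint8_t bytes[16])
{
    unsigned mask = 0;
    for (int i = 0; i < 16; ++i)
        mask |= (unsigned)(bytes[i] >> 7) << i;
    return mask;
}

/* Scalar model of pext: gather the bits of src selected by sel into the
   low bits of the result, preserving their order. */
unsigned pext_model(unsigned src, unsigned sel)
{
    unsigned out = 0;
    unsigned pos = 0;
    for (unsigned bit = 0; bit < 32; ++bit) {
        if (sel & (1u << bit)) {
            out |= ((src >> bit) & 1u) << pos;
            ++pos;
        }
    }
    return out;
}

/* Build the byte image of a 16-bit compare result: lane i is 0xFFFF when
   bit i of lanes is set, otherwise 0x0000. */
void make_cmp16_result(unsigned lanes, uint8_t bytes[16])
{
    for (int i = 0; i < 8; ++i) {
        uint8_t fill = ((lanes >> i) & 1u) ? 0xFF : 0x00;
        bytes[2 * i] = fill;
        bytes[2 * i + 1] = fill;
    }
}

/* Model of packs + movemask: signed saturation from 16 to 8 bits keeps the
   sign, so the result is the sign bit of each 16-bit lane (its high byte). */
unsigned packs_then_movemask_model(const uint8_t bytes[16])
{
    unsigned mask = 0;
    for (int i = 0; i < 8; ++i)
        mask |= (unsigned)(bytes[2 * i + 1] >> 7) << i;
    return mask;
}

/* Returns 1 when, for every one of the 256 lane patterns, packs+movemask and
   pext with either 0xaaaa or 0x5555 recover the same 8-bit mask. */
int pext_masks_agree(void)
{
    for (unsigned lanes = 0; lanes < 256; ++lanes) {
        uint8_t bytes[16];
        make_cmp16_result(lanes, bytes);
        unsigned m = movemask_epi8_model(bytes);
        if (packs_then_movemask_model(bytes) != lanes) return 0;
        if (pext_model(m, 0xaaaa) != lanes) return 0;
        if (pext_model(m, 0x5555) != lanes) return 0;
    }
    return 1;
}
```

Both masks work because the two bytes of each 16-bit lane carry the same sign
bit, so selecting the odd or the even bit of each pair gives the same value.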

Code for 32- and 64-bit data would look similar: you need to use more packs to
reduce the mask vector to 8-bit data, or a wider mask for pext.
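
As a rough sketch of what the pext variant could look like for wider lanes,
the scalar C model below derives one possible selector per lane width. The
values 0x8888 (32-bit) and 0x8080 (64-bit) are my extrapolation from the
0xaaaa pattern, not something taken from compiler output, and lane_selector(),
extract_bits() and the other names are hypothetical helpers for this example.

```c
#include <stdint.h>

/* Hypothetical helper: pext selector that keeps the bit of the top byte of
   every lane in a 16-byte vector. lane_bytes is expected to be 2, 4 or 8. */
unsigned lane_selector(unsigned lane_bytes)
{
    unsigned sel = 0;
    for (unsigned bit = lane_bytes - 1; bit < 16; bit += lane_bytes)
        sel |= 1u << bit;
    return sel;
}

/* Scalar stand-in for pext: gather the bits of src selected by sel into the
   low bits of the result, preserving their order. */
unsigned extract_bits(unsigned src, unsigned sel)
{
    unsigned out = 0;
    unsigned pos = 0;
    for (unsigned bit = 0; bit < 32; ++bit) {
        if (sel & (1u << bit)) {
            out |= ((src >> bit) & 1u) << pos;
            ++pos;
        }
    }
    return out;
}

/* Byte mask that a byte-wise movemask would return for a compare result in
   which lane i is all-ones exactly when bit i of lanes is set. */
unsigned byte_mask_for_lanes(unsigned lanes, unsigned lane_bytes)
{
    unsigned m = 0;
    for (unsigned byte = 0; byte < 16; ++byte)
        if ((lanes >> (byte / lane_bytes)) & 1u)
            m |= 1u << byte;
    return m;
}

/* Returns 1 when every lane pattern survives the movemask + pext round trip
   for the given lane width. */
int wide_lanes_round_trip(unsigned lane_bytes)
{
    unsigned lane_count = 16 / lane_bytes;
    for (unsigned lanes = 0; lanes < (1u << lane_count); ++lanes) {
        unsigned m = byte_mask_for_lanes(lanes, lane_bytes);
        if (extract_bits(m, lane_selector(lane_bytes)) != lanes)
            return 0;
    }
    return 1;
}
```

Any selector that picks exactly one bit per lane would do equally well, for
the same reason that 0xaaaa and 0x5555 are interchangeable for 16-bit data.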

For reference, I also added the function test2(), which operates on 8-bit
data, and the optimized assembly generated from it. The resulting code for
test() after fixing this issue should be similar.

[code]
#include <immintrin.h>

void test(void* data1, void* data2)
{
    __m128i v1 = _mm_load_si128((__m128i const*)data1);
    __m128i v2 = _mm_load_si128((__m128i const*)data2);

    v1 = _mm_cmpeq_epi16(v1, _mm_setzero_si128());
    v2 = _mm_cmpeq_epi16(v2, _mm_setzero_si128());

#if 1
    v1 = _mm_packs_epi16(v1, _mm_setzero_si128());
    v2 = _mm_packs_epi16(v2, _mm_setzero_si128());

    int m1 = _mm_movemask_epi8(v1);
    int m2 = _mm_movemask_epi8(v2);
#else
    int m1 = _mm_movemask_epi8(v1);
    int m2 = _mm_movemask_epi8(v2);
    #define PEXT_MASK 0xaaaa
    //#define PEXT_MASK 0x5555
    m1 = _pext_u32(m1, PEXT_MASK);
    m2 = _pext_u32(m2, PEXT_MASK);
#endif

    int m = (m1 | 3) & (m2 | 3);

    v1 = _mm_maskz_add_epi16(m, v1, v2);
    _mm_store_si128((__m128i*)data2, v1);
}

void test2(void* data1, void* data2)
{
    __m128i v1 = _mm_load_si128((__m128i const*)data1);
    __m128i v2 = _mm_load_si128((__m128i const*)data2);

    __m128i vc1 = _mm_cmpeq_epi8(v1, _mm_setzero_si128());
    __m128i vc2 = _mm_cmpeq_epi8(v2, _mm_setzero_si128());

    int m1 = _mm_movemask_epi8(vc1);
    int m2 = _mm_movemask_epi8(vc2);

    int m = (m1 | 3) & (m2 | 3);

    v1 = _mm_maskz_add_epi8(m, v1, v2);
    _mm_store_si128((__m128i*)data2, v1);
}
[/code]

[asm]

[/asm]
