[llvm-bugs] [Bug 41512] New: Conversion from int to XMM is handled inefficiently on SSE4

via llvm-bugs llvm-bugs at lists.llvm.org
Tue Apr 16 04:22:21 PDT 2019


https://bugs.llvm.org/show_bug.cgi?id=41512

            Bug ID: 41512
           Summary: Conversion from int to XMM is handled inefficiently on
                    SSE4
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: spreis at yandex-team.ru
                CC: craig.topper at gmail.com, llvm-bugs at lists.llvm.org,
                    llvm-dev at redking.me.uk, spatel+llvm at rotateright.com

Created attachment 21786
  --> https://bugs.llvm.org/attachment.cgi?id=21786&action=edit
Proposed fix

While attempting to switch all our builds from SSSE3 to SSE4, we found that code
as simple as

    const __m128i lo = _mm_cvtsi32_si128(d0[value]);
    const __m128i hi = _mm_cvtsi32_si128(d0[value+1024]);
    val = _mm_add_epi64(val, _mm_unpacklo_epi64(lo, hi));

or

    const __m128i all = _mm_set_epi32(0, d0[value], 0, d0[value+1024]);
    val = _mm_add_epi64(val, all);


performs worse, once inlined into a loop, when compiled with -msse4.1 than with
just -mssse3.
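
For reference, a minimal self-contained sketch of the kind of loop meant above.
The function name, the index array, the table element type and the output
parameter are assumptions made for illustration; only the three intrinsic calls
and the +1024 offset come from the snippets in this report.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_cvtsi32_si128, _mm_unpacklo_epi64, _mm_add_epi64 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table layout: two halves of 1024 entries each,
   as implied by d0[value] and d0[value + 1024]. */
enum { HALF = 1024 };

/* For every index, adds table[idx] to the low 64-bit lane and
   table[idx + HALF] to the high 64-bit lane, then stores both lanes in out. */
static void accumulate_pairs(const uint32_t *table, const uint16_t *idx,
                             size_t n, uint64_t out[2])
{
    __m128i val = _mm_setzero_si128();
    for (size_t i = 0; i < n; ++i) {
        assert(idx[i] < HALF);  /* caller must keep indices inside the first half */
        /* Each load is zero-extended from 32 bits to a full XMM register. */
        const __m128i lo = _mm_cvtsi32_si128((int)table[idx[i]]);
        const __m128i hi = _mm_cvtsi32_si128((int)table[idx[i] + HALF]);
        val = _mm_add_epi64(val, _mm_unpacklo_epi64(lo, hi));
    }
    _mm_storeu_si128((__m128i *)out, val);
}
```

Comparing the assembly of this function built with -mssse3 and with -msse4.1
should show whether the int-to-XMM conversions come out as movd or as pinsrd.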

The problem is that _mm_cvtsi32_si128() and _mm_set_epi32() are both modeled via
INSERT_VECTOR_ELT, and

  %13 = insertelement <4 x i32> <i32 undef, i32 0, i32 undef, i32 0>, i32 %12, i32 0, !dbg !287

is lowered to a single movd instruction prior to SSE4, but to xor+pinsrd on SSE4.
https://gcc.godbolt.org/z/qY8nkO
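
As a source-level illustration of why the two lowerings are interchangeable
here, the sketch below builds the same value both ways and compares the bits.
The helper names are hypothetical, and the target attribute is a GCC/Clang
extension used so that the file builds without -msse4.1. The compiler remains
free to pick either instruction for either helper, so this shows equivalence of
results only, not which instruction gets emitted.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_cvtsi32_si128, _mm_setzero_si128 */
#include <smmintrin.h>  /* SSE4.1: _mm_insert_epi32 */
#include <string.h>

/* movd form: x goes into lane 0, the remaining lanes are zeroed. */
static __m128i via_movd(int x)
{
    return _mm_cvtsi32_si128(x);
}

/* xor + pinsrd form: zero the register, then insert x into lane 0. */
__attribute__((target("sse4.1")))
static __m128i via_pinsrd(int x)
{
    return _mm_insert_epi32(_mm_setzero_si128(), x, 0);
}

/* Returns 1 when both vectors hold identical 128-bit contents. */
static int same_bits(__m128i a, __m128i b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Aborts if the two forms disagree for the given scalar. */
static void check_equivalence(int x)
{
    assert(same_bits(via_movd(x), via_pinsrd(x)));
}
```

Calling check_equivalence() on a few values (including negative ones) on an
SSE4.1-capable machine should pass silently.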

* Notice that in a kernel function in the 2nd case there are a couple of movd's,
but when used in a loop it results in a pair of pinsrd's from memory into the
same register.

This seems to me like poor instruction selection from both the performance and
the code size standpoints.

I suggest steering instruction selection for this idiomatic case of
INSERT_VECTOR_ELT to SCALAR_TO_VECTOR. This will lead directly to movd
emission.

Proposed change to lib/Target/X86/X86ISelLowering.cpp is attached.
