[llvm-bugs] [Bug 40340] New: [X86] Add selective commutation support for insertps

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Jan 16 10:41:56 PST 2019


            Bug ID: 40340
           Summary: [X86] Add selective commutation support for insertps
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: llvm-dev at redking.me.uk
                CC: craig.topper at gmail.com, llvm-bugs at lists.llvm.org,
                    llvm-dev at redking.me.uk, spatel+llvm at rotateright.com


#include <x86intrin.h>

__m128 ii(__m128 x, __m128 *y) {
    return _mm_insert_ps(*y, x, (1<<6) | (1<<4) | 5);
}

define <4 x float> @_Z2iiDv4_fPS_(<4 x float>, <4 x float>* nocapture readonly) {
  %3 = load <4 x float>, <4 x float>* %1, align 16
  %4 = tail call <4 x float> @llvm.x86.sse41.insertps(<4 x float> %3, <4 x float> %0, i8 85)
  ret <4 x float> %4
}
declare <4 x float> @llvm.x86.sse41.insertps(<4 x float>, <4 x float>, i8) #1

_Z2iiDv4_fPS_: # @_Z2iiDv4_fPS_
  vmovaps (%rdi), %xmm1
  vinsertps $85, %xmm0, %xmm1, %xmm0 # xmm0 = zero,xmm0[1],zero,xmm1[3]

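The load of *y is not folded into the vinsertps because it feeds the first
(destination) source operand. When the result takes exactly one "inline"
(same-lane) element from each source and zeros everywhere else, we should be
able to swap the operands and commute the immediate to allow the memory fold.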
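
As a rough illustration of the immediate rewrite, here is a minimal sketch in
plain C. It is not LLVM code: insertps_model and commute_insertps_imm are
hypothetical names made up for this example, and the model only covers the
register form of the instruction. It assumes the usual immediate layout
(bits 7:6 select the source element, bits 5:4 select the destination lane,
bits 3:0 are the zero mask).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of register-form INSERTPS:
   out = dst, then out[count_d] = src[count_s], then zero the zmask lanes. */
static void insertps_model(const float dst[4], const float src[4],
                           uint8_t imm, float out[4]) {
    unsigned count_s = (imm >> 6) & 3u;
    unsigned count_d = (imm >> 4) & 3u;
    unsigned zmask = imm & 0xFu;
    for (int i = 0; i < 4; ++i)
        out[i] = dst[i];
    out[count_d] = src[count_s];
    for (int i = 0; i < 4; ++i)
        if ((zmask >> i) & 1u)
            out[i] = 0.0f;
}

/* Hypothetical helper: if imm describes "one same-lane element from each
   source, zeros elsewhere", store the immediate to use after swapping the
   two sources and return 1. Otherwise return 0 and leave *commuted alone. */
static int commute_insertps_imm(uint8_t imm, uint8_t *commuted) {
    assert(commuted != NULL);
    unsigned count_s = (imm >> 6) & 3u;
    unsigned count_d = (imm >> 4) & 3u;
    unsigned zmask = imm & 0xFu;

    /* The inserted element must stay in its own lane and must not be zeroed. */
    if (count_s != count_d || (zmask & (1u << count_d)))
        return 0;

    /* Lanes that still come from the old destination after insert + zeroing. */
    unsigned kept = ~(zmask | (1u << count_d)) & 0xFu;
    if (kept == 0 || (kept & (kept - 1u)) != 0)
        return 0; /* need exactly one surviving destination lane */

    unsigned kept_idx = 0;
    while (!((kept >> kept_idx) & 1u))
        ++kept_idx;

    /* After the swap, the old destination's surviving element is the one
       being inserted, into the same lane; the zero mask is unchanged. */
    *commuted = (uint8_t)((kept_idx << 6) | (kept_idx << 4) | zmask);
    return 1;
}
```

For the immediate in this report, 85 (0x55), the sketch yields 245 (0xF5):
element 3 of the old destination is inserted into lane 3 of the old source,
with lanes 0 and 2 still zeroed. Note that the memory form of insertps
ignores bits 7:6 and loads a single float, so an actual fold would presumably
also need to offset the address by 4 * 3 bytes; that step is not modelled here.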
