[llvm-bugs] [Bug 33584] New: Suboptimal lowering for _mm_mask_add_sd expansion

via llvm-bugs llvm-bugs at lists.llvm.org
Sun Jun 25 06:26:13 PDT 2017


            Bug ID: 33584
           Summary: Suboptimal lowering for _mm_mask_add_sd expansion
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: zvi.rackover at intel.com
                CC: llvm-bugs at lists.llvm.org

Here's a possible LLVM IR expansion of _mm_mask_add_sd:

define <2 x double> @_mm_mask_add_sd(<2 x double> %src, i8 %k, <2 x double> %a, <2 x double> %b) {
   %vecext.i = extractelement <2 x double> %b, i32 0
   %vecext1.i = extractelement <2 x double> %a, i32 0
   %add.i = fadd double %vecext1.i, %vecext.i
   %0 = and i8 %k, 1
   %tobool.i.i = icmp ne i8 %0, 0
   %__W.elt.i.i = extractelement <2 x double> %src, i32 0
   %vecext1.i.i = select i1 %tobool.i.i, double %add.i, double %__W.elt.i.i
   %vecins.i.i = insertelement <2 x double> %a, double %vecext1.i.i, i32 0
   ret <2 x double> %vecins.i.i
}
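
For readers less used to reading IR, the expansion above can be restated as a scalar C model. This is only a sketch of the semantics: the type v2f64_model and the function mask_add_sd_model are hypothetical names made up for illustration, and are not part of LLVM or of the Intel intrinsics headers.

```c
#include <stdint.h>

/* Hypothetical stand-in for a <2 x double> value; not an LLVM or Intel type. */
typedef struct { double lane[2]; } v2f64_model;

/* Scalar model of the IR above (illustrative helper, not a real intrinsic):
   lane 0 is a[0] + b[0] when bit 0 of k is set, otherwise src[0];
   lane 1 always comes from a. */
static v2f64_model mask_add_sd_model(v2f64_model src, uint8_t k,
                                     v2f64_model a, v2f64_model b) {
    v2f64_model r = a;                        /* insertelement into %a keeps lane 1 */
    double sum = a.lane[0] + b.lane[0];       /* fadd of the two low lanes */
    r.lane[0] = (k & 1) ? sum : src.lane[0];  /* select on bit 0 of %k */
    return r;
}
```

Only bit 0 of k matters, and the upper lane never depends on src or b, which is why a single masked vaddsd should be enough to express the whole function.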

llc -mcpu=skx gives:
     vaddsd  %xmm2, %xmm1, %xmm2
     kmovd   %edi, %k1
     vmovsd  %xmm2, %xmm1, %xmm0 {%k1}

A better sequence would be:
     kmovw   %edi, %k1
     vaddsd  %xmm2, %xmm1, %xmm0 {%k1}

Here's the state of the SelectionDAG just before instruction selection:
   t0: ch = EntryToken
   t7: v2f64,ch = CopyFromReg t0, Register:v2f64 %vreg2
                 t4: i32,ch = CopyFromReg t0, Register:i32 %vreg1
               t5: i8 = truncate t4
             t16: i8 = and t5, Constant:i8<1>
           t32: v1i1 = scalar_to_vector t16
             t13: f64 = extract_vector_elt t7, Constant:i64<0>
               t9: v2f64,ch = CopyFromReg t0, Register:v2f64 %vreg3
             t12: f64 = extract_vector_elt t9, Constant:i64<0>
           t14: f64 = fadd t13, t12
             t2: v2f64,ch = CopyFromReg t0, Register:v2f64 %vreg0
           t20: f64 = extract_vector_elt t2, Constant:i64<0>
         t33: f64 = X86ISD::SELECTS t32, t14, t20
       t30: v2f64 = scalar_to_vector t33
     t35: v2f64 = X86ISD::MOVSD t7, t30
   t25: ch,glue = CopyToReg t0, Register:v2f64 %XMM0, t35
   t26: ch = X86ISD::RET_FLAG t25, TargetConstant:i32<0>, Register:v2f64 %XMM0,

