[PATCH] D79718: [x86][CGP] enable target hook to sink funnel shift intrinsic's splatted shift amount

Tue May 12 12:21:57 PDT 2020

spatel marked an inline comment as done.
spatel added inline comments.

================
Comment at: llvm/test/CodeGen/X86/vector-fshl-128.ll:2195
 ; AVX1-NEXT:    movq $-1024, %rax # imm = 0xFC00
-; AVX1-NEXT:    vpand {{.*}}(%rip), %xmm0, %xmm0
-; AVX1-NEXT:    vpslld $23, %xmm0, %xmm0
-; AVX1-NEXT:    vpaddd {{.*}}(%rip), %xmm0, %xmm0
-; AVX1-NEXT:    vcvttps2dq %xmm0, %xmm0
-; AVX1-NEXT:    vpshufd {{.*#+}} xmm1 = xmm0[1,1,3,3]
+; AVX1-NEXT:    vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
+; AVX1-NEXT:    vpand {{.*}}(%rip), %xmm0, %xmm1
----------------
craig.topper wrote:
> spatel wrote:
> > craig.topper wrote:
> > > Why are we splatting the scalar here when only element 0 is used?
> > This is another limitation caused by the block-level visibility - SDAG doesn't know that the splat is from a scalar because we are only sinking the shuffle instruction, not the insertelement:
> >       t12: v4i32,ch = CopyFromReg t0, Register:v4i32 %0
> >     t14: v4i32 = vector_shuffle<0,0,0,0> t12, undef:v4i32
> > 
> > The splat doesn't get hoisted back out of the loop until later in MachineLICM, and there's apparently no really late analysis for demanded elements.
> > 
> > We could try to sink insertelement to shuffles. That should probably be another patch though.
> I'm still confused. Shouldn't demandedelts inside selectiondag have determined the splat shuffle was unnecessary regardless of it coming from an insertelement? 
Ah, I see. Starting from the x86 shift nodes, we should see that we only need the low chunk of the shift amount. I didn't step through, but there are several things here that could foil the analysis: too many intervening nodes, casts to different sizes, and/or multiple uses:

    t14: v4i32 = vector_shuffle<0,0,0,0> t12, undef:v4i32
    t45: v4i32 = BUILD_VECTOR Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>
  t46: v4i32 = and t14, t45
        t25: ch = CopyToReg t0, Register:i64 %2, t23
                t54: v2i64 = zero_extend_vector_inreg t46
              t55: v4i32 = bitcast t54
            t56: v4i32 = X86ISD::VSHL t10, t55
                    t48: v4i32 = BUILD_VECTOR Constant:i32<32>, Constant:i32<32>, Constant:i32<32>, Constant:i32<32>
                  t49: v4i32 = sub t48, t46
                t58: v2i64 = zero_extend_vector_inreg t49
              t59: v4i32 = bitcast t58
            t60: v4i32 = X86ISD::VSRL t10, t59
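
To make the "multiple uses" candidate concrete, here is a toy model of the shift-amount chain in the dump above. It is only a sketch and not LLVM's actual demanded-elements code: the names (`EDGES`, `transfer`, `demanded_splat_lanes`) are made up for illustration, every node is treated as four 32-bit chunks, and the rule "give up and demand everything when an operand has more than one use" is an assumed simplification. I have not confirmed that this is the cause that actually blocks the fold here.

```python
from collections import Counter, defaultdict

ALL = frozenset(range(4))  # four 32-bit chunks of a 128-bit value

# (user, opcode, operand) edges of the shift-amount chain from the dump,
# listed users-first. Constant operands (t45, t48) are left out.
EDGES = [
    ("t55", "bitcast", "t54"),
    ("t59", "bitcast", "t58"),
    ("t54", "zext_inreg", "t46"),
    ("t58", "zext_inreg", "t49"),
    ("t49", "sub", "t46"),
    ("t46", "and", "t14"),
]

# X86ISD::VSHL / VSRL read only the low 64 bits of the amount operand,
# i.e. chunks 0 and 1 of t55 and t59.
ROOTS = {"t55": frozenset({0, 1}), "t59": frozenset({0, 1})}


def transfer(opcode, demanded):
    """Map demanded chunks of a node's result to demanded chunks of its operand."""
    if opcode in ("and", "sub", "bitcast"):
        return frozenset(demanded)  # lane-wise ops: same chunks
    if opcode == "zext_inreg":
        # v4i32 -> v2i64: result chunk 2*j holds input lane j, chunk 2*j+1 is zero.
        return frozenset(c // 2 for c in demanded if c % 2 == 0)
    raise ValueError(opcode)


def demanded_splat_lanes(one_use_only):
    """Return the chunks of the splat (t14) that the two shifts demand."""
    uses = Counter(operand for _, _, operand in EDGES)
    demand = defaultdict(frozenset, ROOTS)
    for user, opcode, operand in EDGES:
        if one_use_only and uses[operand] > 1:
            demand[operand] = ALL  # assumed bail-out on multi-use operands
        else:
            demand[operand] = demand[operand] | transfer(opcode, demand[user])
    return demand["t14"]


print(sorted(demanded_splat_lanes(one_use_only=False)))
print(sorted(demanded_splat_lanes(one_use_only=True)))
```

In this model, merging the demands of both users of t46 leaves only chunk 0 of the splat demanded, while the one-use restriction keeps all four chunks live because t46 feeds both t54 and t49.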

https://reviews.llvm.org/D79718