[PATCH] D79718: [x86][CGP] enable target hook to sink funnel shift intrinsic's splatted shift amount
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue May 12 12:21:57 PDT 2020
spatel marked an inline comment as done.
spatel added inline comments.
================
Comment at: llvm/test/CodeGen/X86/vector-fshl-128.ll:2195
; AVX1-NEXT: movq $-1024, %rax # imm = 0xFC00
-; AVX1-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0
-; AVX1-NEXT: vpslld $23, %xmm0, %xmm0
-; AVX1-NEXT: vpaddd {{.*}}(%rip), %xmm0, %xmm0
-; AVX1-NEXT: vcvttps2dq %xmm0, %xmm0
-; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,3,3]
+; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
+; AVX1-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm1
----------------
craig.topper wrote:
> spatel wrote:
> > craig.topper wrote:
> > > Why are we splatting the scalar here when only element 0 is used?
> > This is another limitation caused by block-level visibility: SDAG doesn't know that the splat comes from a scalar because we are only sinking the shuffle instruction, not the insertelement:
> > t12: v4i32,ch = CopyFromReg t0, Register:v4i32 %0
> > t14: v4i32 = vector_shuffle<0,0,0,0> t12, undef:v4i32
> >
> > The splat doesn't get hoisted back out of the loop until later in MachineLICM, and there's apparently no really late analysis for demanded elements.
> >
> > We could try to sink insertelement to shuffles. That should probably be another patch though.
> I'm still confused. Shouldn't demandedelts inside selectiondag have determined the splat shuffle was unnecessary regardless of it coming from an insertelement?
Ah, I see. Starting from the x86 shift nodes, the demanded-elements analysis should see that we only need the low chunk of the shift-amount vector. I didn't step through it, but there are several things here that could foil the analysis: too many intervening nodes, casts to different element sizes, and/or multiple uses:
t14: v4i32 = vector_shuffle<0,0,0,0> t12, undef:v4i32
t45: v4i32 = BUILD_VECTOR Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>
t46: v4i32 = and t14, t45
t25: ch = CopyToReg t0, Register:i64 %2, t23
t54: v2i64 = zero_extend_vector_inreg t46
t55: v4i32 = bitcast t54
t56: v4i32 = X86ISD::VSHL t10, t55
t48: v4i32 = BUILD_VECTOR Constant:i32<32>, Constant:i32<32>, Constant:i32<32>, Constant:i32<32>
t49: v4i32 = sub t48, t46
t58: v2i64 = zero_extend_vector_inreg t49
t59: v4i32 = bitcast t58
t60: v4i32 = X86ISD::VSRL t10, t59
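
To make the expectation concrete, here is a rough Python sketch of a backward demanded-elements walk over the nodes in the dump above. It is not how SelectionDAG implements the analysis; the per-opcode rules, the opcode names (`splat0`, `zext_inreg`, `bitcast_v2i64_to_v4i32`) and the helper functions are simplifications made up for illustration. In particular, it assumes the x86 shift nodes read only the low 64 bits of the amount operand, which is what "the low chunk" refers to.

```python
# Hypothetical model of the DAG dump: name -> (opcode, operand names).
NODES = {
    "t14": ("splat0", ["t12"]),                  # vector_shuffle<0,0,0,0>
    "t46": ("and", ["t14", "t45"]),
    "t54": ("zext_inreg", ["t46"]),              # v4i32 -> v2i64
    "t55": ("bitcast_v2i64_to_v4i32", ["t54"]),
    "t56": ("VSHL", ["t10", "t55"]),
    "t49": ("sub", ["t48", "t46"]),
    "t58": ("zext_inreg", ["t49"]),
    "t59": ("bitcast_v2i64_to_v4i32", ["t58"]),
    "t60": ("VSRL", ["t10", "t59"]),
}


def operand_demand(op, idx, demanded):
    """Operand elements needed to produce the demanded result elements."""
    if not demanded:
        return set()
    if op in ("and", "sub", "zext_inreg"):
        # Result element i depends only on operand element i.
        return set(demanded)
    if op in ("VSHL", "VSRL"):
        # Assumption: the amount operand is read from its low 64 bits only,
        # i.e. elements 0 and 1 of a v4i32.
        return set(demanded) if idx == 0 else {0, 1}
    if op == "bitcast_v2i64_to_v4i32":
        # i32 element e lives inside i64 element e // 2.
        return {e // 2 for e in demanded}
    if op == "splat0":
        # Every result element is a copy of source element 0.
        return {0}
    raise ValueError(f"unhandled opcode: {op}")


def propagate(nodes, roots, width=4):
    """Union demanded elements over all uses until nothing changes."""
    demanded = {root: set(range(width)) for root in roots}
    changed = True
    while changed:
        changed = False
        for name, (op, operands) in nodes.items():
            need = demanded.get(name, set())
            for idx, operand in enumerate(operands):
                new = operand_demand(op, idx, need)
                cur = demanded.setdefault(operand, set())
                if not new <= cur:
                    cur |= new
                    changed = True
    return demanded


demanded = propagate(NODES, roots=["t56", "t60"])
print(sorted(demanded["t14"]), sorted(demanded["t12"]))
```

Under these simplified rules, both uses of t46 (through t54 and through t49) demand only element 0, so only element 0 of the splat t14 is needed, and that element is just element 0 of t12. A real analysis has to reach the same conclusion across the multiple uses, the zero_extend_vector_inreg and the bitcasts, which is where it could plausibly give up.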
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D79718/new/
https://reviews.llvm.org/D79718