[PATCH] D96405: [DAGCombiner] Improve reduceBuildVecToShuffle Performance
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 11 03:22:47 PST 2021
RKSimon added inline comments.
================
Comment at: llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll:115
+; X86-NEXT: vpsraw $8, %xmm1, %xmm1
+; X86-NEXT: vpunpcklqdq {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
+; X86-NEXT: vmovdqu %ymm0, (%eax)
----------------
mmarjieh wrote:
> RKSimon wrote:
> > At quick glance - this looks wrong, I'd expect this still to be the same vshufpd?
> I am not familiar with X86's ISA.
> Can you explain why?
> Meanwhile, I will show you the difference in the DAG after my patch:
>
> Before this patch:
> SelectionDAG has 28 nodes:
> t0: ch = EntryToken
> t5: v4i64,ch = load<(load 32 from `<4 x i64>* null`, align 8)> t0, Constant:i32<0>, undef:i32
> t6: v4i64,ch = load<(load 32 from `<4 x i64>* undef`, align 8)> t0, undef:i32, undef:i32
> t21: ch = TokenFactor t5:1, t6:1
> t36: v8i16 = BUILD_VECTOR Constant:i16<0>, Constant:i16<0>, Constant:i16<0>, Constant:i16<0>, undef:i16, undef:i16, undef:i16, undef:i16
> t52: v2f64 = bitcast t36
> t62: v4f64 = concat_vectors t52, undef:v2f64
> t63: v4f64 = vector_shuffle<u,u,0,0> t62, undef:v4f64
> t37: v8i16 = X86ISD::VTRUNC t5
> t39: v8i16 = sign_extend_inreg t37, ValueType:ch:v8i8
> t44: v2f64 = bitcast t39
> t40: v8i16 = X86ISD::VTRUNC t6
> t41: v8i16 = sign_extend_inreg t40, ValueType:ch:v8i8
> t49: v2f64 = bitcast t41
> t59: v4f64 = concat_vectors t44, t49
> t68: v4f64 = vector_shuffle<4,6,2,3> t63, t59
> t3: i32,ch = load<(load 4 from %fixed-stack.0)> t0, FrameIndex:i32<-1>, undef:i32
> t57: ch = store<(store 32 into %ir.10, align 2)> t21, t68, t3, undef:i32
> t29: ch = X86ISD::RET_FLAG t57, TargetConstant:i32<0>
>
>
> After this patch:
> SelectionDAG has 26 nodes:
> t0: ch = EntryToken
> t5: v4i64,ch = load<(load 32 from `<4 x i64>* null`, align 8)> t0, Constant:i32<0>, undef:i32
> t6: v4i64,ch = load<(load 32 from `<4 x i64>* undef`, align 8)> t0, undef:i32, undef:i32
> t21: ch = TokenFactor t5:1, t6:1
> t37: v8i16 = X86ISD::VTRUNC t5
> t39: v8i16 = sign_extend_inreg t37, ValueType:ch:v8i8
> t44: v2f64 = bitcast t39
> t40: v8i16 = X86ISD::VTRUNC t6
> t41: v8i16 = sign_extend_inreg t40, ValueType:ch:v8i8
> t49: v2f64 = bitcast t41
> t59: v4f64 = concat_vectors t44, t49
> t36: v8i16 = BUILD_VECTOR Constant:i16<0>, Constant:i16<0>, Constant:i16<0>, Constant:i16<0>, undef:i16, undef:i16, undef:i16, undef:i16
> t52: v2f64 = bitcast t36
> t62: v4f64 = concat_vectors t52, undef:v2f64
> t64: v4f64 = vector_shuffle<0,2,4,4> t59, t62
> t3: i32,ch = load<(load 4 from %fixed-stack.0)> t0, FrameIndex:i32<-1>, undef:i32
> t57: ch = store<(store 32 into %ir.10, align 2)> t21, t64, t3, undef:i32
> t29: ch = X86ISD::RET_FLAG t57, TargetConstant:i32<0>
>
My mistake - I missed that we were implicitly zeroing the upper elements (xmm -> ymm). Sorry about that.
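
For anyone following along, here is a minimal sketch (not part of the patch) that checks the two DAG dumps quoted above produce the same four lanes, and that the new vpunpcklqdq does too. It models vector_shuffle in plain Python with symbolic lane names. The names a0/a1/b0/b1 and the helper functions are hypothetical. The sketch also assumes that ymm0, like ymm1, was last written by a VEX-encoded xmm instruction, so its upper 128 bits are zero; the excerpt only shows that write for xmm1.

```python
UNDEF = "undef"

def shuffle(mask, lhs, rhs):
    """Model vector_shuffle: index i < n reads lhs[i], i >= n reads rhs[i - n].
    A mask entry of None stands for 'u' (undef)."""
    n = len(lhs)
    out = []
    for m in mask:
        if m is None:
            out.append(UNDEF)
        elif m < n:
            out.append(lhs[m])
        else:
            out.append(rhs[m - n])
    return out

# Symbolic 64-bit lanes for the nodes in the quoted dumps.
t44 = ["a0", "a1"]        # v2f64 bitcast of the first sign-extended truncate
t49 = ["b0", "b1"]        # v2f64 bitcast of the second one
t52 = [0, UNDEF]          # bitcast of BUILD_VECTOR <0,0,0,0,u,u,u,u>
undef2 = [UNDEF] * 2
undef4 = [UNDEF] * 4

t59 = t44 + t49           # concat_vectors t44, t49
t62 = t52 + undef2        # concat_vectors t52, undef:v2f64

# Before the patch: two shuffles.
t63 = shuffle([None, None, 0, 0], t62, undef4)
t68 = shuffle([4, 6, 2, 3], t63, t59)

# After the patch: one shuffle with swapped operands.
t64 = shuffle([0, 2, 4, 4], t59, t62)

def vpunpcklqdq_ymm(a, b):
    """Lane pattern from the CHECK line: a[0], b[0], a[2], b[2]."""
    return [a[0], b[0], a[2], b[2]]

# Assumption: both registers hold an xmm result with zeroed upper halves.
ymm0 = t44 + [0, 0]
ymm1 = t49 + [0, 0]
asm_result = vpunpcklqdq_ymm(ymm0, ymm1)

print(t68, t64, asm_result)
```

All three results should agree lane for lane, which is why the unpack is a valid replacement for the vshufpd here: the zero lanes come for free from the implicit xmm -> ymm zeroing.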
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D96405/new/
https://reviews.llvm.org/D96405