[PATCH] D59710: [SLP] remove lower limit for forming reduction patterns
Alexey Bataev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 1 11:18:12 PDT 2019
ABataev added inline comments.
================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/hadd.ll:82-86
+; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <2 x i64> [[A:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
+; CHECK-NEXT: [[BIN_RDX2:%.*]] = add <2 x i64> [[RDX_SHUF1]], [[A]]
+; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <2 x i64> [[B:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
+; CHECK-NEXT: [[BIN_RDX:%.*]] = add <2 x i64> [[RDX_SHUF]], [[B]]
+; CHECK-NEXT: [[R01:%.*]] = shufflevector <2 x i64> [[BIN_RDX2]], <2 x i64> [[BIN_RDX]], <2 x i32> <i32 0, i32 2>
----------------
spatel wrote:
> ABataev wrote:
> > This one is worse than it was before for SSE
> Here are the SSE alternatives:
>
> Without SLP (original IR):
>
> ```
> movq %xmm0, %rax
> pshufd $78, %xmm0, %xmm0 ## xmm0 = xmm0[2,3,0,1]
> movq %xmm0, %rcx
> addq %rax, %rcx
> movq %xmm1, %rax
> pshufd $78, %xmm1, %xmm0 ## xmm0 = xmm1[2,3,0,1]
> movq %xmm0, %rdx
> addq %rax, %rdx
> movq %rdx, %xmm1
> movq %rcx, %xmm0
> punpcklqdq %xmm1, %xmm0 ## xmm0 = xmm0[0],xmm1[0]
>
> ```
> With SLP currently:
>
> ```
> movdqa %xmm0, %xmm2
> punpcklqdq %xmm1, %xmm2 ## xmm2 = xmm2[0],xmm1[0]
> punpckhqdq %xmm1, %xmm0 ## xmm0 = xmm0[1],xmm1[1]
> paddq %xmm2, %xmm0
>
> ```
> With this SLP patch:
>
> ```
> pshufd $78, %xmm0, %xmm2 ## xmm2 = xmm0[2,3,0,1]
> paddq %xmm2, %xmm0
> pshufd $78, %xmm1, %xmm2 ## xmm2 = xmm1[2,3,0,1]
> paddq %xmm1, %xmm2
> punpcklqdq %xmm2, %xmm0 ## xmm0 = xmm0[0],xmm2[0]
>
> ```
> Ideally, we can get SLP to choose the shorter sequence (bypass treating this as a reduction).
>
> I don't think we can ask instcombine to create that sequence because it requires creating semi-arbitrary shuffle instructions.
>
> Or we can view this as a backend opportunity to reduce a shuffle-binop-shuffle sequence:
> t6: v2i64 = vector_shuffle<1,u> t2, undef:v2i64
> t7: v2i64 = add t6, t2
> t8: v2i64 = vector_shuffle<1,u> t4, undef:v2i64
> t9: v2i64 = add t8, t4
> t10: v2i64 = vector_shuffle<0,2> t7, t9
>
Maybe we could somehow try to reduce 2 elements only after the regular reduction did not work?
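
For reference, the three SSE sequences quoted above should all produce the same <2 x i64> result, {a0 + a1, b0 + b1}; they differ only in instruction count. Below is a minimal lane-level sketch of that claim in Python. It is an illustrative model, not LLVM code: the function names and the list-based vector representation are assumptions made for this example, and undef lanes are modelled using the values the pshufd lowering happens to produce.

```python
MASK64 = (1 << 64) - 1  # addq/paddq wrap modulo 2**64


def add_v2i64(x, y):
    """Lane-wise 64-bit add of two 2-element vectors (models paddq)."""
    return [(x[0] + y[0]) & MASK64, (x[1] + y[1]) & MASK64]


def without_slp(a, b):
    """Original IR: extract lanes, add in scalar registers, rebuild the vector."""
    return [(a[0] + a[1]) & MASK64, (b[0] + b[1]) & MASK64]


def slp_current(a, b):
    """Current SLP output: punpcklqdq + punpckhqdq + one paddq."""
    lo = [a[0], b[0]]  # low lanes of a and b
    hi = [a[1], b[1]]  # high lanes of a and b
    return add_v2i64(lo, hi)


def slp_with_patch(a, b):
    """Patched SLP output: one 2-element reduction per input, then a merge."""
    # pshufd $78 swaps the two 64-bit halves; only lane 0 of each sum is used.
    ra = add_v2i64([a[1], a[0]], a)
    rb = add_v2i64([b[1], b[0]], b)
    return [ra[0], rb[0]]  # punpcklqdq keeps lane 0 of each operand
```

All three functions agree lane for lane, so the regression on this test is purely in the length of the emitted sequence (2 shuffles + 1 add versus 3 shuffles + 2 adds).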
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D59710/new/
https://reviews.llvm.org/D59710