[PATCH] D59710: [SLP] remove lower limit for forming reduction patterns

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Nov 1 08:26:19 PDT 2019


spatel marked an inline comment as done.
spatel added inline comments.


================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/hadd.ll:82-86
+; CHECK-NEXT:    [[RDX_SHUF1:%.*]] = shufflevector <2 x i64> [[A:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
+; CHECK-NEXT:    [[BIN_RDX2:%.*]] = add <2 x i64> [[RDX_SHUF1]], [[A]]
+; CHECK-NEXT:    [[RDX_SHUF:%.*]] = shufflevector <2 x i64> [[B:%.*]], <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
+; CHECK-NEXT:    [[BIN_RDX:%.*]] = add <2 x i64> [[RDX_SHUF]], [[B]]
+; CHECK-NEXT:    [[R01:%.*]] = shufflevector <2 x i64> [[BIN_RDX2]], <2 x i64> [[BIN_RDX]], <2 x i32> <i32 0, i32 2>
----------------
ABataev wrote:
> This one is worse than it was before for SSE
Here are the SSE alternatives:

Without SLP (original IR):
```
movq	%xmm0, %rax
pshufd	$78, %xmm0, %xmm0       ## xmm0 = xmm0[2,3,0,1]
movq	%xmm0, %rcx
addq	%rax, %rcx
movq	%xmm1, %rax
pshufd	$78, %xmm1, %xmm0       ## xmm0 = xmm1[2,3,0,1]
movq	%xmm0, %rdx
addq	%rax, %rdx
movq	%rdx, %xmm1
movq	%rcx, %xmm0
punpcklqdq	%xmm1, %xmm0    ## xmm0 = xmm0[0],xmm1[0]

```
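For context, the pre-SLP IR being compiled here is just the scalar horizontal add (a rough sketch of the test in hadd.ll; value names are approximate):

```
define <2 x i64> @test_v2i64(<2 x i64> %a, <2 x i64> %b) {
  ; extract both lanes of each input, add them, reinsert the sums:
  ; r = { a0+a1, b0+b1 }
  %a0 = extractelement <2 x i64> %a, i32 0
  %a1 = extractelement <2 x i64> %a, i32 1
  %b0 = extractelement <2 x i64> %b, i32 0
  %b1 = extractelement <2 x i64> %b, i32 1
  %r0 = add i64 %a0, %a1
  %r1 = add i64 %b0, %b1
  %r00 = insertelement <2 x i64> undef, i64 %r0, i32 0
  %r01 = insertelement <2 x i64> %r00, i64 %r1, i32 1
  ret <2 x i64> %r01
}
```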
With SLP currently:
```
movdqa	%xmm0, %xmm2
punpcklqdq	%xmm1, %xmm2    ## xmm2 = xmm2[0],xmm1[0]
punpckhqdq	%xmm1, %xmm0    ## xmm0 = xmm0[1],xmm1[1]
paddq	%xmm2, %xmm0

```
With this SLP patch:
```
pshufd	$78, %xmm0, %xmm2       ## xmm2 = xmm0[2,3,0,1]
paddq	%xmm2, %xmm0
pshufd	$78, %xmm1, %xmm2       ## xmm2 = xmm1[2,3,0,1]
paddq	%xmm1, %xmm2
punpcklqdq	%xmm2, %xmm0    ## xmm0 = xmm0[0],xmm2[0]

```
Ideally, we could get SLP to choose the shorter sequence, i.e., bypass treating this as a reduction.
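
For reference, the shorter sequence corresponds to shuffle-based IR like this (a sketch with hypothetical value names; the two interleaving shuffles plus a single add map directly to punpcklqdq/punpckhqdq/paddq):

```
%lo = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 0, i32 2> ; { a0, b0 }
%hi = shufflevector <2 x i64> %a, <2 x i64> %b, <2 x i32> <i32 1, i32 3> ; { a1, b1 }
%r  = add <2 x i64> %lo, %hi                                             ; { a0+a1, b0+b1 }
```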

I don't think we can ask instcombine to create that sequence because it would require creating semi-arbitrary shuffle instructions.

Or we can view this as a backend opportunity to reduce a shuffle-binop-shuffle sequence:

```
        t6: v2i64 = vector_shuffle<1,u> t2, undef:v2i64
      t7: v2i64 = add t6, t2
        t8: v2i64 = vector_shuffle<1,u> t4, undef:v2i64
      t9: v2i64 = add t8, t4
    t10: v2i64 = vector_shuffle<0,2> t7, t9
```
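
For example, a DAG combine could rewrite that into two interleaving shuffles feeding a single add (same DAG notation; node numbers are hypothetical):

```
      t11: v2i64 = vector_shuffle<0,2> t2, t4
      t12: v2i64 = vector_shuffle<1,3> t2, t4
    t13: v2i64 = add t11, t12
```

That is exactly the 3-instruction punpcklqdq/punpckhqdq/paddq sequence shown above for the current SLP output.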



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59710/new/

https://reviews.llvm.org/D59710