[PATCH] [SLPVectorization] Vectorize flat addition in a single tree (+(+(+ v1 v2) v3) v4)

Wed Dec 31 02:53:46 PST 2014

Hi nadav, aschwaighofer, mzolotukhin, jmolloy,

This is one more patch based on previous discussions.

This patch vectorizes a flat integer addition over elements of a single array
whose expression tree has the form (+(+(+ v1 v2) v3) v4).

For example:

        int hadd(int *a) {
          return a[0] + a[1] + a[2] + a[3];
        }

The IR for the above code is:

        define i32 @hadd(i32* %a) {
        entry:
          %0 = load i32* %a, align 4
          %arrayidx1 = getelementptr inbounds i32* %a, i32 1
          %1 = load i32* %arrayidx1, align 4
          %add = add nsw i32 %0, %1
          %arrayidx2 = getelementptr inbounds i32* %a, i32 2
          %2 = load i32* %arrayidx2, align 4
          %add3 = add nsw i32 %add, %2
          %arrayidx4 = getelementptr inbounds i32* %a, i32 3
          %3 = load i32* %arrayidx4, align 4
          %add5 = add nsw i32 %add3, %3
          ret i32 %add5
        }

The above addition can be modeled as a combination of two shufflevector instructions, two vector adds, and one extractelement instruction.

After vectorization with this patch, the IR is:

        define i32 @hadd(i32* %a) {
        entry:
          %0 = bitcast i32* %a to <4 x i32>*
          %1 = load <4 x i32>* %0, align 4
          %rdx.shuf = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
          %bin.rdx = add <4 x i32> %1, %rdx.shuf
          %rdx.shuf1 = shufflevector <4 x i32> %bin.rdx, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
          %bin.rdx2 = add <4 x i32> %bin.rdx, %rdx.shuf1
          %2 = extractelement <4 x i32> %bin.rdx2, i32 0
          ret i32 %2
        }
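To make the lane bookkeeping in the vectorized IR easier to follow, here is a rough scalar C++ model of it. This is a sketch, not code from the patch; V4, shuffle, add and hadd are illustrative names. Undef lanes are modeled as 0 because their values are never read. Note that the vector code computes (a[0]+a[2]) + (a[1]+a[3]) instead of the scalar ((a[0]+a[1])+a[2])+a[3]; the vector adds carry no nsw flag, and wrapping 32-bit addition is associative and commutative, so both groupings give the same result. The model uses uint32_t to get that wrapping behaviour in C++.

```cpp
#include <array>
#include <cstdint>

// Stand-in for <4 x i32>; unsigned so additions wrap like LLVM's i32 add.
using V4 = std::array<std::uint32_t, 4>;

// Models a single-source `shufflevector %v, undef, <mask>`. A mask entry of
// -1 marks an undef lane; it is filled with 0 here because it is never read.
V4 shuffle(const V4 &v, const std::array<int, 4> &mask) {
  V4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = mask[i] < 0 ? 0u : v[mask[i]];
  return r;
}

// Models `add <4 x i32>`: lane-wise wrapping addition.
V4 add(const V4 &x, const V4 &y) {
  V4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = x[i] + y[i];
  return r;
}

// Mirrors the vectorized @hadd body instruction by instruction.
std::uint32_t hadd(const std::uint32_t *a) {
  V4 v{a[0], a[1], a[2], a[3]};          // %1 = load <4 x i32>
  V4 s0 = shuffle(v, {2, 3, -1, -1});    // %rdx.shuf
  V4 b0 = add(v, s0);                    // %bin.rdx: lane 0 = a0+a2, lane 1 = a1+a3
  V4 s1 = shuffle(b0, {1, -1, -1, -1});  // %rdx.shuf1
  V4 b1 = add(b0, s1);                   // %bin.rdx2: lane 0 = (a0+a2)+(a1+a3)
  return b1[0];                          // %2 = extractelement ..., i32 0
}
```

With a = {1, 2, 3, 4}, the model's b0 is {4, 6, 3, 4} (only lanes 0 and 1 matter) and hadd returns 10.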

AArch64 assembly before this patch:

        ldp     w8, w9, [x0]
        ldp     w10, w11, [x0, #8]
        add     w8, w8, w9
        add     w8, w8, w10
        add     w0, w8, w11
        ret

AArch64 assembly after this patch:

        ldr     q0, [x0]
        ext     v1.16b, v0.16b, v0.16b, #8
        add     v0.4s, v0.4s, v1.4s
        dup     v1.4s, v0.s[1]
        add     v0.4s, v0.4s, v1.4s
        fmov    w0, s0
        ret

This patch handles any number of such additions, e.g. a[0] + a[1] + ... + a[7], and adds a test case for that.
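The 4-wide IR above halves the number of live lanes at each step. Assuming longer chains such as a[0]..a[7] follow the same pattern with a wider vector load (the 8-wide IR is not shown in this mail), the general scheme is log2(N) shufflevector + add pairs followed by one extractelement. Below is a scalar sketch of that scheme; shuffleReduce and reductionSteps are illustrative names, and N is assumed to be a power of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a log2 shuffle reduction over N lanes (N assumed to be a
// power of two). Each step adds the upper half of the live lanes onto the
// lower half, which is what one shufflevector + vector add pair does.
std::uint32_t shuffleReduce(std::vector<std::uint32_t> lanes) {
  for (std::size_t half = lanes.size() / 2; half >= 1; half /= 2)
    for (std::size_t i = 0; i < half; ++i)
      lanes[i] += lanes[i + half];  // lanes[i + half] plays the shuffled operand
  return lanes.empty() ? 0u : lanes[0];  // extractelement of lane 0
}

// Number of shufflevector + add pairs needed for N lanes: log2(N).
unsigned reductionSteps(std::size_t lanes) {
  unsigned steps = 0;
  while (lanes > 1) {
    lanes /= 2;
    ++steps;
  }
  return steps;
}
```

For N = 4 the two steps match the masks <2, 3, undef, undef> and <1, undef, undef, undef> in the IR above; for eight elements this sketch takes three steps and sums {1, ..., 8} to 36.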

I have written a new function, "matchFlatReduction", to identify this type of tree, as I didn't want to disturb the existing "matchAssociativeReduction".
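The actual matchFlatReduction is in the attached patch and is not reproduced here. As a rough illustration of what identifying this tree shape involves, here is a hypothetical matcher over a toy expression type (Expr, load, add and matchFlatChain are invented names, not LLVM API): it walks down the left spine of (+(+(+ v1 v2) v3) v4), requires every right operand to be a leaf load, and checks that the leaves are consecutive elements of one array. Real code working on LLVM instructions also has to check GEP offsets and load adjacency, which this sketch ignores.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy expression node (not LLVM IR): either a load of a[index]
// or an add of two subtrees.
struct Expr {
  enum Kind { Load, Add } kind;
  int index;                        // used when kind == Load
  std::shared_ptr<Expr> lhs, rhs;   // used when kind == Add
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr load(int i) {
  return std::make_shared<Expr>(Expr{Expr::Load, i, nullptr, nullptr});
}
ExprPtr add(ExprPtr l, ExprPtr r) {
  return std::make_shared<Expr>(Expr{Expr::Add, -1, std::move(l), std::move(r)});
}

// Accepts only a left-leaning chain of >= 2 loads from consecutive indices,
// in source order; the matched indices are returned through `indices`.
bool matchFlatChain(const ExprPtr &root, std::vector<int> &indices) {
  indices.clear();
  ExprPtr cur = root;
  while (cur && cur->kind == Expr::Add) {
    // Each right operand must be a leaf load; nesting on the right is rejected.
    if (!cur->rhs || cur->rhs->kind != Expr::Load) return false;
    indices.push_back(cur->rhs->index);
    cur = cur->lhs;
  }
  if (!cur || cur->kind != Expr::Load) return false;
  indices.push_back(cur->index);                 // leftmost leaf (v1)
  std::reverse(indices.begin(), indices.end());  // restore v1, v2, ... order
  if (indices.size() < 2) return false;
  for (std::size_t i = 1; i < indices.size(); ++i)
    if (indices[i] != indices[0] + static_cast<int>(i)) return false;
  return true;
}
```

For example, add(add(add(load(0), load(1)), load(2)), load(3)) is accepted with indices {0, 1, 2, 3}, while the right-nested add(load(0), add(load(1), load(2))) is rejected.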

Please help review this patch. No regressions were observed in make check.

Regards,
Suyog

REPOSITORY
  rL LLVM

http://reviews.llvm.org/D6818

Files:
  lib/Transforms/Vectorize/SLPVectorizer.cpp
  test/Transforms/SLPVectorizer/AArch64/flatadd.ll
