[PATCH] [SLPVectorization] Enhance Ability to Vectorize Horizontal Reductions from Consecutive Loads

Mon Dec 15 21:22:57 PST 2014

Hi nadav, aschwaighofer, jmolloy,

This patch is an enhancement to r224119, which vectorizes horizontal reductions from consecutive loads.

Earlier, in r224119, we handled the following tree:

                                                  +
                                                /    \
                                              /       \
                                            +         +
                                           /  \       /  \
                                         /     \     /    \
                                     a[0]  a[1] a[2] a[3]

where originally, we had
                                      Left              Right
                                      a[0]              a[1]
                                      a[2]              a[3]

In r224119, we checked whether (Left[i], Right[i]) and (Right[i], Left[i+1]) are consecutive accesses:

                                     Left        Right
                                     a[0] ---> a[1]
                                                /             
                                               /                                                        
                                              /
                                            \/
                                     a[2]       a[3]


and then rearranged the operands to:
                                     Left        Right
                                     a[0]        a[2]
                                     a[1]        a[3]
so that we can bundle Left and Right into vectors of loads.
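
The r224119 check can be sketched in isolation as follows. This is only a model under stated assumptions: each load is represented by the array index it reads, isConsecutive() is a hypothetical stand-in for isConsecutiveAccess(), and reorderStrict() is an illustrative name, not a function in SLPVectorizer.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for isConsecutiveAccess(): a load is modelled by the
// array index it reads, so B follows A when its index is exactly one higher.
static bool isConsecutive(int A, int B) { return B == A + 1; }

// Model of the r224119 reordering: swap only when both (Left[i], Right[i])
// and (Right[i], Left[i+1]) are consecutive.
void reorderStrict(std::vector<int> &Left, std::vector<int> &Right) {
  assert(Left.size() == Right.size() && "operand lists must pair up");
  for (std::size_t i = 0; i + 1 < Left.size(); ++i) {
    if (isConsecutive(Left[i], Right[i]) &&
        isConsecutive(Right[i], Left[i + 1]))
      std::swap(Left[i + 1], Right[i]);
  }
}
```

Starting from Left = {0, 2} and Right = {1, 3}, the single iteration swaps Left[1] with Right[0], which gives Left = {0, 1} and Right = {2, 3}.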

However, with a bigger tree,

                                                + 
                                              /    \ 
                                            /       \ 
                                          /          \
                                         /            \
                                        +              +
                                      /   \            /  \
                                     /     \          /    \
                                    /       \        /      \
                                  +        +      +       +
                                 /  \      /  \    /  \     /  \
                               0    1   2   3  4   5  6   7
 
                            
                                  Left              Right
                                   0                  1
                                   4                  5
                                   2                  3
                                   6                  7

In this case, the comparison of Right[i] and Left[i+1] fails, and the code remains scalar.

If we drop the comparison of Right[i] with Left[i+1] and only compare Left[i] with Right[i],
we are able to rearrange Left and Right into:
                               Left               Right
                                0                    4
                                1                    5
                                2                    6
                                3                    7

We can then bundle (0,1), (4,5) and (2,3), (6,7) into vector loads,
and perform vector adds of (01, 45) and (23, 67).
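
To make the effect of dropping the second comparison concrete, here is a minimal sketch that runs both variants on the eight-load operand lists. As assumptions: loads are modelled by their array indices, isConsecutive() stands in for isConsecutiveAccess(), and the SkipCrossCheck flag and the reorderLoads() name are hypothetical, introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for isConsecutiveAccess(): loads are modelled by the
// array index they read.
static bool isConsecutive(int A, int B) { return B == A + 1; }

// Model of the reordering loop. With SkipCrossCheck == false it requires both
// (Left[i], Right[i]) and (Right[i], Left[i+1]) to be consecutive; with
// SkipCrossCheck == true only the first pair is checked.
void reorderLoads(std::vector<int> &Left, std::vector<int> &Right,
                  bool SkipCrossCheck) {
  assert(Left.size() == Right.size() && "operand lists must pair up");
  for (std::size_t i = 0; i + 1 < Left.size(); ++i) {
    if (!isConsecutive(Left[i], Right[i]))
      continue;
    if (!SkipCrossCheck && !isConsecutive(Right[i], Left[i + 1]))
      continue;
    std::swap(Left[i + 1], Right[i]);
  }
}
```

With Left = {0, 4, 2, 6} and Right = {1, 5, 3, 7}, the strict variant leaves both lists unchanged, because 1 and 4, 5 and 2, and 3 and 6 are not consecutive. The relaxed variant swaps at i = 0 and i = 2 and ends with Left = {0, 1, 2, 3} and Right = {4, 5, 6, 7}.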

However, notice that this changes the order of the additions.
Originally, (01) was added to (23), and (45) to (67).
For integer addition this is harmless, but for data types with
precision concerns, such as floating point, it may change the result.
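
A small sketch of why the pairing matters for floating point. It models only the pairing described above, (01) with (45) and (23) with (67), and assumes IEEE-754 single precision evaluated without extra intermediate precision; the function names are illustrative, and the lane-level association of real vector code may differ further.

```cpp
#include <cassert>
#include <vector>

// Original association from the scalar tree:
//   ((a0+a1) + (a2+a3)) + ((a4+a5) + (a6+a7))
float sumOriginalOrder(const std::vector<float> &A) {
  assert(A.size() == 8 && "sketch is hard-wired to eight elements");
  float S01 = A[0] + A[1], S23 = A[2] + A[3];
  float S45 = A[4] + A[5], S67 = A[6] + A[7];
  return (S01 + S23) + (S45 + S67);
}

// Association after pairing (01) with (45) and (23) with (67):
//   ((a0+a1) + (a4+a5)) + ((a2+a3) + (a6+a7))
float sumReorderedOrder(const std::vector<float> &A) {
  assert(A.size() == 8 && "sketch is hard-wired to eight elements");
  float S01 = A[0] + A[1], S23 = A[2] + A[3];
  float S45 = A[4] + A[5], S67 = A[6] + A[7];
  return (S01 + S45) + (S23 + S67);
}
```

For the input {1e8f, 0, 1, 0, -1e8f, 0, 1, 0}, the original order computes (1e8 + 1) + (-1e8 + 1); each 1 is lost to rounding next to 1e8, so the result is 0. The reordered sum computes (1e8 - 1e8) + (1 + 1) and yields 2. With integer addition both orders agree.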

-ffast-math would have eliminated this precision concern, but it would also have
re-associated the tree itself into (+(+(+(+(0,1)2)3....)

Hence, this patch skips the extra comparison of (Right[i], Left[i+1]) only for
integer types.

With this patch, we now vectorize the above type of tree for any number of consecutive loads
of integer type.


For the test case:

                #include <arm_neon.h>
                int hadd(int* a){
                    return (a[0] + a[1]) + (a[2] + a[3]) + (a[4] + a[5]) + (a[6] + a[7]);
                }

AArch64 assembly before this patch:

                ldp     w8, w9, [x0]
                ldp     w10, w11, [x0, #8]
                ldp     w12, w13, [x0, #16]
                ldp     w14, w15, [x0, #24]
                add     w8, w8, w9
                add     w9, w10, w11
                add     w10, w12, w13
                add     w11, w14, w15
                add     w8, w8, w9
                add     w9, w10, w11
                add     w0, w8, w9
                ret

AArch64 assembly after this patch:

                ldp     d0, d1, [x0]
                ldp     d2, d3, [x0, #16]
                add     v0.2s, v0.2s, v2.2s
                add     v1.2s, v1.2s, v3.2s
                add     v0.2s, v0.2s, v1.2s
                fmov    w8, s0
                mov     w9, v0.s[1]
                add     w0, w8, w9
                ret



Please help review this patch. I have not run LNT yet, since this is just an enhancement
to r224119. I will follow up with LNT results if required.

Regards,
Suyog

REPOSITORY
  rL LLVM

http://reviews.llvm.org/D6675

Files:
  lib/Transforms/Vectorize/SLPVectorizer.cpp
  test/Transforms/SLPVectorizer/AArch64/horizontaladd.ll

Index: lib/Transforms/Vectorize/SLPVectorizer.cpp
===================================================================

--- lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -1831,8 +1831,11 @@
   for (unsigned i = 0, e = Left.size(); i < e - 1; ++i) {
     if (!isa<LoadInst>(Left[i]) || !isa<LoadInst>(Right[i]))
       return;
-    if (!(isConsecutiveAccess(Left[i], Right[i]) &&
-          isConsecutiveAccess(Right[i], Left[i + 1])))
+    LoadInst *L = dyn_cast<LoadInst>(Left[i]);
+    bool isInt = L->getType()->isIntegerTy();
+    if (!(isConsecutiveAccess(Left[i], Right[i])))
+      continue;
+    else if (!isInt && !isConsecutiveAccess(Right[i], Left[i + 1]))
       continue;
     else
       std::swap(Left[i + 1], Right[i]);
Index: test/Transforms/SLPVectorizer/AArch64/horizontaladd.ll
===================================================================
--- test/Transforms/SLPVectorizer/AArch64/horizontaladd.ll
+++ test/Transforms/SLPVectorizer/AArch64/horizontaladd.ll
@@ -25,3 +25,34 @@
   %add5 = fadd float %add, %add4
   ret float %add5
 }
+
+; CHECK-LABEL: @hadd_int
+; CHECK: load <2 x i32>*
+; CHECK: add <2 x i32>
+; CHECK: extractelement <2 x i32>
+define i32 @hadd_int(i32* nocapture readonly %a) {
+entry:
+  %0 = load i32* %a, align 4
+  %arrayidx1 = getelementptr inbounds i32* %a, i64 1
+  %1 = load i32* %arrayidx1, align 4
+  %arrayidx2 = getelementptr inbounds i32* %a, i64 2
+  %2 = load i32* %arrayidx2, align 4
+  %arrayidx3 = getelementptr inbounds i32* %a, i64 3
+  %3 = load i32* %arrayidx3, align 4
+  %arrayidx6 = getelementptr inbounds i32* %a, i64 4
+  %4 = load i32* %arrayidx6, align 4
+  %arrayidx7 = getelementptr inbounds i32* %a, i64 5
+  %5 = load i32* %arrayidx7, align 4
+  %arrayidx10 = getelementptr inbounds i32* %a, i64 6
+  %6 = load i32* %arrayidx10, align 4
+  %arrayidx11 = getelementptr inbounds i32* %a, i64 7
+  %7 = load i32* %arrayidx11, align 4
+  %add1 = add i32 %0, %1
+  %add2 = add i32 %2, %3
+  %add3 = add i32 %4, %5
+  %add4 = add i32 %6, %7
+  %add5 = add i32 %add1, %add2
+  %add6 = add i32 %add3, %add4
+  %add7 = add i32 %add5, %add6
+  ret i32 %add7
+}
