[PATCH] [SLPVectorization] Enhance Ability to Vectorize Horizontal Reductions from Consecutive Loads
James Molloy
james at jamesmolloy.co.uk
Mon Dec 15 23:25:42 PST 2014
Hi suyog,
This is a good improvement, thanks for working on it!
I'll take a closer look today, but for now I did notice that the generated
aarch64 assembly isn't as optimal as it could be. I'd expect:
    ldp q0, q1
    add v0.4s, v0.4s, v1.4s
    addv s0, v0.4s
Cheers,
James
On Tue, 16 Dec 2014 at 05:29, suyog <suyog.sarda at samsung.com> wrote:
> Hi nadav, aschwaighofer, jmolloy,
>
> This patch is an enhancement to r224119, which vectorizes horizontal
> reductions from consecutive loads.
>
> Earlier, in r224119, we handled this tree:
>
>            +
>          /   \
>        /       \
>      +           +
>     / \         / \
>    /   \       /   \
> a[0]  a[1]  a[2]  a[3]
>
> where originally we had
>   Left    Right
>   a[0]    a[1]
>   a[2]    a[3]
>
> In r224119, we compared (Left[i], Right[i]) and (Right[i], Left[i+1]):
>
>   Left        Right
>   a[0] -----> a[1]
>             /
>           /
>         /
>      \/
>   a[2]        a[3]
>
>
> We then rearranged it to
>   Left    Right
>   a[0]    a[2]
>   a[1]    a[3]
> so that we can bundle Left and Right into vectors of loads.
>
> However, with a bigger tree,
>
>                +
>              /   \
>            /       \
>          /           \
>        /               \
>       +                 +
>      / \               / \
>     /   \             /   \
>    /     \           /     \
>   +       +         +       +
>  / \     / \       / \     / \
> 0   1   2   3     4   5   6   7
>
>
>   Left    Right
>   0       1
>   4       5
>   2       3
>   6       7
>
> In this case, the comparison of Right[i] and Left[i+1] fails, and the code
> remains scalar.
>
> If we eliminate the comparison of Right[i] and Left[i+1] and compare only
> Left[i] with Right[i],
> we can rearrange Left and Right into:
>   Left    Right
>   0       4
>   1       5
>   2       6
>   3       7
>
> We would then bundle (0,1) (4,5) and (2,3) (6,7) into vector loads,
> followed by vector adds of (01, 45) and (23, 67).
>
> However, notice that this changes the order of the additions.
> Originally, (01) and (23) would have been added together, and likewise
> (45) and (67).
> For integer addition this is not an issue, but for other
> data types with precision concerns it might be a problem.
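
To illustrate the precision concern above, here is a small C++ sketch. It is not part of the patch, and the names sumOriginal and sumReordered are made up for this illustration. It evaluates the eight-element reduction once with the original pairing and once with the pairing produced by the reordering. With unsigned 32-bit integers, whose addition wraps and is associative, the two orders always agree; with float, a small term can be absorbed by a large one in one order and survive in the other.

```cpp
#include <cstdint>

// Original tree: ((a0+a1) + (a2+a3)) + ((a4+a5) + (a6+a7)).
template <typename T>
T sumOriginal(const T (&a)[8]) {
  T p01 = a[0] + a[1], p23 = a[2] + a[3];
  T p45 = a[4] + a[5], p67 = a[6] + a[7];
  T lhs = p01 + p23;
  T rhs = p45 + p67;
  return lhs + rhs;
}

// Order after the reordering: (01 + 45) and (23 + 67) are formed first,
// mirroring the vector adds of (01, 45) and (23, 67) described above.
template <typename T>
T sumReordered(const T (&a)[8]) {
  T p01 = a[0] + a[1], p23 = a[2] + a[3];
  T p45 = a[4] + a[5], p67 = a[6] + a[7];
  T lhs = p01 + p45;
  T rhs = p23 + p67;
  return lhs + rhs;
}
```

For example, with a = {1e8f, 0, -1e8f, 0, 1, 0, 0, 0} and assuming the additions are evaluated in IEEE-754 single precision, the original order yields 1 (the large values cancel first), while the reordered one yields 0 (the 1 is lost when added to 1e8). The same shape of input with std::uint32_t elements gives identical results in both orders.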
>
> -ffast-math would have eliminated this precision concern, but it would have
> re-associated the tree itself into (+(+(+(+(0,1)2)3....)
>
> Hence, this patch checks for integer types and only then skips
> the extra comparison of (Right[i], Left[i+1]).
>
> With this patch, we now vectorize the above type of tree for any length of
> consecutive loads of integer type.
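
The effect of dropping the second comparison can be sketched in standalone C++. This is a simplified model, not the SLPVectorizer code: loads are represented by their element index, isConsecutive is a hypothetical stand-in for isConsecutiveAccess, and the SkipChainCheck flag is an assumption introduced here to switch between the r224119 behaviour and the integer path of this patch. The early return on non-load operands is omitted.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for isConsecutiveAccess(): operands are modelled as
// element indices into one array, so "consecutive" means B == A + 1.
static bool isConsecutive(int A, int B) { return B == A + 1; }

// Models the reordering loop. With SkipChainCheck == false both comparisons
// are required (as in r224119); with true only Left[i]/Right[i] is compared
// (the integer case in this patch).
static void reorderLoads(std::vector<int> &Left, std::vector<int> &Right,
                         bool SkipChainCheck) {
  for (std::size_t i = 0; i + 1 < Left.size(); ++i) {
    if (!isConsecutive(Left[i], Right[i]))
      continue;
    if (!SkipChainCheck && !isConsecutive(Right[i], Left[i + 1]))
      continue;
    std::swap(Left[i + 1], Right[i]);
  }
}
```

Starting from Left = {0, 4, 2, 6} and Right = {1, 5, 3, 7}, calling reorderLoads with SkipChainCheck set to true leaves Left = {0, 1, 2, 3} and Right = {4, 5, 6, 7}, matching the table above; with it set to false the operands are left unchanged, which corresponds to the case that stayed scalar.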
>
>
> For test case:
>
> #include <arm_neon.h>
> int hadd(int* a){
>   return (a[0] + a[1]) + (a[2] + a[3]) + (a[4] + a[5]) +
>          (a[6] + a[7]);
> }
>
> AArch64 assembly before this patch :
>
> ldp w8, w9, [x0]
> ldp w10, w11, [x0, #8]
> ldp w12, w13, [x0, #16]
> ldp w14, w15, [x0, #24]
> add w8, w8, w9
> add w9, w10, w11
> add w10, w12, w13
> add w11, w14, w15
> add w8, w8, w9
> add w9, w10, w11
> add w0, w8, w9
> ret
>
> AArch64 assembly after this patch :
>
> ldp d0, d1, [x0]
> ldp d2, d3, [x0, #16]
> add v0.2s, v0.2s, v2.2s
> add v1.2s, v1.2s, v3.2s
> add v0.2s, v0.2s, v1.2s
> fmov w8, s0
> mov w9, v0.s[1]
> add w0, w8, w9
> ret
>
>
>
> Please help review this patch. I have not run LNT yet, since
> this is just an enhancement
> to r224119. I will follow up with LNT results if required.
>
> Regards,
> Suyog
>
> REPOSITORY
> rL LLVM
>
> http://reviews.llvm.org/D6675
>
> Files:
> lib/Transforms/Vectorize/SLPVectorizer.cpp
> test/Transforms/SLPVectorizer/AArch64/horizontaladd.ll
>
> Index: lib/Transforms/Vectorize/SLPVectorizer.cpp
> ===================================================================
> --- lib/Transforms/Vectorize/SLPVectorizer.cpp
> +++ lib/Transforms/Vectorize/SLPVectorizer.cpp
> @@ -1831,8 +1831,11 @@
> for (unsigned i = 0, e = Left.size(); i < e - 1; ++i) {
> if (!isa<LoadInst>(Left[i]) || !isa<LoadInst>(Right[i]))
> return;
> - if (!(isConsecutiveAccess(Left[i], Right[i]) &&
> - isConsecutiveAccess(Right[i], Left[i + 1])))
> + LoadInst *L = dyn_cast<LoadInst>(Left[i]);
> + bool isInt = L->getType()->isIntegerTy();
> + if (!(isConsecutiveAccess(Left[i], Right[i])))
> + continue;
> + else if (!isInt && !isConsecutiveAccess(Right[i], Left[i + 1]))
> continue;
> else
> std::swap(Left[i + 1], Right[i]);
> Index: test/Transforms/SLPVectorizer/AArch64/horizontaladd.ll
> ===================================================================
> --- test/Transforms/SLPVectorizer/AArch64/horizontaladd.ll
> +++ test/Transforms/SLPVectorizer/AArch64/horizontaladd.ll
> @@ -25,3 +25,34 @@
> %add5 = fadd float %add, %add4
> ret float %add5
> }
> +
> +; CHECK-LABEL: @hadd_int
> +; CHECK: load <2 x i32>*
> +; CHECK: add <2 x i32>
> +; CHECK: extractelement <2 x i32>
> +define i32 @hadd_int(i32* nocapture readonly %a) {
> +entry:
> + %0 = load i32* %a, align 4
> + %arrayidx1 = getelementptr inbounds i32* %a, i64 1
> + %1 = load i32* %arrayidx1, align 4
> + %arrayidx2 = getelementptr inbounds i32* %a, i64 2
> + %2 = load i32* %arrayidx2, align 4
> + %arrayidx3 = getelementptr inbounds i32* %a, i64 3
> + %3 = load i32* %arrayidx3, align 4
> + %arrayidx6 = getelementptr inbounds i32* %a, i64 4
> + %4 = load i32* %arrayidx6, align 4
> + %arrayidx7 = getelementptr inbounds i32* %a, i64 5
> + %5 = load i32* %arrayidx7, align 4
> + %arrayidx10 = getelementptr inbounds i32* %a, i64 6
> + %6 = load i32* %arrayidx10, align 4
> + %arrayidx11 = getelementptr inbounds i32* %a, i64 7
> + %7 = load i32* %arrayidx11, align 4
> + %add1 = add i32 %0, %1
> + %add2 = add i32 %2, %3
> + %add3 = add i32 %4, %5
> + %add4 = add i32 %6, %7
> + %add5 = add i32 %add1, %add2
> + %add6 = add i32 %add3, %add4
> + %add7 = add i32 %add5, %add6
> + ret i32 %add7
> +}
>