Hi suyog,<br>This is a good improvement, thanks for working on it!<br><br>I'll take a closer look today, but for now I did notice that the generated aarch64 assembly isn't as optimal as it could be. I'd expect:<br><br>Ldp q0, q1<br>Add v0.4s, v0.4s, v1.4s<br>Addv s0, v0.4s<br><br>Cheers,<br><br>James<br><div class="gmail_quote">On Tue, 16 Dec 2014 at 05:29, suyog <<a href="mailto:suyog.sarda@samsung.com">suyog.sarda@samsung.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi nadav, aschwaighofer, jmolloy,<br>
<br>
This patch is enhancement to r224119 which vectorizes horizontal reductions from consecutive loads.<br>
<br>
Earlier in r224119, we handled tree :<br>
<br>
+<br>
/ \<br>
/ \<br>
+ +<br>
/ \ / \<br>
/ \ / \<br>
a[0] a[1] a[2] a[3]<br>
<br>
where originally, we had<br>
Left Right<br>
a[0] a[1]<br>
a[2] a[3]<br>
<br>
In r224119, we compared, (Left[i], Right[i]) and (Right[i], Left[i+1])<br>
<br>
Left Right<br>
a[0] ---> a[1]<br>
/<br>
/<br>
/<br>
\/<br>
a[2] a[3]<br>
<br>
<br>
And then rearrange it to<br>
Left Right<br>
a[0] a[2]<br>
a[1] a[3]<br>
so that, we can bundle left and right into vector of loads.<br>
<br>
However, with bigger tree,<br>
<br>
+<br>
/ \<br>
/ \<br>
/ \<br>
/ \<br>
+ +<br>
/ \ / \<br>
/ \ / \<br>
/ \ / \<br>
+ + + +<br>
/ \ / \ / \ / \<br>
0 1 2 3 4 5 6 7<br>
<br>
<br>
Left Right<br>
0 1<br>
4 5<br>
2 3<br>
6 7<br>
<br>
In this case, Comparison of Right[i] and Left[i+1] would fail, and code remains scalar.<br>
<br>
If we eliminate comparison Right[i] and Left[i+1], and just compare Left[i] with Right[i],<br>
we would be able to re-arrange Left and Right into :<br>
Left Right<br>
0 4<br>
1 5<br>
2 6<br>
3 7<br>
<br>
And then would bundle (0,1) (4,5) and (2,3) (6,7) into vector loads.<br>
And then have vector adds of (01, 45) and (23, 67).<br>
<br>
However, notice that, this would disturb the sequence of addition.<br>
Originally, (01) and (23) should have been added. Same with (45) and (67).<br>
For integer type addition, this would not create any issue, but for other<br>
data types with precision concerns, there might be a problem.<br>
<br>
ffast-math would have eliminated this precision concern, but it would have<br>
re-associated the tree itself into (+(+(+(+(0,1)2)3....)<br>
<br>
Hence, in this patch we are checking for integer types and then only skipping<br>
the extra comparison of (Right[i], Left[i+1]).<br>
<br>
With this patch, we now vectorize above type of tree for any length of consecutive loads<br>
of integer type.<br>
<br>
<br>
For test case:<br>
<br>
#include <arm_neon.h><br>
int hadd(int* a){<br>
return (a[0] + a[1]) + (a[2] + a[3]) + (a[4] + a[5]) + (a[6] + a[7]);<br>
}<br>
<br>
AArch64 assembly before this patch :<br>
<br>
ldp w8, w9, [x0]<br>
ldp w10, w11, [x0, #8]<br>
ldp w12, w13, [x0, #16]<br>
ldp w14, w15, [x0, #24]<br>
add w8, w8, w9<br>
add w9, w10, w11<br>
add w10, w12, w13<br>
add w11, w14, w15<br>
add w8, w8, w9<br>
add w9, w10, w11<br>
add w0, w8, w9<br>
ret<br>
<br>
AArch64 assembly after this patch :<br>
<br>
ldp d0, d1, [x0]<br>
ldp d2, d3, [x0, #16]<br>
add v0.2s, v0.2s, v2.2s<br>
add v1.2s, v1.2s, v3.2s<br>
add v0.2s, v0.2s, v1.2s<br>
fmov w8, s0<br>
mov w9, v0.s[1]<br>
add w0, w8, w9<br>
ret<br>
<br>
<br>
<br>
Please help in reviewing this patch. I did not run LNT as of now, since this is just enhancement<br>
to r224119. I will update with LNT results if required.<br>
<br>
Regards,<br>
Suyog<br>
<br>
REPOSITORY<br>
rL LLVM<br>
<br>
<a href="http://reviews.llvm.org/D6675" target="_blank">http://reviews.llvm.org/D6675</a><br>
<br>
Files:<br>
lib/Transforms/Vectorize/<u></u>SLPVectorizer.cpp<br>
test/Transforms/SLPVectorizer/<u></u>AArch64/horizontaladd.ll<br>
<br>
Index: lib/Transforms/Vectorize/<u></u>SLPVectorizer.cpp<br>
==============================<u></u>==============================<u></u>=======<br>
--- lib/Transforms/Vectorize/<u></u>SLPVectorizer.cpp<br>
+++ lib/Transforms/Vectorize/<u></u>SLPVectorizer.cpp<br>
@@ -1831,8 +1831,11 @@<br>
for (unsigned i = 0, e = Left.size(); i < e - 1; ++i) {<br>
if (!isa<LoadInst>(Left[i]) || !isa<LoadInst>(Right[i]))<br>
return;<br>
- if (!(isConsecutiveAccess(Left[i]<u></u>, Right[i]) &&<br>
- isConsecutiveAccess(Right[i], Left[i + 1])))<br>
+ LoadInst *L = dyn_cast<LoadInst>(Left[i]);<br>
+ bool isInt = L->getType()->isIntegerTy();<br>
+ if (!(isConsecutiveAccess(Left[i]<u></u>, Right[i])))<br>
+ continue;<br>
+ else if (!isInt && !isConsecutiveAccess(Right[i], Left[i + 1]))<br>
continue;<br>
else<br>
std::swap(Left[i + 1], Right[i]);<br>
Index: test/Transforms/SLPVectorizer/<u></u>AArch64/horizontaladd.ll<br>
==============================<u></u>==============================<u></u>=======<br>
--- test/Transforms/SLPVectorizer/<u></u>AArch64/horizontaladd.ll<br>
+++ test/Transforms/SLPVectorizer/<u></u>AArch64/horizontaladd.ll<br>
@@ -25,3 +25,34 @@<br>
%add5 = fadd float %add, %add4<br>
ret float %add5<br>
}<br>
+<br>
+; CHECK-LABEL: @hadd_int<br>
+; CHECK: load <2 x i32>*<br>
+; CHECK: add <2 x i32><br>
+; CHECK: extractelement <2 x i32><br>
+define i32 @hadd_int(i32* nocapture readonly %a) {<br>
+entry:<br>
+ %0 = load i32* %a, align 4<br>
+ %arrayidx1 = getelementptr inbounds i32* %a, i64 1<br>
+ %1 = load i32* %arrayidx1, align 4<br>
+ %arrayidx2 = getelementptr inbounds i32* %a, i64 2<br>
+ %2 = load i32* %arrayidx2, align 4<br>
+ %arrayidx3 = getelementptr inbounds i32* %a, i64 3<br>
+ %3 = load i32* %arrayidx3, align 4<br>
+ %arrayidx6 = getelementptr inbounds i32* %a, i64 4<br>
+ %4 = load i32* %arrayidx6, align 4<br>
+ %arrayidx7 = getelementptr inbounds i32* %a, i64 5<br>
+ %5 = load i32* %arrayidx7, align 4<br>
+ %arrayidx10 = getelementptr inbounds i32* %a, i64 6<br>
+ %6 = load i32* %arrayidx10, align 4<br>
+ %arrayidx11 = getelementptr inbounds i32* %a, i64 7<br>
+ %7 = load i32* %arrayidx11, align 4<br>
+ %add1 = add i32 %0, %1<br>
+ %add2 = add i32 %2, %3<br>
+ %add3 = add i32 %4, %5<br>
+ %add4 = add i32 %6, %7<br>
+ %add5 = add i32 %add1, %add2<br>
+ %add6 = add i32 %add3, %add4<br>
+ %add7 = add i32 %add5, %add6<br>
+ ret i32 %add7<br>
+}<br>
<br>
EMAIL PREFERENCES<br>
<a href="http://reviews.llvm.org/settings/panel/emailpreferences/" target="_blank">http://reviews.llvm.org/<u></u>settings/panel/<u></u>emailpreferences/</a><br>
______________________________<u></u>_________________<br>
llvm-commits mailing list<br>
<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/<u></u>mailman/listinfo/llvm-commits</a><br>
</blockquote></div>