Hi Karthik,<br><br><div>This has been discussed in the past. The thread here has a bunch of explanations by Chandler of why this is required:</div><div><br></div><div><a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140825/232509.html">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140825/232509.html</a><br></div><div><br></div><div>As I recall from that thread, we want to do (1): pattern matching these loads. This also ties in quite well with the discussion going on here: <a href="https://groups.google.com/forum/#!topic/llvm-dev/7XMvUpbUjuc">https://groups.google.com/forum/#!topic/llvm-dev/7XMvUpbUjuc</a> where Asghar-Ahmad is trying to pattern match integer-promoted loads and stores.</div><div><br></div><div>These two patterns (GVN/SROA widened loads/stores and integer-promoted loads/stores) are similar in that both the SLP and Loop vectorizers should understand them. So perhaps we need some framework to enable this?</div><div><br></div><div>Cheers,</div><div><br></div><div>James</div><br><div class="gmail_quote">On Mon Dec 15 2014 at 12:56:20 PM Karthik Bhat &lt;<a href="mailto:kv.bhat@samsung.com" target="_blank">kv.bhat@samsung.com</a>&gt; wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi aschwaighofer, nadav,<br>
<br>
Hi All,<br>
I need some input on a patch I'm currently working on. For code such as:<br>
<br>
int a[4],b[4],c[4],d[4];<br>
void fn() {<br>
c[0] = a[0]*b[0]+d[0];<br>
c[1] = a[1]*b[1]+d[1];<br>
c[2] = a[2]*b[2]+d[2];<br>
c[3] = a[3]*b[3]+d[3];<br>
}<br>
Current LLVM trunk doesn't vectorize this on 64-bit machines because GVN, which runs before the SLP vectorizer, widens the 32-bit loads into 64-bit loads, and the SLP vectorizer does not match the resulting pattern.<br>
<br>
I thought of two approaches to solve the problem:<br>
<br>
1) Add pattern-matching capability to SLPVectorizer to recognize the widened load. [Need some input if we are to follow this approach]<br>
<br>
- I'm able to match patterns such as trunc 32 bit -> load and trunc 32 bit -> lshr -> load.<br>
But I'm stuck, as I'm not sure how to split the widened 64-bit load into two 32-bit loads so that they can be vectorized.<br>
I tried Builder.CreateLoad, but in code like:<br>
load i64* bitcast (i32* getelementptr inbounds ([4 x i32]* @a, i64 0, i64 2) to i64*), align 8<br>
<br>
I'm unable to get the "getelementptr" info out of the widened load to construct the new load as<br>
load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i32 2), align 4, !tbaa !1 and<br>
load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i32 3), align 4, !tbaa !1<br>
which can be pushed into Operands while creating buildTree_rec.<br>
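<br>
As a rough illustration of the two patterns in approach 1, the C sketch below models a widened 64-bit load and the trunc and lshr+trunc pair that recover the two 32-bit elements. The helper names (widened_load, low_half, high_half, split_matches_narrow_loads) are hypothetical and are not LLVM APIs, and the low-half-first layout assumes a little-endian target such as x86-64.<br>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the widened load: read 8 bytes starting at p,
 * i.e. the two adjacent 32-bit elements p[0] and p[1], as one 64-bit value. */
uint64_t widened_load(const int32_t *p) {
    uint64_t wide;
    memcpy(&wide, p, sizeof wide);
    return wide;
}

/* Models "trunc i64 -> i32": keeps the low 32 bits. */
uint32_t low_half(uint64_t wide) {
    return (uint32_t)wide;
}

/* Models "lshr i64 by 32, then trunc to i32": keeps the high 32 bits. */
uint32_t high_half(uint64_t wide) {
    return (uint32_t)(wide >> 32);
}

/* Returns 1 if the two halves of the widened load equal the two narrow
 * loads arr[i] and arr[i + 1]. Assumes little-endian byte order, where the
 * low half comes from the lower address. */
int split_matches_narrow_loads(const int32_t *arr, int i) {
    uint64_t wide = widened_load(&arr[i]);
    return low_half(wide) == (uint32_t)arr[i] &&
           high_half(wide) == (uint32_t)arr[i + 1];
}

/* Convenience wrapper that aborts if the equivalence does not hold. */
void check_pair(const int32_t *arr, int i) {
    assert(split_matches_narrow_loads(arr, i));
}
```

On a little-endian target, split_matches_narrow_loads(a, 2) should hold for any contents of a. A matcher that treats the widened load as separate loads of a[2] and a[3] would rely on this equivalence.<br>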
<br>
2) Run GVN with load widening only after vectorization, to prevent GVN from combining loads before vectorization. [This patch].<br>
<br>
Although the first approach looks more appropriate to me, I'm unable to split the widened load to generate the vectorized code. Any input on this would be a great help.<br>
<br>
Please let me know whether we should go ahead with approach 1 or 2. If 1, some guidance on how to split the widened load would be a great help.<br>
<br>
Thanks and Regards<br>
Karthik Bhat<br>
<br>
REPOSITORY<br>
rL LLVM<br>
<br>
<a href="http://reviews.llvm.org/D6654" target="_blank">http://reviews.llvm.org/D6654</a><br>
<br>
Files:<br>
lib/Transforms/IPO/PassManagerBuilder.cpp<br>
test/Transforms/SLPVectorizer/X86/gvn-slp_ordering.ll<br>
<br>
Index: lib/Transforms/IPO/PassManagerBuilder.cpp<br>
===================================================================<br>
--- lib/Transforms/IPO/PassManagerBuilder.cpp<br>
+++ lib/Transforms/IPO/PassManagerBuilder.cpp<br>
@@ -244,7 +244,7 @@<br>
if (OptLevel > 1) {<br>
if (EnableMLSM)<br>
MPM.add(createMergedLoadStoreMotionPass()); // Merge ld/st in diamonds<br>
- MPM.add(createGVNPass(DisableGVNLoadPRE)); // Remove redundancies<br>
+ MPM.add(createGVNPass(true)); // Remove redundancies<br>
}<br>
MPM.add(createMemCpyOptPass()); // Remove memcpy / form memset<br>
MPM.add(createSCCPPass()); // Constant prop with SCCP<br>
@@ -278,6 +278,9 @@<br>
if (!DisableUnrollLoops)<br>
MPM.add(createLoopUnrollPass());<br>
}<br>
+<br>
+ if (!UseGVNAfterVectorization)<br>
+ MPM.add(createGVNPass(DisableGVNLoadPRE));<br>
}<br>
<br>
if (LoadCombine)<br>
@@ -343,6 +346,8 @@<br>
if (!DisableUnrollLoops)<br>
MPM.add(createLoopUnrollPass());<br>
}<br>
+ if (!UseGVNAfterVectorization)<br>
+ MPM.add(createGVNPass(DisableGVNLoadPRE));<br>
}<br>
<br>
addExtensionsToPM(EP_Peephole, MPM);<br>
Index: test/Transforms/SLPVectorizer/X86/gvn-slp_ordering.ll<br>
===================================================================<br>
--- test/Transforms/SLPVectorizer/X86/gvn-slp_ordering.ll<br>
+++ test/Transforms/SLPVectorizer/X86/gvn-slp_ordering.ll<br>
@@ -0,0 +1,45 @@<br>
+; RUN: opt -S -O2 -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7-avx %s | FileCheck %s<br>
+<br>
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br>
+target triple = "x86_64-unknown-linux-gnu"<br>
+<br>
+; CHECK: load <4 x i32><br>
+; CHECK: mul nsw <4 x i32><br>
+; CHECK: add nsw <4 x i32><br>
+; CHECK: store <4 x i32<br>
+<br>
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br>
+target triple = "x86_64-unknown-linux-gnu"<br>
+<br>
+@a = common global [4 x i32] zeroinitializer, align 16<br>
+@b = common global [4 x i32] zeroinitializer, align 16<br>
+@d = common global [4 x i32] zeroinitializer, align 16<br>
+@c = common global [4 x i32] zeroinitializer, align 16<br>
+<br>
+define void @fn() {<br>
+ %1 = load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i64 0), align 4<br>
+ %2 = load i32* getelementptr inbounds ([4 x i32]* @b, i32 0, i64 0), align 4<br>
+ %3 = mul nsw i32 %1, %2<br>
+ %4 = load i32* getelementptr inbounds ([4 x i32]* @d, i32 0, i64 0), align 4<br>
+ %5 = add nsw i32 %3, %4<br>
+ store i32 %5, i32* getelementptr inbounds ([4 x i32]* @c, i32 0, i64 0), align 4<br>
+ %6 = load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i64 1), align 4<br>
+ %7 = load i32* getelementptr inbounds ([4 x i32]* @b, i32 0, i64 1), align 4<br>
+ %8 = mul nsw i32 %6, %7<br>
+ %9 = load i32* getelementptr inbounds ([4 x i32]* @d, i32 0, i64 1), align 4<br>
+ %10 = add nsw i32 %8, %9<br>
+ store i32 %10, i32* getelementptr inbounds ([4 x i32]* @c, i32 0, i64 1), align 4<br>
+ %11 = load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i64 2), align 4<br>
+ %12 = load i32* getelementptr inbounds ([4 x i32]* @b, i32 0, i64 2), align 4<br>
+ %13 = mul nsw i32 %11, %12<br>
+ %14 = load i32* getelementptr inbounds ([4 x i32]* @d, i32 0, i64 2), align 4<br>
+ %15 = add nsw i32 %13, %14<br>
+ store i32 %15, i32* getelementptr inbounds ([4 x i32]* @c, i32 0, i64 2), align 4<br>
+ %16 = load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i64 3), align 4<br>
+ %17 = load i32* getelementptr inbounds ([4 x i32]* @b, i32 0, i64 3), align 4<br>
+ %18 = mul nsw i32 %16, %17<br>
+ %19 = load i32* getelementptr inbounds ([4 x i32]* @d, i32 0, i64 3), align 4<br>
+ %20 = add nsw i32 %18, %19<br>
+ store i32 %20, i32* getelementptr inbounds ([4 x i32]* @c, i32 0, i64 3), align 4<br>
+ ret void<br>
+}<br>
<br>
</blockquote></div>