[PATCH] Load Widened by GVN not Vectorized by SLPVectorizer

Mon Dec 15 04:45:59 PST 2014

Hi aschwaighofer, nadav,

Hi All,
I need some input on a patch I'm currently working on. For code such as:

int a[4],b[4],c[4],d[4];
void fn() {
  c[0] = a[0]*b[0]+d[0];
  c[1] = a[1]*b[1]+d[1];
  c[2] = a[2]*b[2]+d[2];
  c[3] = a[3]*b[3]+d[3];
}
Current LLVM trunk does not vectorize this on 64-bit machines: GVN, which runs before the SLP vectorizer, widens the loads into 64-bit loads, and the resulting pattern is not matched by SLP.
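To make the widened pattern concrete, here is a minimal C model of it. It is only a sketch: the helper names load_wide, low_half and high_half are made up for illustration, and it assumes a little-endian target such as x86-64. One 64-bit load covers two adjacent 32-bit elements; a truncation recovers the element at the lower address, and a logical shift right by 32 followed by a truncation recovers the one at the higher address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of the widened load: read arr[i] and arr[i + 1] as one 64-bit value.
   memcpy is used so the read is well defined regardless of alignment.
   (Hypothetical helper, not part of LLVM.) */
static uint64_t load_wide(const int32_t *arr, size_t len, size_t i) {
  uint64_t wide;
  assert(i + 1 < len); /* both elements must be inside the array */
  memcpy(&wide, &arr[i], sizeof wide);
  return wide;
}

/* Corresponds to "trunc": on a little-endian target this is the bit pattern
   of the element at the lower address, arr[i]. */
static uint32_t low_half(uint64_t wide) { return (uint32_t)wide; }

/* Corresponds to "lshr 32" followed by "trunc": on a little-endian target
   this is the bit pattern of the element at the higher address, arr[i + 1]. */
static uint32_t high_half(uint64_t wide) { return (uint32_t)(wide >> 32); }
```

For an array a, low_half(load_wide(a, 4, 2)) has the same bits as a[2] and high_half(load_wide(a, 4, 2)) the same bits as a[3], which is why the scalar code no longer shows four separate 32-bit loads per array.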

I thought of two approaches to solve the problem:

1) Add pattern-matching capability to SLPVectorizer to recognize the widened load. [Need some input if we are to follow this approach]

 - I'm able to match patterns such as trunc to 32 bits -> load and trunc to 32 bits -> lshr -> load.
But I'm stuck, as I'm not sure how to split the widened 64-bit load into two 32-bit loads so that they can be vectorized.
I tried Builder.CreateLoad, but in code like:
  load i64* bitcast (i32* getelementptr inbounds ([4 x i32]* @a, i64 0, i64 2) to i64*), align 8

I'm unable to get the "getelementptr" info out of the widened load to construct the new loads as
  load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i32 2), align 4, !tbaa !1 and 
  load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i32 3), align 4, !tbaa !1
which can be pushed into Operands while creating buildTree_rec.

2) Run GVN with load widening only after vectorization, to prevent GVN from combining loads before vectorization. [This patch].

Although the first approach looks more appropriate to me, I'm unable to split the widened load to generate the vectorized code. Any input in this regard would be of great help.
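As a rough illustration of the index arithmetic that splitting would need under approach 1, the following C sketch maps a wide load's byte offset within an array to the indices of the narrow elements it covers. The helper split_wide_load is hypothetical and is not an LLVM API. In the pass itself one would presumably first look through the bitcast on the load's pointer operand (for example with Value::stripPointerCasts()) to reach the underlying getelementptr; that step is an assumption and is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an LLVM API): given the byte offset of a wide load
   from the start of an array, write the indices of the narrow elements that
   the load covers into out. Returns the number of indices written, or 0 if
   the load does not split evenly into elements or out is too small. */
static size_t split_wide_load(size_t byte_offset, size_t wide_bytes,
                              size_t elem_bytes, size_t *out, size_t out_cap) {
  assert(out != NULL);
  if (elem_bytes == 0 || wide_bytes % elem_bytes != 0 ||
      byte_offset % elem_bytes != 0)
    return 0; /* not aligned to whole elements */
  size_t count = wide_bytes / elem_bytes;
  if (count > out_cap)
    return 0; /* caller's buffer is too small */
  size_t first = byte_offset / elem_bytes;
  for (size_t k = 0; k < count; ++k)
    out[k] = first + k; /* consecutive element indices */
  return count;
}
```

For the i64 load at element 2 of @a quoted above (byte offset 8, 8 bytes wide, 4-byte elements), this yields indices 2 and 3, matching the two i32 getelementptr loads that would be needed.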

Please let me know whether we should go ahead with approach 1 or 2. If 1, some input on how to split the widened load would be of great help.

Thanks and Regards
Karthik Bhat

REPOSITORY
  rL LLVM

http://reviews.llvm.org/D6654

Files:
  lib/Transforms/IPO/PassManagerBuilder.cpp
  test/Transforms/SLPVectorizer/X86/gvn-slp_ordering.ll

Index: lib/Transforms/IPO/PassManagerBuilder.cpp
===================================================================

--- lib/Transforms/IPO/PassManagerBuilder.cpp
+++ lib/Transforms/IPO/PassManagerBuilder.cpp
@@ -244,7 +244,7 @@
   if (OptLevel > 1) {
     if (EnableMLSM)
       MPM.add(createMergedLoadStoreMotionPass()); // Merge ld/st in diamonds
-    MPM.add(createGVNPass(DisableGVNLoadPRE));  // Remove redundancies
+    MPM.add(createGVNPass(true));  // Remove redundancies
   }
   MPM.add(createMemCpyOptPass());             // Remove memcpy / form memset
   MPM.add(createSCCPPass());                  // Constant prop with SCCP
@@ -278,6 +278,9 @@
       if (!DisableUnrollLoops)
         MPM.add(createLoopUnrollPass());
     }
+
+    if (!UseGVNAfterVectorization)
+      MPM.add(createGVNPass(DisableGVNLoadPRE));
   }
 
   if (LoadCombine)
@@ -343,6 +346,8 @@
       if (!DisableUnrollLoops)
         MPM.add(createLoopUnrollPass());
     }
+    if (!UseGVNAfterVectorization)
+      MPM.add(createGVNPass(DisableGVNLoadPRE));
   }
 
   addExtensionsToPM(EP_Peephole, MPM);
Index: test/Transforms/SLPVectorizer/X86/gvn-slp_ordering.ll
===================================================================
--- test/Transforms/SLPVectorizer/X86/gvn-slp_ordering.ll
+++ test/Transforms/SLPVectorizer/X86/gvn-slp_ordering.ll
@@ -0,0 +1,45 @@
+; RUN: opt -S -O2 -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7-avx %s | FileCheck %s
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+; CHECK: load <4 x i32>
+; CHECK: mul nsw <4 x i32>
+; CHECK: add nsw <4 x i32>
+; CHECK: store <4 x i32
+
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+@a = common global [4 x i32] zeroinitializer, align 16
+@b = common global [4 x i32] zeroinitializer, align 16
+@d = common global [4 x i32] zeroinitializer, align 16
+@c = common global [4 x i32] zeroinitializer, align 16
+
+define void @fn() {
+  %1 = load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i64 0), align 4
+  %2 = load i32* getelementptr inbounds ([4 x i32]* @b, i32 0, i64 0), align 4
+  %3 = mul nsw i32 %1, %2
+  %4 = load i32* getelementptr inbounds ([4 x i32]* @d, i32 0, i64 0), align 4
+  %5 = add nsw i32 %3, %4
+  store i32 %5, i32* getelementptr inbounds ([4 x i32]* @c, i32 0, i64 0), align 4
+  %6 = load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i64 1), align 4
+  %7 = load i32* getelementptr inbounds ([4 x i32]* @b, i32 0, i64 1), align 4
+  %8 = mul nsw i32 %6, %7
+  %9 = load i32* getelementptr inbounds ([4 x i32]* @d, i32 0, i64 1), align 4
+  %10 = add nsw i32 %8, %9
+  store i32 %10, i32* getelementptr inbounds ([4 x i32]* @c, i32 0, i64 1), align 4
+  %11 = load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i64 2), align 4
+  %12 = load i32* getelementptr inbounds ([4 x i32]* @b, i32 0, i64 2), align 4
+  %13 = mul nsw i32 %11, %12
+  %14 = load i32* getelementptr inbounds ([4 x i32]* @d, i32 0, i64 2), align 4
+  %15 = add nsw i32 %13, %14
+  store i32 %15, i32* getelementptr inbounds ([4 x i32]* @c, i32 0, i64 2), align 4
+  %16 = load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i64 3), align 4
+  %17 = load i32* getelementptr inbounds ([4 x i32]* @b, i32 0, i64 3), align 4
+  %18 = mul nsw i32 %16, %17
+  %19 = load i32* getelementptr inbounds ([4 x i32]* @d, i32 0, i64 3), align 4
+  %20 = add nsw i32 %18, %19
+  store i32 %20, i32* getelementptr inbounds ([4 x i32]* @c, i32 0, i64 3), align 4
+  ret void
+}
