[PATCH] Load Widened by GVN not Vectorized by SLPVectorizer

James Molloy james at jamesmolloy.co.uk
Mon Dec 15 08:42:22 PST 2014


Hi Karthik,

This has been discussed in the past. The thread here has a bunch of
explanations by Chandler of why this is required:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140825/232509.html

From that thread, my memory says that we want to do (1) - pattern matching
these loads. This also ties in quite well with the discussion going on
here: https://groups.google.com/forum/#!topic/llvm-dev/7XMvUpbUjuc where
Asghar-Ahmad is trying to pattern match integer promoted loads and stores.

These two patterns (GVN/SROA widened loads/stores and integer-promoted
loads/stores) are similar in that both the SLP and Loop vectorizers should
understand them. So perhaps we need some framework to enable this?
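
For concreteness, the widened form Karthik describes below should look
roughly like this (a hand-written sketch based on his description and his IR
snippet, not actual -O2 output; the value names are made up):

  %wide = load i64* bitcast (i32* getelementptr inbounds ([4 x i32]* @a, i64 0, i64 2) to i64*), align 8
  %a2 = trunc i64 %wide to i32
  %hi = lshr i64 %wide, 32
  %a3 = trunc i64 %hi to i32

On a little-endian target the low half is a[2] and the high half is a[3], so
matching this in the SLP vectorizer means proving that %a2 and %a3 are really
the two adjacent i32 loads it wants (the ones Karthik lists further down), and
then either splitting the i64 load back into two i32 loads or consuming it
directly.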

Cheers,

James

On Mon Dec 15 2014 at 12:56:20 PM Karthik Bhat <kv.bhat at samsung.com> wrote:

> Hi aschwaighofer, nadav,
>
> Hi All,
> I need a few inputs regarding a patch I'm currently working on. For code
> such as -
>
> int a[4],b[4],c[4],d[4];
> void fn() {
>   c[0] = a[0]*b[0]+d[0];
>   c[1] = a[1]*b[1]+d[1];
>   c[2] = a[2]*b[2]+d[2];
>   c[3] = a[3]*b[3]+d[3];
> }
> Current LLVM trunk doesn't vectorize this on 64-bit machines, because GVN,
> which runs before the SLP vectorizer, widens the loads into 64-bit loads, and
> the resulting pattern is not matched by SLP.
>
> I thought of 2 approaches to solve the problem -
>
> 1) Add pattern matching capability to the SLPVectorizer to recognize the
> widened load. [I need some input on whether we should follow this approach.]
>
>  - I'm able to match patterns such as trunc-to-32-bit -> load and
> trunc-to-32-bit -> lshr -> load.
> But I'm stuck, as I'm not sure how to split the widened 64-bit load into two
> 32-bit loads in order to vectorize them.
> I tried Builder.CreateLoad, but for code like
>
>   load i64* bitcast (i32* getelementptr inbounds ([4 x i32]* @a, i64 0, i64 2) to i64*), align 8
>
> I'm unable to get the "getelementptr" info out of the widened load to
> construct the new loads
>
>   load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i32 2), align 4, !tbaa !1
>   load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i32 3), align 4, !tbaa !1
>
> which can be pushed into Operands while building the tree in buildTree_rec.
>
> 2) Run GVN with load widening only after vectorization, to prevent GVN from
> combining loads before vectorization. [This patch.]
>
> Although the 1st approach looks more appropriate to me, I'm unable to split
> the widened load to generate the vectorized code. Any inputs in this regard
> would be of great help.
>
> Please let me know whether we should go ahead with approach 1 or 2. If 1, a
> few inputs on how to split the widened load would be of great help.
>
> Thanks and Regards
> Karthik Bhat
>
> REPOSITORY
>   rL LLVM
>
> http://reviews.llvm.org/D6654
>
> Files:
>   lib/Transforms/IPO/PassManagerBuilder.cpp
>   test/Transforms/SLPVectorizer/X86/gvn-slp_ordering.ll
>
> Index: lib/Transforms/IPO/PassManagerBuilder.cpp
> ===================================================================
> --- lib/Transforms/IPO/PassManagerBuilder.cpp
> +++ lib/Transforms/IPO/PassManagerBuilder.cpp
> @@ -244,7 +244,7 @@
>    if (OptLevel > 1) {
>      if (EnableMLSM)
>       MPM.add(createMergedLoadStoreMotionPass()); // Merge ld/st in diamonds
> -    MPM.add(createGVNPass(DisableGVNLoadPRE));  // Remove redundancies
> +    MPM.add(createGVNPass(true));  // Remove redundancies
>    }
>    MPM.add(createMemCpyOptPass());             // Remove memcpy / form memset
>    MPM.add(createSCCPPass());                  // Constant prop with SCCP
> @@ -278,6 +278,9 @@
>        if (!DisableUnrollLoops)
>          MPM.add(createLoopUnrollPass());
>      }
> +
> +    if (!UseGVNAfterVectorization)
> +      MPM.add(createGVNPass(DisableGVNLoadPRE));
>    }
>
>    if (LoadCombine)
> @@ -343,6 +346,8 @@
>        if (!DisableUnrollLoops)
>          MPM.add(createLoopUnrollPass());
>      }
> +    if (!UseGVNAfterVectorization)
> +      MPM.add(createGVNPass(DisableGVNLoadPRE));
>    }
>
>    addExtensionsToPM(EP_Peephole, MPM);
> Index: test/Transforms/SLPVectorizer/X86/gvn-slp_ordering.ll
> ===================================================================
> --- test/Transforms/SLPVectorizer/X86/gvn-slp_ordering.ll
> +++ test/Transforms/SLPVectorizer/X86/gvn-slp_ordering.ll
> @@ -0,0 +1,45 @@
> +; RUN: opt -S -O2 -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7-avx %s | FileCheck %s
> +
> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> +target triple = "x86_64-unknown-linux-gnu"
> +
> +; CHECK: load <4 x i32>
> +; CHECK: mul nsw <4 x i32>
> +; CHECK: add nsw <4 x i32>
> +; CHECK: store <4 x i32>
> +
> +@a = common global [4 x i32] zeroinitializer, align 16
> +@b = common global [4 x i32] zeroinitializer, align 16
> +@d = common global [4 x i32] zeroinitializer, align 16
> +@c = common global [4 x i32] zeroinitializer, align 16
> +
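> +; Check that the scalar i32 loads, muls, adds and stores below are
> +; vectorized into <4 x i32> operations, rather than GVN first widening
> +; the i32 loads into i64 loads and defeating the SLP vectorizer.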
> +define void @fn() {
> +  %1 = load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i64 0), align 4
> +  %2 = load i32* getelementptr inbounds ([4 x i32]* @b, i32 0, i64 0), align 4
> +  %3 = mul nsw i32 %1, %2
> +  %4 = load i32* getelementptr inbounds ([4 x i32]* @d, i32 0, i64 0), align 4
> +  %5 = add nsw i32 %3, %4
> +  store i32 %5, i32* getelementptr inbounds ([4 x i32]* @c, i32 0, i64 0), align 4
> +  %6 = load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i64 1), align 4
> +  %7 = load i32* getelementptr inbounds ([4 x i32]* @b, i32 0, i64 1), align 4
> +  %8 = mul nsw i32 %6, %7
> +  %9 = load i32* getelementptr inbounds ([4 x i32]* @d, i32 0, i64 1), align 4
> +  %10 = add nsw i32 %8, %9
> +  store i32 %10, i32* getelementptr inbounds ([4 x i32]* @c, i32 0, i64 1), align 4
> +  %11 = load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i64 2), align 4
> +  %12 = load i32* getelementptr inbounds ([4 x i32]* @b, i32 0, i64 2), align 4
> +  %13 = mul nsw i32 %11, %12
> +  %14 = load i32* getelementptr inbounds ([4 x i32]* @d, i32 0, i64 2), align 4
> +  %15 = add nsw i32 %13, %14
> +  store i32 %15, i32* getelementptr inbounds ([4 x i32]* @c, i32 0, i64 2), align 4
> +  %16 = load i32* getelementptr inbounds ([4 x i32]* @a, i32 0, i64 3), align 4
> +  %17 = load i32* getelementptr inbounds ([4 x i32]* @b, i32 0, i64 3), align 4
> +  %18 = mul nsw i32 %16, %17
> +  %19 = load i32* getelementptr inbounds ([4 x i32]* @d, i32 0, i64 3), align 4
> +  %20 = add nsw i32 %18, %19
> +  store i32 %20, i32* getelementptr inbounds ([4 x i32]* @c, i32 0, i64 3), align 4
> +  ret void
> +}
>