[llvm] r320548 - [SLP] Vectorize jumbled memory loads.
Shahid, Asghar-ahmad via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 20 07:50:35 PST 2017
Hi Sanjoy,
Thanks for pointing the bug. I have reverted this patch in r321181 to fix the pending issues.
Regards,
Shahid
> -----Original Message-----
> From: Sanjoy Das [mailto:sanjoy at playingwithpointers.com]
> Sent: Wednesday, December 20, 2017 5:17 AM
> To: Shahid, Asghar-ahmad <Asghar-ahmad.Shahid at amd.com>
> Cc: llvm-commits <llvm-commits at lists.llvm.org>
> Subject: Re: [llvm] r320548 - [SLP] Vectorize jumbled memory loads.
>
> Hi,
>
> I think this change is buggy for cases like this:
>
> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> target triple = "x86_64-grtev4-linux-gnu"
>
> @array = external global [20 x [13 x float]]
>
> define void @hoge(i64 %idx, <4 x i32>* %sink) {
> bb:
> %tmp = getelementptr inbounds [20 x [13 x float]], [20 x [13 x
> float]]* @array, i64 0, i64 %idx, i64 5
> %tmp1 = getelementptr inbounds [20 x [13 x float]], [20 x [13 x
> float]]* @array, i64 0, i64 %idx, i64 6
> %tmp2 = getelementptr inbounds [20 x [13 x float]], [20 x [13 x
> float]]* @array, i64 0, i64 %idx, i64 7
> %tmp3 = getelementptr inbounds [20 x [13 x float]], [20 x [13 x
> float]]* @array, i64 0, i64 %idx, i64 8
> %tmp4 = bitcast float* %tmp1 to i32*
> %tmp5 = load i32, i32* %tmp4, align 4 ;; index = 6
> %tmp6 = insertelement <4 x i32> undef, i32 %tmp5, i32 0
> %tmp7 = bitcast float* %tmp2 to i32*
> %tmp8 = load i32, i32* %tmp7, align 4 ;; index = 7
> %tmp9 = insertelement <4 x i32> %tmp6, i32 %tmp8, i32 1
> %tmp10 = bitcast float* %tmp3 to i32*
> %tmp11 = load i32, i32* %tmp10, align 4 ;; index = 8
> %tmp12 = insertelement <4 x i32> %tmp9, i32 %tmp11, i32 2
> %tmp13 = bitcast float* %tmp to i32*
> %tmp14 = load i32, i32* %tmp13, align 4 ;; index = 5
> %tmp15 = insertelement <4 x i32> %tmp12, i32 %tmp14, i32 3
> store <4 x i32> %tmp15, <4 x i32>* %sink ;; Storing [6, 7, 8, 5]
> ret void
> }
>
>
> opt -slp-vectorizer results in
>
> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
> target triple = "x86_64-grtev4-linux-gnu"
>
> @array = external global [20 x [13 x float]]
>
> define void @hoge(i64 %idx, <4 x i32>* %sink) {
> bb:
> %tmp = getelementptr inbounds [20 x [13 x float]], [20 x [13 x
> float]]* @array, i64 0, i64 %idx, i64 5
> %tmp1 = getelementptr inbounds [20 x [13 x float]], [20 x [13 x
> float]]* @array, i64 0, i64 %idx, i64 6
> %tmp2 = getelementptr inbounds [20 x [13 x float]], [20 x [13 x
> float]]* @array, i64 0, i64 %idx, i64 7
> %tmp3 = getelementptr inbounds [20 x [13 x float]], [20 x [13 x
> float]]* @array, i64 0, i64 %idx, i64 8
> %tmp4 = bitcast float* %tmp1 to i32*
> %tmp7 = bitcast float* %tmp2 to i32*
> %tmp10 = bitcast float* %tmp3 to i32*
> %tmp13 = bitcast float* %tmp to i32*
> %0 = bitcast i32* %tmp13 to <4 x i32>*
> %1 = load <4 x i32>, <4 x i32>* %0, align 4 ;; [5, 6, 7, 8]
> %2 = extractelement <4 x i32> %1, i32 0
> %tmp6 = insertelement <4 x i32> undef, i32 %2, i32 0
> %3 = extractelement <4 x i32> %1, i32 1
> %tmp9 = insertelement <4 x i32> %tmp6, i32 %3, i32 1
> %4 = extractelement <4 x i32> %1, i32 2
> %tmp12 = insertelement <4 x i32> %tmp9, i32 %4, i32 2
> %5 = extractelement <4 x i32> %1, i32 3
> %tmp15 = insertelement <4 x i32> %tmp12, i32 %5, i32 3
> store <4 x i32> %tmp15, <4 x i32>* %sink ;; Storing [5, 6, 7, 8]
> ret void
> }
>
> which looks incorrect.
>
> The problem is more obvious with this assert:
>
> diff --git a/lib/Transforms/Vectorize/SLPVectorizer.cpp
> b/lib/Transforms/Vectorize/SLPVectorizer.cpp
> index 76ba62f5d59..1978169f909 100644
> --- a/lib/Transforms/Vectorize/SLPVectorizer.cpp
> +++ b/lib/Transforms/Vectorize/SLPVectorizer.cpp
> @@ -766,6 +766,10 @@ private:
> if (UserTreeIdx != -1)
> UserTreeEntry = &VectorizableTree[UserTreeIdx];
>
> + if (!ShuffleMask.empty()) {
> + assert(UserTreeEntry);
> + }
> +
> if (UserTreeEntry && !ShuffleMask.empty()) {
> if ((unsigned)OpdNum >= UserTreeEntry->ShuffleMask.size())
> UserTreeEntry->ShuffleMask.resize(OpdNum + 1);
>
>
> In that, the SLP vectorizer "forgets" the shuffle mask if the shuffled load
> instruction does not have a UserTreeIdx.
>
> If my analysis is correct, can you please revert for now?
>
> The commit also has some stylistic issues, I'll put them on the
> https://reviews.llvm.org/D36130 phabricator review.
>
> -- Sanjoy
More information about the llvm-commits
mailing list