[PATCH] D123516: Fix SLP score for out of order contiguous loads

Alban Bridonneau via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 12 04:18:43 PDT 2022


alban.bridonneau updated this revision to Diff 422172.
alban.bridonneau added a comment.

I have simplified the unit tests.
For the purposes of this patch, both tests checked the same
behaviour, so I have kept only one. The complex data structures
and unnecessary attributes have been removed, and the IR has been reordered
to make the patterns clearer. Let me know if further reduction
is needed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123516/new/

https://reviews.llvm.org/D123516

Files:
  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
  llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll


Index: llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll
===================================================================
--- /dev/null
+++ llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll
@@ -0,0 +1,50 @@
+; RUN: opt < %s -slp-vectorizer -S -mtriple=aarch64-unknown-unknown | FileCheck %s
+
+; This test is reduced from the TSVC evaluation of vectorizers:
+; https://github.com/llvm/llvm-test-suite/commits/main/MultiSource/Benchmarks/TSVC/LoopRerolling-flt/tsc.c
+
+define void @s116_modified(float* %a) {
+; CHECK-LABEL: @s116_modified(
+; CHECK: [[VEC1:%.*]] = load <4 x float>, <4 x float>* %{{.*}}
+; CHECK: [[VEC2A:%.*]] = insertelement <4 x float> poison, float %{{.*}}, i32 0
+; CHECK: [[EL1:%.*]] = extractelement <4 x float> [[VEC1]], i32 0
+; CHECK: [[VEC2B:%.*]] = insertelement <4 x float> [[VEC2A]], float [[EL1]], i32 1
+; CHECK: [[EL2:%.*]] = extractelement <4 x float> [[VEC1]], i32 1
+; CHECK: [[VEC2C:%.*]] = insertelement <4 x float> [[VEC2B]], float [[EL2]], i32 2
+; CHECK: [[EL3:%.*]] = extractelement <4 x float> [[VEC1]], i32 2
+; CHECK: [[VEC2D:%.*]] = insertelement <4 x float> [[VEC2C]], float [[EL3]], i32 3
+; CHECK: [[FMUL1:%.*]] = fmul fast <4 x float> [[VEC1]], [[VEC2D]]
+entry:
+  br label %for.body
+
+for.cond.cleanup:                                 ; preds = %for.body
+  ret void
+
+for.body:                                         ; preds = %entry, %for.body
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+  %offset_1 = or i64 %indvars.iv, 1
+  %offset_2 = or i64 %indvars.iv, 2
+  %offset_3 = or i64 %indvars.iv, 3
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 4
+  %arrayidx = getelementptr inbounds float, float* %a, i64 %offset_1
+  %0 = load float, float* %arrayidx
+  %arrayidx2 = getelementptr inbounds float, float* %a, i64 %indvars.iv
+  %1 = load float, float* %arrayidx2
+  %arrayidx7 = getelementptr inbounds float, float* %a, i64 %offset_2
+  %2 = load float, float* %arrayidx7
+  %arrayidx17 = getelementptr inbounds float, float* %a, i64 %offset_3
+  %3 = load float, float* %arrayidx17
+  %arrayidx27 = getelementptr inbounds float, float* %a, i64 %indvars.iv.next
+  %4 = load float, float* %arrayidx27
+  %mul = fmul fast float %1, %0
+  %mul11 = fmul fast float %2, %0
+  %mul21 = fmul fast float %3, %2
+  %mul31 = fmul fast float %4, %3
+  store float %mul, float* %arrayidx2
+  store float %mul11, float* %arrayidx
+  store float %mul21, float* %arrayidx7
+  store float %mul31, float* %arrayidx17
+  %cmp = icmp ult i64 %indvars.iv.next, 100
+  br i1 %cmp, label %for.body, label %for.cond.cleanup
+}
+
Index: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
===================================================================
--- llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -1196,7 +1196,7 @@
           return VLOperands::ScoreFail;
         // The distance is too large - still may be profitable to use masked
         // loads/gathers.
-        if (std::abs(*Dist) > NumLanes / 2)
+        if (std::abs(*Dist) > 1)
           return VLOperands::ScoreAltOpcodes;
         // This still will detect consecutive loads, but we might have "holes"
         // in some cases. It is ok for non-power-2 vectorization and may produce

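To make the effect of the one-line change easier to see, here is a minimal standalone sketch of the distance check. It is a simplified model, not the code in SLPVectorizer.cpp: the `Score` enum, the `scoreLoadPair` function and its `patched` flag are hypothetical names introduced for illustration. The condition that leads to the fail score and the final consecutive/reversed return are assumptions based on the surrounding comments. Only the two threshold expressions come from the patch itself.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>

// Hypothetical stand-ins for the VLOperands score kinds. Only ScoreFail and
// ScoreAltOpcodes are visible in the patch context; the other two are assumed.
enum class Score { Fail, AltOpcodes, ReversedLoads, ConsecutiveLoads };

// Simplified model of scoring a pair of loads.
// dist:     element distance between the two loads, empty if unknown.
// numLanes: number of vector lanes being considered.
// patched:  false uses the old threshold (NumLanes / 2), true uses the new one (1).
Score scoreLoadPair(std::optional<int> dist, int numLanes, bool patched) {
  assert(numLanes > 0 && "expected at least one lane");

  // Assumption: an unknown or zero distance is treated as a failure.
  if (!dist || *dist == 0)
    return Score::Fail;

  // The threshold changed by the patch.
  const int limit = patched ? 1 : numLanes / 2;
  if (std::abs(*dist) > limit)
    return Score::AltOpcodes;

  // Assumption: anything within the limit is scored as a consecutive load,
  // or as a reversed load when the distance is negative.
  return *dist > 0 ? Score::ConsecutiveLoads : Score::ReversedLoads;
}
```

Under this model, with four lanes the old threshold gives a pair of loads two elements apart (such as `a[i]` and `a[i+2]` in the test) the same score as a truly adjacent pair, whereas the new threshold demotes it to the alternate-opcode score: `scoreLoadPair(2, 4, false)` yields `Score::ConsecutiveLoads`, while `scoreLoadPair(2, 4, true)` yields `Score::AltOpcodes`.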

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D123516.422172.patch
Type: text/x-patch
Size: 3297 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220412/bff77161/attachment.bin>