[PATCH] D113973: [LoopVectorize][CostModel] Choose smaller VFs for in-loop reductions with no loads/stores

Tue Nov 16 01:49:54 PST 2021

RosieSumpter created this revision.
RosieSumpter added reviewers: sdesmalen, david-arm, kmclaughlin.
Herald added a subscriber: hiraditya.
RosieSumpter requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

For loops that contain in-loop reductions but no loads or stores, large
VFs are chosen because LoopVectorizationCostModel::getSmallestAndWidestTypes
has no element types to check through and so returns the default widths
(-1U for the smallest and 8 for the widest). This results in the widest
VF being chosen for the following example,

float s = 0;
for (int i = 0; i < N; ++i)

  s += (float) i*i;

which, for more computationally intensive loops, leads to large loop
sizes when the operations end up being scalarized.

In this patch, the default max width has been changed from 8 to a more
conservative value of 32 in the case where ElementTypesInLoop is empty
(i.e. in-loop reductions without any loads or stores) which means VF=4
is chosen instead for the above example. However, this impacts other
targets, and there may be another solution.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D113973

Files:
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/test/Transforms/LoopVectorize/AArch64/smallest-and-widest-types.ll
  llvm/test/Transforms/LoopVectorize/X86/funclet.ll
  llvm/test/Transforms/LoopVectorize/pr32859.ll
  llvm/test/Transforms/LoopVectorize/pr36983.ll


Index: llvm/test/Transforms/LoopVectorize/pr36983.ll
===================================================================

--- llvm/test/Transforms/LoopVectorize/pr36983.ll
+++ llvm/test/Transforms/LoopVectorize/pr36983.ll
@@ -1,4 +1,4 @@
-; RUN: opt < %s -loop-vectorize -S | FileCheck %s
+; RUN: opt < %s -loop-vectorize -force-vector-width=4 -S | FileCheck %s
 
 ; There could be more than one LCSSA PHIs in loop exit block.
 
Index: llvm/test/Transforms/LoopVectorize/pr32859.ll
===================================================================
--- llvm/test/Transforms/LoopVectorize/pr32859.ll
+++ llvm/test/Transforms/LoopVectorize/pr32859.ll
@@ -1,4 +1,4 @@
-; RUN: opt < %s -loop-vectorize -S | FileCheck %s
+; RUN: opt < %s -loop-vectorize -force-vector-width=4 -S | FileCheck %s
 
 ; Out of the LCSSA form we could have 'phi i32 [ loop-invariant, %for.inc.2.i ]'
 ; but the IR Verifier requires for PHI one entry for each predecessor of
Index: llvm/test/Transforms/LoopVectorize/X86/funclet.ll
===================================================================
--- llvm/test/Transforms/LoopVectorize/X86/funclet.ll
+++ llvm/test/Transforms/LoopVectorize/X86/funclet.ll
@@ -33,7 +33,7 @@
 
 ; CHECK-LABEL: define void @test1(
 ; CHECK: %[[cpad:.*]] = catchpad within {{.*}} [i8* null, i32 64, i8* null]
-; CHECK: call <16 x double> @llvm.floor.v16f64(<16 x double> {{.*}}) [ "funclet"(token %[[cpad]]) ]
+; CHECK: call <4 x double> @llvm.floor.v4f64(<4 x double> {{.*}}) [ "funclet"(token %[[cpad]]) ]
 
 declare x86_stdcallcc void @_CxxThrowException(i8*, i8*)
 
Index: llvm/test/Transforms/LoopVectorize/AArch64/smallest-and-widest-types.ll
===================================================================
--- llvm/test/Transforms/LoopVectorize/AArch64/smallest-and-widest-types.ll
+++ llvm/test/Transforms/LoopVectorize/AArch64/smallest-and-widest-types.ll
@@ -31,3 +31,30 @@
 for.end:
   ret void
 }
+
+; For in-loop reductions with no loads or stores in the loop the default widest
+; type is 32 bits, which allows a sensible VF to be chosen.
+
+; CHECK-LABEL: Checking a loop in "no_loads_stores"
+; CHECK: The Smallest and Widest types: 4294967295 / 32 bits
+; CHECK: Selecting VF: 4
+
+define double @no_loads_stores() {
+entry:
+  br label %for.body
+
+for.body:
+  %s.09 = phi double [ 0.000000e+00, %entry ], [ %add, %for.body ]
+  %i.08 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
+  %conv = sitofp i64 %i.08 to double
+  %mul = fmul double %conv, %conv
+  %add = fadd double %s.09, %mul
+  %inc = add nuw nsw i64 %i.08, 1
+  %exitcond.not = icmp eq i64 %inc, 1234567
+  br i1 %exitcond.not, label %for.end, label %for.body
+
+for.end:
+  %.lcssa = phi double [ %add, %for.body ]
+  ret double %.lcssa
+}
+
Index: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
===================================================================
--- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -6319,6 +6319,11 @@
   unsigned MinWidth = -1U;
   unsigned MaxWidth = 8;
   const DataLayout &DL = TheFunction->getParent()->getDataLayout();
+  // For in-loop reductions, no element types are added to ElementTypesInLoop
+  // if there are no loads/stores in the loop. In this case, set the default
+  // maximum width to be 32 bits so that a sensible VF is chosen.
+  if (ElementTypesInLoop.empty())
+    MaxWidth = 32;
   for (Type *T : ElementTypesInLoop) {
     MinWidth = std::min<unsigned>(
         MinWidth, DL.getTypeSizeInBits(T->getScalarType()).getFixedSize());


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D113973.387531.patch
Type: text/x-patch
Size: 3538 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211116/f9aefdb3/attachment.bin>