[PATCH] D68082: [LV] Emitting SCEV checks with OptForSize

Thu Sep 26 05:39:43 PDT 2019

SjoerdMeijer created this revision.
SjoerdMeijer added reviewers: Ayal, dorit, hsaito, fhahn.
Herald added subscribers: rkruppe, javed.absar, hiraditya.
Herald added a project: LLVM.

[LV] Emitting SCEV checks with OptForSize

Hi @Ayal  , this is addressing PR43371, and wanted to get your opinion on what
the best direction is.

When optimizing for size, and SCEV runtime checks need to be emitted to check
overflow behaviour, we can run in this assert:

    
  LoopVectorize.cpp:2699: void llvm::InnerLoopVectorizer::emitSCEVChecks(
  llvm::Loop *, llvm::BasicBlock *): Assertion `!BB->getParent()->hasOptSize()
  && "Cannot SCEV check stride or overflow when optimizing for size"'

This assert makes sense, i.e. we should not vectorize in these case with OptForsize.

However, the problem is that `runtimeChecksRequired()` is checked very early in
the flow. At this point, `PSE.getUnionPredicate().getPredicates()` returns 0
predicates, and thus it is assumed code-size won't be increased. But then we
start vectorization transformations, `createVectorizedLoopSkeleton()` does the
prep work, the CFG is restructured, and the vectorization/unroll factor is
applied.  Then, deep in the transformation flow, in `emitSCEVChecks()` we find
a new SCEV predicate that we haven't seen before (unroll factor of 2):

  
  {((2 * (zext i16 undef to i64))<nuw><nsw> + @cm_array),+,2}<%for.body29>


And this SCEV predicate results in emitting SCEV overflow checks, triggering
the assert. At this point, we cannot gracefully bail vectorization, because
most of the transformations have been applied already.

      

The obvious idea to solve this is to do the more extended checks earlier,
except this doesn't look that easy because that would involve a lot of work
reshuffling, and big overhaul of the flow.

      

With this patch I would like to get some feedback:

- if I haven't missed something here,
- and discuss the direction that we would like to take.

I think roughly we have these options:

- reject this patch, because it increases code-size while it shouldn't, and work on a proper patch.
- accept this patch:
  - and accept the regression. It can be argued that this is an edge case: this is about vectorization with optsize, and happens when overflow checks are required that are identified the 2nd this is checked.
  - accept it for the time being as a stopgap, because it might be better than asserting, while we work on the proper patch.

The questions of course is why do the 'proper' patch right away. The reason
is that I don't know the solution yet, and also how long that will take.

      

Any thoughts welcome.

      

PS I thought it would be easier to discuss this with a patch than on
the corresponding bugzilla ticket.


https://reviews.llvm.org/D68082

Files:
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/test/Transforms/LoopVectorize/pr43371-optsize-scevchecks.ll


Index: llvm/test/Transforms/LoopVectorize/pr43371-optsize-scevchecks.ll
===================================================================

--- /dev/null
+++ llvm/test/Transforms/LoopVectorize/pr43371-optsize-scevchecks.ll
@@ -0,0 +1,41 @@
+; RUN: opt -S -loop-vectorize  -pass-remarks-analysis='loop-vectorize' < %s 2>&1 | FileCheck %s
+
+ at cm_array = external global [2592 x i16], align 1
+
+define void @pr43371() optsize {
+;
+; remark: <unknown>:0:0: Optimizing for size, but overflow checks are emitted. Can this be avoided by e.g. using a larger loop induction variable type?
+
+; CHECK-LABEL: @pr43371
+;
+; FIXME: SCEV checks need to emitted. So when optimizing for size, we shouldn't
+; vectorize this.
+;
+; CHECK: vector.scevcheck:
+; CHECK: vector.body:
+;
+entry:
+  br label %for.body
+
+for.cond.for.cond.cleanup_crit_edge:
+  br label %for.body29
+
+for.body:
+  br label %for.cond.for.cond.cleanup_crit_edge
+
+for.cond.cleanup28:
+  unreachable
+
+for.body29:
+  %i24.0170 = phi i16 [ 0, %for.cond.for.cond.cleanup_crit_edge ], [ %inc37, %for.inc36 ]
+  %add33 = add i16 undef, %i24.0170
+  %idxprom34 = zext i16 %add33 to i32
+  %arrayidx35 = getelementptr [2592 x i16], [2592 x i16] * @cm_array, i32 0, i32 %idxprom34
+  store i16 0, i16 * %arrayidx35, align 1
+  br label %for.inc36
+
+for.inc36:
+  %inc37 = add i16 %i24.0170, 1
+  %cmp26 = icmp ult i16 %inc37, 756
+  br i1 %cmp26, label %for.body29, label %for.cond.cleanup28
+}
Index: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
===================================================================
--- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -2695,8 +2695,21 @@
     if (C->isZero())
       return;
 
-  assert(!BB->getParent()->hasOptSize() &&
-         "Cannot SCEV check stride or overflow when optimizing for size");
+  // FIXME: we should gracefully bail a lot earlier. The problem is that
+  // in runtimeChecksRequired(), where code-size increase is checked the first
+  // time, everything is different as e.g. unrolling is not applied yet. Thus,
+  // the SCEV information is different, and won't find any SCEV predicates,
+  // which we do find later, resulting in emitting SCEV runtime checks. As a
+  // workaround, we emit an optimization remark warning about the increase.
+  if (BB->getParent()->hasOptSize()) {
+    ORE->emit([&]() {
+      return OptimizationRemarkAnalysis(DEBUG_TYPE, "VectorizationCodeSize",
+                                        L->getStartLoc(), L->getHeader())
+             << "Optimizing for size, but overflow checks are emitted. Can "
+                "this be avoided by e.g. using a larger loop induction "
+                "variable type?";
+    });
+  }
 
   // Create a new block containing the stride check.
   BB->setName("vector.scevcheck");


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D68082.221928.patch
Type: text/x-patch
Size: 2837 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20190926/72267dcf/attachment.bin>