[PATCH] D132490: [LoopVectorize] Emit runtime checks correctly for nested loops

Wed Aug 24 05:47:26 PDT 2022

fhahn added a comment.

Thanks for the patch! I don't think this is the ideal solution, it looks like the source of the issue is a missing check that the AddRecs for step expressions are both in the inner loop. I'll include a diff with the fix below which I am planning to submit.

As you already shared a test case, it would be great if you could clean it up a bit and land just the patch to start with. I left some comments inline.  What do you think?

  index 5a01a8e4b055..220e48643245 100644
  --- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
  +++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
  @@ -253,6 +253,10 @@ public:
       return {};
     }

  +  const Loop *getInnermostLoop() const {
  +    return InnermostLoop;
  +  }
  +
   private:
     /// A wrapper around ScalarEvolution, used to add runtime SCEV checks, and
     /// applies dynamic knowledge to simplify SCEV expressions and convert them
  diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
  index a9a4a820db50..4febf0bfd62c 100644
  --- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
  +++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
  @@ -281,7 +281,7 @@ void RuntimePointerChecking::tryToCreateDiffCheck(

     auto *SrcAR = dyn_cast<SCEVAddRecExpr>(Src->Expr);
     auto *SinkAR = dyn_cast<SCEVAddRecExpr>(Sink->Expr);
  -  if (!SrcAR || !SinkAR) {
  +  if (!SrcAR || !SinkAR || SrcAR->getLoop() != DC.getInnermostLoop() || SinkAR->getLoop() != DC.getInnermostLoop()) {
       CanUseDiffCheck = false;
       return;
     }

================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:8
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
----------------
I don't think this test needs the X86 cost model. `target triple` could probably be dropped and replaced by passing `-force-vector-width=4 -force-vector-interleave=1` to `opt`. If it requires the target triple, it needs to be moved to the `X86` sub-directory.

================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:16
+  %cmp24 = icmp sgt i32 %len, 0
+  br i1 %cmp24, label %for.cond1.preheader.us.preheader, label %for.end12
+
----------------
This check shouldn't be really needed.

================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:19
+for.cond1.preheader.us.preheader:                 ; preds = %entry
+  %wide.trip.count30 = zext i32 %len to i64
+  br label %for.cond1.preheader.us
----------------
could pass `%Len` as `i64`

================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:22
+
+for.cond1.preheader.us:                           ; preds = %for.cond1.preheader.us.preheader, %for.cond1.for.end_crit_edge.us
+  %indvars.iv27 = phi i64 [ 0, %for.cond1.preheader.us.preheader ], [ %indvars.iv.next28, %for.cond1.for.end_crit_edge.us ]
----------------
block names could be simplified. e.g. `%for.cond1.preheader.us -> %outer.header`, `%for.cond1.for.end_crit_edge.us: -> %outer.latch`,  `%for.body3.us -> %inner` `for.end12 -> %exit`.

================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:30
+  %0 = phi i32 [ %.pre, %for.cond1.preheader.us ], [ %sub.us, %for.body3.us ]
+  %indvars.iv = phi i64 [ 0, %for.cond1.preheader.us ], [ %indvars.iv.next, %for.body3.us ]
+  %arrayidx5.us = getelementptr inbounds i32, ptr %b, i64 %indvars.iv
----------------
same could be simplified by dropping `indvars.` prefix.

================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:34
+  %sub.us = sub i32 %0, %1
+  store i32 %sub.us, ptr %arrayidx.us, align 4, !tbaa !5
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
----------------
!tbaa shouldn't be needed.

================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:41
+  %sub.us.lcssa = phi i32 [ %sub.us, %for.body3.us ]
+  %call.us = tail call i32 (ptr, ...) @printf(ptr noundef nonnull @.str, i32 noundef %sub.us.lcssa)
+  %indvars.iv.next28 = add nuw nsw i64 %indvars.iv27, 1
----------------
This could be simplified, if a use  of `%sub.us.lcssa` is needed then a function that only takes an `i32` should be sufficient.

================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:56
+
+attributes #0 = { nofree nounwind uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #1 = { nofree nounwind "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
----------------
attributes shouldn't be needed

================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:59
+
+!llvm.module.flags = !{!0, !1, !2, !3}
+!llvm.ident = !{!4}
----------------
none of the metadata should be needed

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132490/new/

https://reviews.llvm.org/D132490