[PATCH] D132490: [LoopVectorize] Emit runtime checks correctly for nested loops
Florian Hahn via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 24 05:47:26 PDT 2022
fhahn added a comment.
Thanks for the patch! I don't think this is the ideal solution, it looks like the source of the issue is a missing check that the AddRecs for step expressions are both in the inner loop. I'll include a diff with the fix below which I am planning to submit.
As you already shared a test case, it would be great if you could clean it up a bit and land just the patch to start with. I left some comments inline. What do you think?
index 5a01a8e4b055..220e48643245 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -253,6 +253,10 @@ public:
return {};
}
+ const Loop *getInnermostLoop() const {
+ return InnermostLoop;
+ }
+
private:
/// A wrapper around ScalarEvolution, used to add runtime SCEV checks, and
/// applies dynamic knowledge to simplify SCEV expressions and convert them
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index a9a4a820db50..4febf0bfd62c 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -281,7 +281,7 @@ void RuntimePointerChecking::tryToCreateDiffCheck(
auto *SrcAR = dyn_cast<SCEVAddRecExpr>(Src->Expr);
auto *SinkAR = dyn_cast<SCEVAddRecExpr>(Sink->Expr);
- if (!SrcAR || !SinkAR) {
+ if (!SrcAR || !SinkAR || SrcAR->getLoop() != DC.getInnermostLoop() || SinkAR->getLoop() != DC.getInnermostLoop()) {
CanUseDiffCheck = false;
return;
}
================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:8
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
----------------
I don't think this test needs the X86 cost model. `target triple` could probably be dropped and replaced by passing `-force-vector-width=4 -force-vector-interleave=1` to `opt`. If it requires the target triple, it needs to be moved to the `X86` sub-directory.
================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:16
+ %cmp24 = icmp sgt i32 %len, 0
+ br i1 %cmp24, label %for.cond1.preheader.us.preheader, label %for.end12
+
----------------
This check shouldn't be really needed.
================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:19
+for.cond1.preheader.us.preheader: ; preds = %entry
+ %wide.trip.count30 = zext i32 %len to i64
+ br label %for.cond1.preheader.us
----------------
could pass `%Len` as `i64`
================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:22
+
+for.cond1.preheader.us: ; preds = %for.cond1.preheader.us.preheader, %for.cond1.for.end_crit_edge.us
+ %indvars.iv27 = phi i64 [ 0, %for.cond1.preheader.us.preheader ], [ %indvars.iv.next28, %for.cond1.for.end_crit_edge.us ]
----------------
block names could be simplified. e.g. `%for.cond1.preheader.us -> %outer.header`, `%for.cond1.for.end_crit_edge.us: -> %outer.latch`, `%for.body3.us -> %inner` `for.end12 -> %exit`.
================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:30
+ %0 = phi i32 [ %.pre, %for.cond1.preheader.us ], [ %sub.us, %for.body3.us ]
+ %indvars.iv = phi i64 [ 0, %for.cond1.preheader.us ], [ %indvars.iv.next, %for.body3.us ]
+ %arrayidx5.us = getelementptr inbounds i32, ptr %b, i64 %indvars.iv
----------------
same could be simplified by dropping `indvars.` prefix.
================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:34
+ %sub.us = sub i32 %0, %1
+ store i32 %sub.us, ptr %arrayidx.us, align 4, !tbaa !5
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
----------------
!tbaa shouldn't be needed.
================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:41
+ %sub.us.lcssa = phi i32 [ %sub.us, %for.body3.us ]
+ %call.us = tail call i32 (ptr, ...) @printf(ptr noundef nonnull @.str, i32 noundef %sub.us.lcssa)
+ %indvars.iv.next28 = add nuw nsw i64 %indvars.iv27, 1
----------------
This could be simplified, if a use of `%sub.us.lcssa` is needed then a function that only takes an `i32` should be sufficient.
================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:56
+
+attributes #0 = { nofree nounwind uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #1 = { nofree nounwind "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
----------------
attributes shouldn't be needed
================
Comment at: llvm/test/Transforms/LoopVectorize/nested-loop.ll:59
+
+!llvm.module.flags = !{!0, !1, !2, !3}
+!llvm.ident = !{!4}
----------------
none of the metadata should be needed
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D132490/new/
https://reviews.llvm.org/D132490
More information about the llvm-commits
mailing list