[PATCH] D97947: [AArch64] Force runtime unrolling for in-order scheduling models
Nicholas Guy via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 8 05:11:23 PST 2021
NickGuy updated this revision to Diff 328978.
NickGuy added a comment.
I've done some more benchmarks with the provided suggestions;
in D97947#2603450 <https://reviews.llvm.org/D97947#2603450>, @SjoerdMeijer wrote:
> Do we need that too? Have you benchmarked this, and can we try this too?
In the benchmark I ran with this change, I saw a slight regression (~0.01%) which given the gains from unrolling is a negligible change.
In D97947#2604347 <https://reviews.llvm.org/D97947#2604347>, @dmgreen wrote:
> If the loop has already been vectorized (and possibly interleaved) it will already have been somewhat unrolled though. We may want to be more careful with cases like that and not runtime unroll the loop that has already been vectorized. We do the same thing for MVE, where that is more important due to the low register count and tail predicated loops. But the same reasoning may well apply here too, where unrolling it further just make the loop body too large to be useful
With this change added onto the previous, I saw an improvement of ~0.4% over the unrestricted runtime unrolling.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D97947/new/
https://reviews.llvm.org/D97947
Files:
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
llvm/test/Transforms/LoopUnroll/AArch64/runtime-loop.ll
Index: llvm/test/Transforms/LoopUnroll/AArch64/runtime-loop.ll
===================================================================
--- llvm/test/Transforms/LoopUnroll/AArch64/runtime-loop.ll
+++ llvm/test/Transforms/LoopUnroll/AArch64/runtime-loop.ll
@@ -1,5 +1,7 @@
; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-a57 -unroll-runtime-epilog=true | FileCheck %s -check-prefix=EPILOG
; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-a57 -unroll-runtime-epilog=false | FileCheck %s -check-prefix=PROLOG
+; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-r82 -unroll-runtime-epilog=true | FileCheck %s -check-prefix=EPILOG
+; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-r82 -unroll-runtime-epilog=false | FileCheck %s -check-prefix=PROLOG
; Tests for unrolling loops with run-time trip counts
Index: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
===================================================================
--- llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -983,6 +983,27 @@
if (ST->getProcFamily() == AArch64Subtarget::Falkor &&
EnableFalkorHWPFUnrollFix)
getFalkorUnrollingPreferences(L, SE, UP);
+
+ // Scan the loop: don't unroll loops with calls as this could prevent
+ // inlining.
+ for (auto *BB : L->getBlocks()) {
+ for (auto &I : *BB) {
+ // Don't unroll vectorised loop.
+ if (I.getType()->isVectorTy())
+ return;
+
+ if (isa<CallInst>(I) || isa<InvokeInst>(I)) {
+ if (const Function *F = cast<CallBase>(I).getCalledFunction()) {
+ if (!isLoweredToCall(F))
+ continue;
+ }
+ return;
+ }
+ }
+ }
+
+ // Force runtime unrolling for in-order models
+ UP.Runtime |= !ST->getSchedModel().isOutOfOrder();
}
void AArch64TTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE,
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D97947.328978.patch
Type: text/x-patch
Size: 1934 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210308/396a5adc/attachment.bin>
More information about the llvm-commits
mailing list