[PATCH] D97947: [AArch64] Force runtime unrolling for in-order scheduling models

Mon Mar 8 05:11:23 PST 2021

NickGuy updated this revision to Diff 328978.
NickGuy added a comment.

I've done some more benchmarks with the provided suggestions;

in D97947#2603450 <https://reviews.llvm.org/D97947#2603450>, @SjoerdMeijer wrote:

> Do we need that too? Have you benchmarked this, and can we try this too?

In the benchmark I ran with this change, I saw a slight regression (~0.01%) which given the gains from unrolling is a negligible change.

In D97947#2604347 <https://reviews.llvm.org/D97947#2604347>, @dmgreen wrote:

> If the loop has already been vectorized (and possibly interleaved) it will already have been somewhat unrolled though. We may want to be more careful with cases like that and not runtime unroll the loop that has already been vectorized. We do the same thing for MVE, where that is more important due to the low register count and tail predicated loops. But the same reasoning may well apply here too, where unrolling it further just make the loop body too large to be useful

With this change added onto the previous, I saw an improvement of ~0.4% over the unrestricted runtime unrolling.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97947/new/

https://reviews.llvm.org/D97947

Files:
  llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
  llvm/test/Transforms/LoopUnroll/AArch64/runtime-loop.ll


Index: llvm/test/Transforms/LoopUnroll/AArch64/runtime-loop.ll
===================================================================

--- llvm/test/Transforms/LoopUnroll/AArch64/runtime-loop.ll
+++ llvm/test/Transforms/LoopUnroll/AArch64/runtime-loop.ll
@@ -1,5 +1,7 @@
 ; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-a57 -unroll-runtime-epilog=true  | FileCheck %s -check-prefix=EPILOG
 ; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-a57 -unroll-runtime-epilog=false | FileCheck %s -check-prefix=PROLOG
+; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-r82 -unroll-runtime-epilog=true  | FileCheck %s -check-prefix=EPILOG
+; RUN: opt < %s -S -loop-unroll -mtriple aarch64 -mcpu=cortex-r82 -unroll-runtime-epilog=false | FileCheck %s -check-prefix=PROLOG
 
 ; Tests for unrolling loops with run-time trip counts
 
Index: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
===================================================================
--- llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -983,6 +983,27 @@
   if (ST->getProcFamily() == AArch64Subtarget::Falkor &&
       EnableFalkorHWPFUnrollFix)
     getFalkorUnrollingPreferences(L, SE, UP);
+
+  // Scan the loop: don't unroll loops with calls as this could prevent
+  // inlining.
+  for (auto *BB : L->getBlocks()) {
+    for (auto &I : *BB) {
+      // Don't unroll vectorised loop.
+      if (I.getType()->isVectorTy())
+        return;
+
+      if (isa<CallInst>(I) || isa<InvokeInst>(I)) {
+        if (const Function *F = cast<CallBase>(I).getCalledFunction()) {
+          if (!isLoweredToCall(F))
+            continue;
+        }
+        return;
+      }
+    }
+  }
+
+  // Force runtime unrolling for in-order models
+  UP.Runtime |= !ST->getSchedModel().isOutOfOrder();
 }
 
 void AArch64TTIImpl::getPeelingPreferences(Loop *L, ScalarEvolution &SE,


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D97947.328978.patch
Type: text/x-patch
Size: 1934 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210308/396a5adc/attachment.bin>