[llvm] [RISCV][TTI][RFC] Conservatively enable partial loop unrolling for single block loops (PR #91332)
Alex Bradbury via llvm-commits
llvm-commits at lists.llvm.org
Tue May 7 07:05:54 PDT 2024
https://github.com/asb created https://github.com/llvm/llvm-project/pull/91332
This is part of a larger conversation that's maybe most efficient to have in the next sync-up call.
There are two separate issues:
* Handling of unrolling defaults on RISC-V in general. Having a completely different path for processors that opt in to TuneNoDefaultUnroll probably isn't ideal long-term, and as written it means the default unrolling preferences are never consulted for those processors - it might be better to layer our customisations on top of them.
* Partial unrolling is not enabled at all unless you're targeting a CPU that sets TuneNoDefaultUnroll or (following the logic in the default unrolling preferences implementation) has LoopMicroopBufferSize set in the scheduling model.
This PR focuses on the second.
There are at least some cases where partial unrolling is a sensible strategy regardless of whether you have an in-order or out-of-order microarchitecture. e.g. in some cases partially unrolling a loop allows load/store merging to take place, and of course it reduces loop overhead. I've tried to start with something very conservative, taking thresholds from WebAssembly and enabling partial unrolling only when the loop has a single block. This mostly means it kicks in for simple loops, producing a larger block that is iterated fewer times.
A case where this is obviously much better is simple initialisation patterns, where unrolling can allow store merging to kick in (provided misaligned loads/stores are supported). Admittedly, for that specific case I'm looking to introduce an `llvm.memset_pattern.inline` intrinsic which LoopIdiomRecognition can produce when it sees memset_pattern but the target doesn't have the libfunc (basically any non-Apple target). One case where the output is arguably worse is where some of the IR instructions expand to control flow (the precise impact depends on the microarchitecture).
The logic isn't particularly nicely factored, as this is mainly to get feedback; the precise factoring depends on discussion about improving/unifying the handling of unrolling preferences.
>From da2fddd4c8fb47978d69b2d5244662d78b8e5b05 Mon Sep 17 00:00:00 2001
From: Alex Bradbury <asb at igalia.com>
Date: Tue, 7 May 2024 14:52:14 +0100
Subject: [PATCH] [RISCV] Conservatively enable partial loop unrolling for
single block loops
---
.../Target/RISCV/RISCVTargetTransformInfo.cpp | 31 +++++++++++++++++--
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 5f84175da703d..006a601daf2b6 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1699,9 +1699,34 @@ void RISCVTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE,
// TODO: More tuning on benchmarks and metrics with changes as needed
// would apply to all settings below to enable performance.
-
- if (ST->enableDefaultUnroll())
- return BasicTTIImplBase::getUnrollingPreferences(L, SE, UP, ORE);
+ if (ST->enableDefaultUnroll()) {
+ BasicTTIImplBase::getUnrollingPreferences(L, SE, UP, ORE);
+
+ // Enable a conservative form of partial unrolling when not optimizing for
+ // size and when the loop only has a single block.
+ if (L->getNumBlocks() > 1)
+ return;
+ for (auto *BB : L->getBlocks()) {
+ for (auto &I : *BB) {
+ // Don't partially unroll loops containing vectorized instructions.
+ if (I.getType()->isVectorTy())
+ return;
+ if (isa<CallInst>(I) || isa<InvokeInst>(I)) {
+ if (const Function *F = cast<CallBase>(I).getCalledFunction()) {
+ if (!isLoweredToCall(F))
+ continue;
+ }
+ return;
+ }
+ }
+ }
+ UP.Partial = UP.UpperBound = true;
+ UP.PartialThreshold = 30;
+ // Avoid unrolling when optimizing for size.
+ UP.OptSizeThreshold = 0;
+ UP.PartialOptSizeThreshold = 0;
+ return;
+ }
// Enable Upper bound unrolling universally, not dependant upon the conditions
// below.