[llvm] [JumpThreading] Limit number of free instructions (PR #75671)

Fri Dec 15 16:06:22 PST 2023

https://github.com/terrelln created https://github.com/llvm/llvm-project/pull/75671

`JumpThreading` bails out when the duplication cost is too high, but it doesn't count free instructions (like lifetime annotations) toward this cost. Normally this is fine, but in edge cases it can end up quadratically increasing the size of the IR, as in Issue #75666.

This PR bails out if there are far too many free instructions, but otherwise it doesn't include them in the cost. This is desirable because we only want to avoid edge-case blowups in compilation speed, and the code size cost of these instructions is still actually zero.
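
To make the counting rule concrete, here is a minimal standalone sketch of it. It is not the LLVM code: `CostKind`, `kBailOut`, and `duplicationCost` are hypothetical names invented for illustration, and the real function consults `TargetTransformInfo` and applies other checks that are omitted here. The sketch assumes only what the patch shows: free instructions add nothing to the size, and the scan gives up once their count passes the threshold.

```cpp
#include <vector>

// Hypothetical stand-in for an instruction: only its cost class matters here.
enum class CostKind { Free, Basic };

// Sentinel meaning "do not duplicate this block" (the patch returns ~0U).
constexpr unsigned kBailOut = ~0U;

// Returns the size cost of duplicating Block, or kBailOut if the block
// contains too many free instructions.
unsigned duplicationCost(const std::vector<CostKind> &Block,
                         unsigned FreeThreshold) {
  unsigned Free = 0;
  unsigned Size = 0;
  for (CostKind K : Block) {
    if (K == CostKind::Free) {
      // Same post-increment comparison as the patch: the count seen so far
      // is compared before being incremented, so up to FreeThreshold + 1
      // free instructions are tolerated.
      if (Free++ > FreeThreshold)
        return kBailOut;
      continue; // Free instructions never contribute to Size.
    }
    // All other instructions count for one unit in this sketch.
    ++Size;
  }
  return Size;
}
```

With a threshold of 500, a block of 501 free instructions in this sketch still costs 0, while a block of 502 returns `kBailOut`.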

I picked the default threshold somewhat arbitrarily. I selected the highest multiple of 100 that kept the compile time for the repro in Issue #75666 under 1 minute. That example compiles in 6 seconds without the `JumpThreading` pass, 37 seconds with a threshold of 500, and hours without the threshold. This is likely conservative, because I didn't want to impact normal compilations that aren't hitting this edge case, so please let me know if I should lower it.

Fixes Issue #75666.

From 9425f56a2703280ae087876468715c481fb61244 Mon Sep 17 00:00:00 2001
From: Nick Terrell <terrelln at fb.com>
Date: Fri, 15 Dec 2023 15:09:11 -0800
Subject: [PATCH] [JumpThreading] Limit number of free instructions

`JumpThreading` bails out when the duplication cost is too high, but it
doesn't count free instructions (like lifetime annotations) toward this cost.
Normally this is fine, but in edge cases it can end up quadratically
increasing the size of the IR, as in Issue #75666.

This PR bails out if there are far too many free instructions, but otherwise
it doesn't include them in the cost. This is desirable because we only want to
avoid edge-case blowups in compilation speed, and the code size cost of these
instructions is still actually zero.

I picked the default threshold somewhat arbitrarily. I selected the highest
multiple of 100 that kept the compile time for the repro in Issue #75666 under
1 minute. That example compiles in 6 seconds without the `JumpThreading` pass,
37 seconds with a threshold of 500, and hours without the threshold. This is
likely conservative, because I didn't want to impact normal compilations that
aren't hitting this edge case, so please let me know if I should lower it.

Fixes Issue #75666.
---
 llvm/lib/Transforms/Scalar/JumpThreading.cpp | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/llvm/lib/Transforms/Scalar/JumpThreading.cpp b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
index 8603c5cf9c022c..621299efb6b6a8 100644
--- a/llvm/lib/Transforms/Scalar/JumpThreading.cpp
+++ b/llvm/lib/Transforms/Scalar/JumpThreading.cpp
@@ -102,6 +102,11 @@ static cl::opt<unsigned> PhiDuplicateThreshold(
     cl::desc("Max PHIs in BB to duplicate for jump threading"), cl::init(76),
     cl::Hidden);
 
+static cl::opt<unsigned> FreeInstDuplicateThreshold(
+    "jump-threading-free-inst-threshold",
+    cl::desc("Max free instructions in BB to duplicate for jump threading"), cl::init(500),
+    cl::Hidden);
+
 static cl::opt<bool> ThreadAcrossLoopHeaders(
     "jump-threading-across-loop-headers",
     cl::desc("Allow JumpThreading to thread across loop headers, for testing"),
@@ -467,6 +472,7 @@ static unsigned getJumpThreadDuplicationCost(const TargetTransformInfo *TTI,
   // terminator-based Size adjustment at the end.
   Threshold += Bonus;
 
+  unsigned Free = 0;
   // Sum up the cost of each instruction until we get to the terminator.  Don't
   // include the terminator because the copy won't include it.
   unsigned Size = 0;
@@ -488,8 +494,14 @@ static unsigned getJumpThreadDuplicationCost(const TargetTransformInfo *TTI,
         return ~0U;
 
     if (TTI->getInstructionCost(&*I, TargetTransformInfo::TCK_SizeAndLatency) ==
-        TargetTransformInfo::TCC_Free)
+        TargetTransformInfo::TCC_Free) {
+      // Do not duplicate the BB if it has a lot of free instructions.
+      // In edge cases they can add up and significantly increase compile time of
+      // later passes by bloating the IR.
+      if (Free++ > FreeInstDuplicateThreshold)
+        return ~0U;
       continue;
+    }
 
     // All other instructions count for at least one unit.
     ++Size;