[llvm] [Hexagon] Fix O(N^2) compile-time regression in HexagonOptAddrMode (PR #189531)

via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 31 01:37:17 PDT 2026


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-backend-hexagon

Author: Abinaya Saravanan (quic-asaravan)

<details>
<summary>Changes</summary>

    In HexagonOptAddrMode::processAddUses, isSafeToExtLR was called inside
    the loop over UNodeList with loop-invariant arguments. isSafeToExtLR
    iterates over UNodeList, so the total work was O(N^2) in the number of
    uses.

    The arguments (AddSN, AddMI, BaseReg, UNodeList) do not change across
    iterations. Move the call to after the loop; the function returns the
    same value regardless of which iteration calls it, and the complexity
    drops to O(N).
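The hoisting described above can be sketched in isolation. This is a minimal illustration, not the code from HexagonOptAddrMode.cpp: `Use`, `isSafeForAll`, `processBefore`, `processAfter`, `countVisits`, and the offset range are hypothetical stand-ins for the RDF use nodes, `isSafeToExtLR`, `isValidOffset`, and `processAddUses`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a use node; not the RDF NodeAddr<UseNode *> type.
struct Use {
  int offset;
  bool clobbersBase;
};

// Total number of list elements visited by the safety check so far.
static std::size_t g_visited = 0;

// Stand-in for isSafeToExtLR: walks the whole list, and its result depends
// only on the list, not on which loop iteration calls it.
bool isSafeForAll(const std::vector<Use> &uses) {
  for (const Use &u : uses) {
    ++g_visited;
    if (u.clobbersBase)
      return false;
  }
  return true;
}

// Stand-in for isValidOffset; the accepted range is arbitrary.
bool isValidOffset(int off) { return off >= 0 && off < 1024; }

// Before: the loop-invariant check runs once per use, so N uses cost N * N visits.
bool processBefore(const std::vector<Use> &uses) {
  for (const Use &u : uses) {
    if (!isValidOffset(u.offset))
      return false;
    if (!isSafeForAll(uses))
      return false;
  }
  return true;
}

// After: validate every use first, then run the invariant check once.
bool processAfter(const std::vector<Use> &uses) {
  for (const Use &u : uses)
    if (!isValidOffset(u.offset))
      return false;
  return isSafeForAll(uses);
}

// Resets the counter, runs one variant, and reports how many visits it made.
std::size_t countVisits(bool (*fn)(const std::vector<Use> &),
                        const std::vector<Use> &uses) {
  g_visited = 0;
  fn(uses);
  return g_visited;
}

// Sanity check: both variants must return the same answer for a given list.
void checkSameResult(const std::vector<Use> &uses) {
  assert(processBefore(uses) == processAfter(uses));
}
```

For a list of 100 safe uses with valid offsets, `countVisits(processBefore, uses)` reports 10,000 visits and `countVisits(processAfter, uses)` reports 100, while both variants return the same result.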

    Background
    ----------
    Commit 8c0483bba2d2 ("RegisterCoalescer: Fix assert on remat to
    copy-to-physreg with subregs") introduced register coalescer
    rematerialization changes that produce additional uses of A2_addi
    instructions on the Hexagon backend, inflating UNodeList. This exposed
    the pre-existing O(N^2) behavior in processAddUses.

    Measurements
    ------------
      Input: rkvdec-vdpu383-h264.i (Hexagon kernel driver, -O2)
      Tool:  hexagon-linux-musl-clang (clang-20)

      Scenario                         | HexagonOptAddrMode | Total
      ---------------------------------|--------------------|--------
      Before blamed commit (baseline)  |             1.35 s | ~195 s
      After blamed commit, without fix |          221,845 s | >61 h
      After blamed commit, with fix    |            52.16 s | ~225 s

    Fixes: https://github.com/llvm/llvm-project/issues/178535

---
Full diff: https://github.com/llvm/llvm-project/pull/189531.diff


2 Files Affected:

- (modified) llvm/lib/Target/Hexagon/HexagonOptAddrMode.cpp (+12-10) 
- (added) llvm/test/CodeGen/Hexagon/opt-addr-mode-large-unodelist.ll (+33) 


``````````diff
diff --git a/llvm/lib/Target/Hexagon/HexagonOptAddrMode.cpp b/llvm/lib/Target/Hexagon/HexagonOptAddrMode.cpp
index f266678071b5b..7a4b51ad11082 100644
--- a/llvm/lib/Target/Hexagon/HexagonOptAddrMode.cpp
+++ b/llvm/lib/Target/Hexagon/HexagonOptAddrMode.cpp
@@ -723,18 +723,20 @@ bool HexagonOptAddrMode::processAddUses(NodeAddr<StmtNode *> AddSN,
     int64_t newOffset = OffsetOp.getImm() + AddMI->getOperand(2).getImm();
     if (!isValidOffset(MI, newOffset))
       return false;
-
-    // Since we'll be extending the live range of Rt in the following example,
-    // make sure that is safe. another definition of Rt doesn't exist between 'add'
-    // and load/store instruction.
-    //
-    // Ex: Rx= add(Rt,#10)
-    //     memw(Rx+#0) = Rs
-    // will be replaced with =>  memw(Rt+#10) = Rs
-    if (!isSafeToExtLR(AddSN, AddMI, BaseReg, UNodeList))
-      return false;
   }
 
+  // Since we'll be extending the live range of Rt in the following example,
+  // make sure that is safe. another definition of Rt doesn't exist between
+  // 'add' and load/store instruction.
+  //
+  // Ex: Rx= add(Rt,#10)
+  //     memw(Rx+#0) = Rs
+  // will be replaced with =>  memw(Rt+#10) = Rs
+  // Note: isSafeToExtLR arguments are loop-invariant; call it once after
+  // validating all uses to avoid O(N^2) behavior when UNodeList is large.
+  if (!isSafeToExtLR(AddSN, AddMI, BaseReg, UNodeList))
+    return false;
+
   NodeId LRExtRegRD = 0;
   // Iterate through all the UseNodes in SN and find the reaching def
   // for the LRExtReg.
diff --git a/llvm/test/CodeGen/Hexagon/opt-addr-mode-large-unodelist.ll b/llvm/test/CodeGen/Hexagon/opt-addr-mode-large-unodelist.ll
new file mode 100644
index 0000000000000..a5e5ce2b8a7be
--- /dev/null
+++ b/llvm/test/CodeGen/Hexagon/opt-addr-mode-large-unodelist.ll
@@ -0,0 +1,33 @@
+; RUN: llc -mtriple=hexagon -O2 < %s | FileCheck %s
+
+; Verify that processAddUses correctly folds an A2_addi into multiple
+; store offsets. With many uses, isSafeToExtLR must be called once
+; (not once per use) to avoid O(N^2) compile time.
+
+; The A2_addi computing the base address should be eliminated; all
+; stores should use the original base register with the combined offset.
+
+; CHECK-LABEL: f0:
+; CHECK-NOT: = add(r
+
+define void @f0(ptr %base, i8 %a, i8 %b, i8 %c, i8 %d,
+                i8 %e, i8 %f, i8 %g, i8 %h) nounwind {
+entry:
+  %p0 = getelementptr inbounds i8, ptr %base, i32 13
+  store i8 %a, ptr %p0, align 1
+  %p1 = getelementptr inbounds i8, ptr %p0, i32 1
+  store i8 %b, ptr %p1, align 1
+  %p2 = getelementptr inbounds i8, ptr %p0, i32 2
+  store i8 %c, ptr %p2, align 1
+  %p3 = getelementptr inbounds i8, ptr %p0, i32 3
+  store i8 %d, ptr %p3, align 1
+  %p4 = getelementptr inbounds i8, ptr %p0, i32 4
+  store i8 %e, ptr %p4, align 1
+  %p5 = getelementptr inbounds i8, ptr %p0, i32 5
+  store i8 %f, ptr %p5, align 1
+  %p6 = getelementptr inbounds i8, ptr %p0, i32 6
+  store i8 %g, ptr %p6, align 1
+  %p7 = getelementptr inbounds i8, ptr %p0, i32 7
+  store i8 %h, ptr %p7, align 1
+  ret void
+}

``````````

</details>


https://github.com/llvm/llvm-project/pull/189531

