[llvm] [MC] Fuse relaxation and layout into a single forward pass (PR #190318)

via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 3 00:29:52 PDT 2026


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-llvm-mc

@llvm/pr-subscribers-backend-x86

Author: Fangrui Song (MaskRay)

<details>
<summary>Changes</summary>

This relands debb2514ea7f, which was reverted by #<!-- -->189548 due to a
spurious ARM `cbz` out-of-range error (Chromium, Android).

---

Replace the two-pass inner loop in relaxOnce (relaxFragment +
layoutSection) with a single forward pass that sets each fragment's
offset before processing it.

- Extract relaxAlign from layoutSection's FT_Align handling and call
  it from relaxFragment. FT_Align padding is computed inline with the
  tracked Offset, so alignment fragments always see fresh upstream
  offsets. This structurally eliminates the O(N) convergence pitfall
  where stale offsets caused each iteration to fix only one more
  alignment fragment.

- The new MCAssembler::Stretch field tracks the cumulative upstream size
  delta. In evaluateFixup, for PCRel fixups during relaxation, Stretch is
  added to forward-reference target values (identified by a LayoutOrder
  comparison). This makes the displacement equal target_old - source_old,
  identical to the old two-pass approach, preventing premature relaxation
  of span-dependent instructions.

- FT_Fill/FT_Org removed from relaxFragment; `if (F.Offset != Offset)` in
  the fused loop detects their size changes.

- layoutSection is retained for initial layout and post-finishLayout.
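In code, the Stretch compensation described above amounts to roughly the
following (a standalone sketch with illustrative names, not the actual
evaluateFixup):

```cpp
#include <cassert>
#include <cstdint>

// During the fused pass the source fragment's offset is already updated
// (source_new = source_old + Stretch), but a forward-reference target's
// offset is stale (still target_old). Adding Stretch to forward targets
// restores the two-pass displacement target_old - source_old.
int64_t pcRelValue(int64_t targetOffset, bool targetIsForward,
                   int64_t sourceNewOffset, int64_t stretch) {
  int64_t Value = targetOffset - sourceNewOffset;
  if (targetIsForward) // i.e. LayoutOrder(target) > LayoutOrder(source)
    Value += stretch;
  return Value;
}
```

Backward references need no compensation: their fragments were already
visited this pass, so their offsets are fresh.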

This fixes the FT_BoundaryAlign linear-time convergence issue reported in
#<!-- -->176535. In addition, backward branches near the short/near boundary
may benefit from tighter encoding when a .p2align between the target and
the branch absorbs upstream growth (see relax-branch-align.s).
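The fused inner loop can be sketched as follows (a simplified model with
made-up types; the real pass also resizes relaxable instructions and tracks
Stretch):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of the fused relaxation+layout pass (not the actual
// MCAssembler code; Frag and its fields are illustrative). An "align"
// fragment's size depends on the offset it lands at, so computing its
// padding against a freshly assigned offset is what removes the
// stale-offset convergence pitfall.
struct Frag {
  uint64_t Offset = 0;
  uint64_t FixedSize = 0;
  unsigned Align = 0; // 0 = plain data; otherwise align-to boundary
  uint64_t size() const {
    return FixedSize + (Align ? (Align - Offset % Align) % Align : 0);
  }
};

// One fused iteration: assign each fragment's offset from the running
// total *before* sizing it, so downstream fragments immediately see
// upstream size changes. Returns whether any offset moved.
bool relaxOncePass(std::vector<Frag> &Frags) {
  bool Changed = false;
  uint64_t Offset = 0;
  for (Frag &F : Frags) {
    if (F.Offset != Offset)
      Changed = true;
    F.Offset = Offset; // fresh upstream offset
    Offset += F.size();
  }
  return Changed;
}
```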

Key commits that updated relaxFragment/layoutSection:
- 742ecfc13e8a [MC] Relax MCFillFragment and compute fragment offsets
eagerly
- 9f66ebe42715 MC: Eliminate redundant fragment relaxation
- df71243fa885 MC: Evaluate .org during fragment relaxation
- b1d58f025e83 MCAssembler: Simplify fragment relaxation
- 58d16db8b5d2 MCAssembler: Simplify relaxation of FT_Fill and FT_Org

---

Fix for the ARM regression (see thumb-ldr-stretch.s):

ARM `evaluateFixup` pre-seeds Value with
`(F.Offset + fixup_offset) % 4` for Thumb's AlignDown(PC, 4) semantics.
The full displacement formulas under each scheme:

```
Old two-pass:
  (source_old % 4) + target_old - source_old
  = target_old - alignDown(source_old, 4)

Reverted buggy fused:
  (source_new % 4) + target_old - source_new + Stretch
  = target_old + Stretch - alignDown(source_new, 4)

Fixed fused:
  ((source_new - Stretch) % 4) + target_old - source_new + Stretch
  = (source_old % 4) + target_old - source_old
  = target_old - alignDown(source_old, 4)
```

With the fused pass, F.Offset is already updated by Stretch, so the
pre-seed used the new offset while the generic Stretch compensation
assumed the old offset, producing a misaligned displacement. This caused
spurious tLDRpci -> t2LDRpci relaxation (+2 bytes each), which cascaded
to push the initial CBZ target out of range.

The fix uses pre-Stretch source offsets for the AlignDown(PC, 4)
pre-seed in evaluateFixup, leading to the same displacement
as the old two-pass algorithm.

Note: both the old and new approaches can unnecessarily relax an LDR
when a downstream alignment would absorb the displacement increase,
leading to out-of-range CBZ fixups. The reverted buggy fused approach
made it easier to trigger such worst-case scenarios.
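The three formulas can be checked numerically (hypothetical offsets:
source_old = 0x4c, target_old = 0x7e, Stretch = 6, i.e. three tB -> t2B
widenings):

```cpp
#include <cassert>
#include <cstdint>

// Displacement seen by the fixup evaluator under each scheme, where
// source_new = source_old + Stretch (the fused pass has already moved
// the source fragment, but the forward target is stale).
int64_t oldTwoPass(int64_t srcOld, int64_t tgtOld) {
  return (srcOld % 4) + tgtOld - srcOld;
}
int64_t buggyFused(int64_t srcOld, int64_t tgtOld, int64_t stretch) {
  int64_t srcNew = srcOld + stretch;
  return (srcNew % 4) + tgtOld - srcNew + stretch;
}
int64_t fixedFused(int64_t srcOld, int64_t tgtOld, int64_t stretch) {
  int64_t srcNew = srcOld + stretch;
  return ((srcNew - stretch) % 4) + tgtOld - srcNew + stretch;
}
```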


---
Full diff: https://github.com/llvm/llvm-project/pull/190318.diff


7 Files Affected:

- (modified) llvm/include/llvm/MC/MCAssembler.h (+7-1) 
- (modified) llvm/lib/MC/MCAssembler.cpp (+60-42) 
- (modified) llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp (+5-1) 
- (modified) llvm/lib/Target/CSKY/MCTargetDesc/CSKYAsmBackend.cpp (+5-1) 
- (added) llvm/test/MC/ARM/thumb-ldr-stretch.s (+66) 
- (modified) llvm/test/MC/ELF/relax-branch-align.s (+6-8) 
- (added) llvm/test/MC/X86/align-branch-convergence.s (+37) 


``````````diff
diff --git a/llvm/include/llvm/MC/MCAssembler.h b/llvm/include/llvm/MC/MCAssembler.h
index dbae271a1c198..7f4a1433e1c2a 100644
--- a/llvm/include/llvm/MC/MCAssembler.h
+++ b/llvm/include/llvm/MC/MCAssembler.h
@@ -60,6 +60,10 @@ class MCAssembler {
   bool HasFinalLayout = false;
   bool RelaxAll = false;
 
+  // Cumulative upstream size change during `relaxOnce`. Used to compensate
+  // forward-reference displacements in `evaluateFixup`.
+  int64_t Stretch = 0;
+
   SectionListType Sections;
 
   SmallVector<const MCSymbol *, 0> Symbols;
@@ -108,7 +112,8 @@ class MCAssembler {
   unsigned relaxOnce(unsigned FirstStable);
 
   /// Perform relaxation on a single fragment.
-  bool relaxFragment(MCFragment &F);
+  void relaxFragment(MCFragment &F);
+  void relaxAlign(MCFragment &F);
   void relaxInstruction(MCFragment &F);
   void relaxLEB(MCFragment &F);
   void relaxBoundaryAlign(MCBoundaryAlignFragment &BF);
@@ -190,6 +195,7 @@ class MCAssembler {
   bool hasFinalLayout() const { return HasFinalLayout; }
   bool getRelaxAll() const { return RelaxAll; }
   void setRelaxAll(bool Value) { RelaxAll = Value; }
+  int64_t getStretch() const { return Stretch; }
 
   const_iterator begin() const { return Sections.begin(); }
   const_iterator end() const { return Sections.end(); }
diff --git a/llvm/lib/MC/MCAssembler.cpp b/llvm/lib/MC/MCAssembler.cpp
index e649ea7fedabe..671fb14908a71 100644
--- a/llvm/lib/MC/MCAssembler.cpp
+++ b/llvm/lib/MC/MCAssembler.cpp
@@ -173,6 +173,19 @@ bool MCAssembler::evaluateFixup(const MCFragment &F, MCFixup &Fixup,
 
     if (Fixup.isPCRel()) {
       Value -= getFragmentOffset(F) + Fixup.getOffset();
+      // During relaxation, F's offset is already updated but forward reference
+      // targets are stale. Add Stretch so that the displacement equals
+      // target_old - source_old, preventing premature relaxation.
+      if (Stretch) {
+        assert(!RecordReloc &&
+               "Stretch should only be applied during relaxation");
+        MCFragment *AF = Add ? Add->getFragment() : nullptr;
+        if (AF && AF->getLayoutOrder() > F.getLayoutOrder())
+          Value += Stretch;
+        MCFragment *SF = Sub ? Sub->getFragment() : nullptr;
+        if (SF && SF->getLayoutOrder() > F.getLayoutOrder())
+          Value -= Stretch;
+      }
       if (Add && !Sub && !Add->isUndefined() && !Add->isAbsolute()) {
         IsResolved = getWriter().isSymbolRefDifferenceFullyResolvedImpl(
             *Add, F, false, true);
@@ -743,6 +756,24 @@ void MCAssembler::Finish() {
   assert(PendingErrors.empty());
 }
 
+void MCAssembler::relaxAlign(MCFragment &F) {
+  uint64_t Offset = F.Offset + F.getFixedSize();
+  unsigned Size = offsetToAlignment(Offset, F.getAlignment());
+  bool AlignFixup = false;
+  if (F.hasAlignEmitNops()) {
+    AlignFixup = getBackend().relaxAlign(F, Size);
+    if (!AlignFixup)
+      while (Size % getBackend().getMinimumNopSize())
+        Size += F.getAlignment().value();
+  }
+  if (!AlignFixup && Size > F.getAlignMaxBytesToEmit())
+    Size = 0;
+  F.VarContentStart = F.getFixedSize();
+  F.VarContentEnd = F.VarContentStart + Size;
+  if (F.VarContentEnd > F.getParent()->ContentStorage.size())
+    F.getParent()->ContentStorage.resize(F.VarContentEnd);
+}
+
 bool MCAssembler::fixupNeedsRelaxation(const MCFragment &F,
                                        const MCFixup &Fixup) const {
   ++stats::FixupEvalForRelax;
@@ -938,11 +969,13 @@ void MCAssembler::relaxSFrameFragment(MCFragment &F) {
   F.clearVarFixups();
 }
 
-bool MCAssembler::relaxFragment(MCFragment &F) {
-  auto Size = computeFragmentSize(F);
+void MCAssembler::relaxFragment(MCFragment &F) {
   switch (F.getKind()) {
   default:
-    return false;
+    return;
+  case MCFragment::FT_Align:
+    relaxAlign(F);
+    break;
   case MCFragment::FT_Relaxable:
     assert(!getRelaxAll() && "Did not expect a FT_Relaxable in RelaxAll mode");
     relaxInstruction(F);
@@ -970,61 +1003,42 @@ bool MCAssembler::relaxFragment(MCFragment &F) {
     getContext().getCVContext().encodeDefRange(
         *this, static_cast<MCCVDefRangeFragment &>(F));
     break;
-  case MCFragment::FT_Fill:
-  case MCFragment::FT_Org:
-    return F.getNext()->Offset - F.Offset != Size;
   }
-  return computeFragmentSize(F) != Size;
 }
 
 void MCAssembler::layoutSection(MCSection &Sec) {
   uint64_t Offset = 0;
   for (MCFragment &F : Sec) {
     F.Offset = Offset;
-    if (F.getKind() == MCFragment::FT_Align) {
-      Offset += F.getFixedSize();
-      unsigned Size = offsetToAlignment(Offset, F.getAlignment());
-      // In the nops mode, RISC-V style linker relaxation might adjust the size
-      // and add a fixup, even if `Size` is originally 0.
-      bool AlignFixup = false;
-      if (F.hasAlignEmitNops()) {
-        AlignFixup = getBackend().relaxAlign(F, Size);
-        // If the backend does not handle the fragment specially, pad with nops,
-        // but ensure that the padding is larger than the minimum nop size.
-        if (!AlignFixup)
-          while (Size % getBackend().getMinimumNopSize())
-            Size += F.getAlignment().value();
-      }
-      if (!AlignFixup && Size > F.getAlignMaxBytesToEmit())
-        Size = 0;
-      // Update the variable tail size, offset by FixedSize to prevent ubsan
-      // pointer-overflow in evaluateFixup. The content is ignored.
-      F.VarContentStart = F.getFixedSize();
-      F.VarContentEnd = F.VarContentStart + Size;
-      if (F.VarContentEnd > F.getParent()->ContentStorage.size())
-        F.getParent()->ContentStorage.resize(F.VarContentEnd);
-      Offset += Size;
-    } else {
-      Offset += computeFragmentSize(F);
-    }
+    if (F.getKind() == MCFragment::FT_Align)
+      relaxAlign(F);
+    Offset += computeFragmentSize(F);
   }
 }
 
+// Fused relaxation and layout: a single forward pass that updates each
+// fragment's offset before processing it, so upstream size changes are
+// immediately visible.
 unsigned MCAssembler::relaxOnce(unsigned FirstStable) {
-  ++stats::RelaxationSteps;
+  uint64_t MaxIterations = 0;
   PendingErrors.clear();
-
   unsigned Res = 0;
   for (unsigned I = 0; I != FirstStable; ++I) {
-    // Assume each iteration finalizes at least one extra fragment. If the
-    // layout does not converge after N+1 iterations, bail out.
     auto &Sec = *Sections[I];
-    auto MaxIter = Sec.curFragList()->Tail->getLayoutOrder() + 1;
+    uint64_t Iters = 0;
     for (;;) {
       bool Changed = false;
-      for (MCFragment &F : Sec)
-        if (F.getKind() != MCFragment::FT_Data && relaxFragment(F))
+      uint64_t Offset = 0;
+      for (MCFragment &F : Sec) {
+        if (F.Offset != Offset)
           Changed = true;
+        Stretch = Offset - F.Offset;
+        F.Offset = Offset;
+        if (F.getKind() != MCFragment::FT_Data)
+          relaxFragment(F);
+        Offset += computeFragmentSize(F);
+      }
+      ++Iters;
 
       if (!Changed)
         break;
@@ -1032,11 +1046,15 @@ unsigned MCAssembler::relaxOnce(unsigned FirstStable) {
       // sections. Therefore, we must re-evaluate all sections.
       FirstStable = Sections.size();
       Res = I;
-      if (--MaxIter == 0)
+      // Assume each iteration finalizes at least one extra fragment. If the
+      // layout does not converge after N+1 iterations, bail out.
+      if (Iters > Sec.curFragList()->Tail->getLayoutOrder())
         break;
-      layoutSection(Sec);
     }
+    MaxIterations = std::max(MaxIterations, Iters);
   }
+  stats::RelaxationSteps += MaxIterations;
+  Stretch = 0;
   // The subsequent relaxOnce call only needs to visit Sections [0,Res) if no
   // change occurred.
   return Res;
diff --git a/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp b/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp
index 28277ece97c91..23cff1731cce5 100644
--- a/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp
+++ b/llvm/lib/Target/ARM/MCTargetDesc/ARMAsmBackend.cpp
@@ -1093,7 +1093,11 @@ std::optional<bool> ARMAsmBackend::evaluateFixup(const MCFragment &F,
   case ARM::fixup_t2_adr_pcrel_12:
   case ARM::fixup_arm_thumb_blx:
   case ARM::fixup_arm_thumb_cp:
-    Value = (Asm->getFragmentOffset(F) + Fixup.getOffset()) % 4;
+    // Use the pre-Stretch offset so that the AlignDown(PC, 4) compensation
+    // is consistent with the Stretch-adjusted displacement computed by the
+    // generic evaluateFixup.
+    Value =
+        (Asm->getFragmentOffset(F) - Asm->getStretch() + Fixup.getOffset()) % 4;
   }
   return {};
 }
diff --git a/llvm/lib/Target/CSKY/MCTargetDesc/CSKYAsmBackend.cpp b/llvm/lib/Target/CSKY/MCTargetDesc/CSKYAsmBackend.cpp
index d3bbe7306f2b2..0bcd3bc67db71 100644
--- a/llvm/lib/Target/CSKY/MCTargetDesc/CSKYAsmBackend.cpp
+++ b/llvm/lib/Target/CSKY/MCTargetDesc/CSKYAsmBackend.cpp
@@ -191,7 +191,11 @@ std::optional<bool> CSKYAsmBackend::evaluateFixup(const MCFragment &F,
   case CSKY::fixup_csky_pcrel_uimm16_scale4:
   case CSKY::fixup_csky_pcrel_uimm8_scale4:
   case CSKY::fixup_csky_pcrel_uimm7_scale4:
-    Value = (Asm->getFragmentOffset(F) + Fixup.getOffset()) % 4;
+    // Use the pre-Stretch offset so that the AlignDown(PC, 4) compensation
+    // is consistent with the Stretch-adjusted displacement computed by the
+    // generic evaluateFixup.
+    Value =
+        (Asm->getFragmentOffset(F) - Asm->getStretch() + Fixup.getOffset()) % 4;
   }
   return {};
 }
diff --git a/llvm/test/MC/ARM/thumb-ldr-stretch.s b/llvm/test/MC/ARM/thumb-ldr-stretch.s
new file mode 100644
index 0000000000000..1f829b8078e34
--- /dev/null
+++ b/llvm/test/MC/ARM/thumb-ldr-stretch.s
@@ -0,0 +1,66 @@
+@ RUN: llvm-mc -triple thumbv7m -filetype=obj -o %t %s
+@ RUN: llvm-objdump -d --no-show-raw-insn --triple=thumbv7m %t | FileCheck %s
+
+@ The three "b external" instructions relax from 2-byte tB to 4-byte t2B
+@ (unresolved fixup). During the fused relaxation+layout pass, the cumulative
+@ Stretch must not cause the tLDRpci instructions to appear misaligned and
+@ spuriously relax to 4-byte t2LDRpci. The .p2align 2 before the constant pool
+@ absorbs the upstream growth, keeping the targets 4-byte aligned.
+@
+@ If tLDRpci spuriously widens, the extra bytes push the cbz target past the
+@ 126-byte range, causing "out of range pc-relative fixup value".
+
+@ CHECK-LABEL: <fn>:
+@ CHECK:        0: cbz r0, 0x7e
+@ CHECK:       4c: ldr r2, [pc, #0x30]
+@ CHECK-NOT:       ldr.w
+@ CHECK:       5e: ldr r2, [pc, #0x24]
+@ CHECK-NOT:       ldr.w
+@ CHECK:       70: ldr r2, [pc, #0x14]
+@ CHECK-NOT:       ldr.w
+@ CHECK:       7e: nop
+
+	.syntax	unified
+	.text
+	.globl	fn
+	.type	fn,%function
+	.thumb_func
+fn:
+	cbz	r0, .LBB0_11
+	.rept 6
+	nop
+	.endr
+	b	.LBB0_11
+	.rept 6
+	nop
+	.endr
+	b	external
+	.rept 14
+	nop
+	.endr
+	b	.LBB0_11
+	.rept 7
+	nop
+	.endr
+	ldr.n	r2, .LCPI0_3
+	.rept 6
+	nop
+	.endr
+	b	external
+	ldr	r2, .LCPI0_4
+	.rept 6
+	nop
+	.endr
+	b	external
+	ldr.n	r2, .LCPI0_5
+	.rept 6
+	nop
+	.endr
+.LBB0_11:
+	.p2align	2
+.LCPI0_3:
+	.long	4
+.LCPI0_4:
+	.long	5
+.LCPI0_5:
+	.long	6
diff --git a/llvm/test/MC/ELF/relax-branch-align.s b/llvm/test/MC/ELF/relax-branch-align.s
index 680bd98d452c5..4010254d5116d 100644
--- a/llvm/test/MC/ELF/relax-branch-align.s
+++ b/llvm/test/MC/ELF/relax-branch-align.s
@@ -3,12 +3,11 @@
 
 ## In the initial all-short layout, `jmp target` has displacement -129 (1 beyond
 ## [-128,127]). `je far_target` relaxes from 2B to 6B, shifting `target` by
-## +4B, and `.p2align 4` absorbs this growth. With fresh offsets, the true
-## displacement is -125, which fits in a short jmp.
+## +4B, and `.p2align 4` absorbs this growth. With fresh offsets, `jmp target`
+## sees displacement -125 and stays short (2B).
 ##
-## The two-pass approach (relaxFragment, then layoutSection) evaluates the
-## backward jmp with stale target offsets (before je relaxation shifts it),
-## causing unnecessary relaxation from short (2B) to near (5B).
+## The fused relaxation+layout computes alignment padding with fresh offsets,
+## so the backward jmp sees the correct displacement and stays short (2B).
 
   .text
   .p2align 4
@@ -33,11 +32,10 @@ past_loop:
   cmpq %rdi, (%r8)
   jne loop
   cmpl %esi, 8(%r8)
-## The jmp is unnecessarily relaxed to near (5B). after is at 0x86 instead of
-## 0x83.
+## The jmp stays short (2B). after is at 0x83, not 0x86.
 # CHECK:       81: jmp 0x6 <target>
 # CHECK-EMPTY:
-# CHECK-NEXT: 0000000000000086 <after>:
+# CHECK-NEXT: 0000000000000083 <after>:
   jmp target
 after:
   popq %rbx
diff --git a/llvm/test/MC/X86/align-branch-convergence.s b/llvm/test/MC/X86/align-branch-convergence.s
new file mode 100644
index 0000000000000..a5e4e521d5646
--- /dev/null
+++ b/llvm/test/MC/X86/align-branch-convergence.s
@@ -0,0 +1,37 @@
+# REQUIRES: asserts
+## Verify that boundary alignment converges in O(1) inner iterations,
+## not O(N) where N is the number of BoundaryAlign fragments.
+## The fused relaxation+layout pass gives each BoundaryAlign fragment fresh
+## upstream offsets, so all padding is computed correctly in a single pass.
+
+# RUN: llvm-mc -filetype=obj -triple=x86_64 --stats \
+# RUN:   --x86-align-branch-boundary=32 --x86-align-branch=jcc+call %s \
+# RUN:   -o %t 2>&1 | FileCheck %s --check-prefix=STATS
+# STATS: 2 assembler - Number of assembler layout and relaxation steps
+
+# RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck %s
+# CHECK:       0: testl
+# CHECK:       2: je
+# CHECK:       4: callq
+# CHECK:      1d: nop
+# CHECK-NEXT: 20: callq
+# CHECK:      2f: testl
+# CHECK-NEXT: 31: jne
+# CHECK:      3d: nop
+# CHECK-NEXT: 40: callq
+# CHECK:      4a: retq
+
+  .p2align 5
+func:
+  testl %eax, %eax
+  je .Lend
+  .rept 8
+  callq foo
+  .endr
+  testl %ecx, %ecx
+  jne func
+  .rept 4
+  callq foo
+  .endr
+.Lend:
+  retq

``````````

</details>


https://github.com/llvm/llvm-project/pull/190318

