[llvm] [MC,CodeGen] Update .prefalign for symbol-based preferred alignment (PR #184032)

Mon Mar 2 22:05:26 PST 2026

https://github.com/MaskRay updated https://github.com/llvm/llvm-project/pull/184032

From f42b3383765cfaeea7a0654995bb6939278325f4 Mon Sep 17 00:00:00 2001
From: Fangrui Song <i at maskray.me>
Date: Sat, 28 Feb 2026 13:25:09 -0800
Subject: [PATCH 1/3] [MC,CodeGen] Update .prefalign for symbol-based preferred
 alignment

https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019/17
The recently-introduced .prefalign only worked when each function was in
its own section (-ffunction-sections), because the section size gave the
function body size needed for the alignment rule.

This led to -ffunction-sections and -fno-function-sections AsmPrinter
differences (#155529), which is rather unusual.

This patch fixes this AsmPrinter difference by extending .prefalign to
accept an end symbol and a required fill operand:

    .prefalign <pref_align>, <end_sym>, nop
    .prefalign <pref_align>, <end_sym>, <fill_byte>

The body size (end_sym_offset - start_offset, where start_offset is the
padded start of the body) determines the alignment:

    body_size == 0             => ComputedAlign = 1
    0 < body_size < pref_align => ComputedAlign = NextPowerOf2(body_size-1)
    body_size >= pref_align    => ComputedAlign = pref_align
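
As a rough illustration of the rule above, here is a minimal standalone
sketch. It does not use LLVM's Align type or NextPowerOf2; ceilPowerOf2,
computedAlign and paddingFor are hypothetical names chosen for this
example, and plain integers stand in for alignments.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of 2 that is >= V, for V >= 1. Hypothetical helper; for
// such V it matches what the patch obtains from NextPowerOf2(V - 1).
static uint64_t ceilPowerOf2(uint64_t V) {
  uint64_t P = 1;
  while (P < V)
    P <<= 1;
  return P;
}

// Alignment chosen for a body of BodySize bytes, following the rule in the
// commit message. An empty body gets alignment 1.
static uint64_t computedAlign(uint64_t BodySize, uint64_t PrefAlign) {
  assert(PrefAlign != 0 && (PrefAlign & (PrefAlign - 1)) == 0 &&
         "pref_align must be a power of 2");
  if (BodySize == 0)
    return 1;
  if (BodySize < PrefAlign)
    return ceilPowerOf2(BodySize);
  return PrefAlign;
}

// Number of padding bytes needed to round Offset up to a multiple of Align.
static uint64_t paddingFor(uint64_t Offset, uint64_t Align) {
  return (Align - Offset % Align) % Align;
}
```

For instance, a 5-byte body with pref_align 16 would be aligned to 8, so a
body that would otherwise start at offset 10 would receive 6 bytes of
padding under this sketch.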

To also enforce a minimum alignment, emit a .p2align before .prefalign.

The fill operand is required: `nop` generates target-appropriate NOP
instructions via writeNopData, while an integer in [0,255] fills the
padding with that byte value.
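
The fill operand rule can be sketched as a small classifier. This is a
simplified stand-in, not the AsmParser code: FillSpec and parseFillOperand
are hypothetical names, the input is a bare token string, and only decimal
integers of up to three digits are recognized.

```cpp
#include <cstdint>
#include <string>

// Result of classifying a fill operand (hypothetical type for this sketch).
struct FillSpec {
  bool Valid;    // false if the token is neither "nop" nor a byte value
  bool EmitNops; // true for the "nop" keyword
  uint8_t Fill;  // byte value when !EmitNops
};

// Classify the third .prefalign operand: either the keyword "nop" or a
// decimal integer in [0, 255]. Anything else is rejected.
static FillSpec parseFillOperand(const std::string &Tok) {
  const FillSpec Invalid = {false, false, 0};
  if (Tok == "nop")
    return FillSpec{true, true, 0};
  if (Tok.empty() || Tok.size() > 3)
    return Invalid;
  unsigned Val = 0;
  for (char C : Tok) {
    if (C < '0' || C > '9')
      return Invalid;
    Val = Val * 10 + static_cast<unsigned>(C - '0');
  }
  if (Val > 255)
    return Invalid;
  return FillSpec{true, false, static_cast<uint8_t>(Val)};
}
```

Under this sketch, "nop" and "144" are accepted, while "nops", "256" and
an empty operand are rejected.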

In ELFObjectWriter::writeSectionHeader, sh_addralign is set to the
maximum of regular alignment values and ComputedAlign over all
FT_PrefAlign fragments.
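
The sh_addralign computation amounts to a running maximum. The sketch
below models it with plain integers; Frag and sectionAddrAlign are
hypothetical names for illustration and are not MC types.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Minimal stand-in for a fragment: only the fields this sketch needs.
struct Frag {
  bool IsPrefAlign;       // true for an FT_PrefAlign-like fragment
  uint64_t ComputedAlign; // alignment computed during relaxation
};

// Start from the section's regular alignment and raise it to the largest
// ComputedAlign among the preferred-alignment fragments. Other fragment
// kinds are ignored.
static uint64_t sectionAddrAlign(uint64_t SectionAlign,
                                 const std::vector<Frag> &Frags) {
  uint64_t Result = SectionAlign;
  for (const Frag &F : Frags)
    if (F.IsPrefAlign)
      Result = std::max(Result, F.ComputedAlign);
  return Result;
}
```

A section with no such fragments keeps its regular alignment, which is
also what happens for sections that have no fragment list at all.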

Initialize MCSection::CurFragList to nullptr and add a null check
to skip ELFObjectWriter-created sections like .strtab/.symtab
that never receive changeSection calls.
---
 llvm/docs/Extensions.rst                      |  32 ++--
 llvm/include/llvm/MC/MCAssembler.h            |   1 +
 llvm/include/llvm/MC/MCObjectStreamer.h       |   3 +-
 llvm/include/llvm/MC/MCSection.h              |  70 +++++--
 llvm/include/llvm/MC/MCStreamer.h             |   3 +-
 llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp    |  32 ++--
 llvm/lib/MC/ELFObjectWriter.cpp               |  17 +-
 llvm/lib/MC/MCAsmStreamer.cpp                 |  14 +-
 llvm/lib/MC/MCAssembler.cpp                   |  63 +++++++
 llvm/lib/MC/MCFragment.cpp                    |   8 +-
 llvm/lib/MC/MCObjectStreamer.cpp              |  10 +-
 llvm/lib/MC/MCParser/AsmParser.cpp            |  47 ++++-
 llvm/lib/MC/MCSection.cpp                     |  10 -
 llvm/lib/MC/MCStreamer.cpp                    |   3 +-
 .../AArch64/preferred-function-alignment.ll   |   9 +-
 .../ARM/preferred-function-alignment.ll       |  13 +-
 .../CodeGen/LoongArch/linker-relaxation.ll    |   1 -
 llvm/test/CodeGen/PowerPC/code-align.ll       |   4 +-
 llvm/test/CodeGen/PowerPC/ppc64-calls.ll      |   2 +-
 llvm/test/CodeGen/SystemZ/vec-perm-14.ll      |   6 +-
 llvm/test/CodeGen/X86/eh-label.ll             |   2 +-
 llvm/test/CodeGen/X86/empty-function.ll       |   2 +-
 llvm/test/CodeGen/X86/kcfi-arity.ll           |   3 +-
 .../X86/kcfi-patchable-function-prefix.ll     |  14 +-
 llvm/test/CodeGen/X86/kcfi.ll                 |   3 +-
 llvm/test/CodeGen/X86/prefalign.ll            |  12 +-
 llvm/test/CodeGen/X86/statepoint-invoke.ll    |   4 +-
 llvm/test/MC/ELF/prefalign-errors.s           |  46 ++++-
 llvm/test/MC/ELF/prefalign.s                  | 175 +++++++++---------
 llvm/test/MC/RISCV/prefalign.s                |  34 ++++
 30 files changed, 454 insertions(+), 189 deletions(-)
 create mode 100644 llvm/test/MC/RISCV/prefalign.s

diff --git a/llvm/docs/Extensions.rst b/llvm/docs/Extensions.rst
index c8de7f59de5c0..e910c2bdff5e8 100644
--- a/llvm/docs/Extensions.rst
+++ b/llvm/docs/Extensions.rst
@@ -31,17 +31,27 @@ hexadecimal format instead of decimal if desired.
 ``.prefalign`` directive
 ------------------------
 
-The ``.prefalign`` directive sets the preferred alignment for a section,
-and enables the section's final alignment to be set in a way that is
-dependent on the section size (currently only supported with ELF).
-
-If the section size is less than the section's minimum alignment as
-determined using ``.align`` family directives, the section's alignment
-will be equal to its minimum alignment. Otherwise, if the section size is
-between the minimum alignment and the preferred alignment, the section's
-alignment will be equal to the power of 2 greater than or equal to the
-section size. Otherwise, the section's alignment will be equal to the
-preferred alignment.
+.. code-block:: gas
+
+  .prefalign <pref_align>, <end_sym>, nop
+  .prefalign <pref_align>, <end_sym>, <fill_byte>
+
+The ``.prefalign`` directive pads the current location so that the code
+between the directive and ``end_sym`` starts at an alignment that depends
+on the size of that code (currently only supported with ELF). ``pref_align``
+must be a power of 2. ``end_sym`` must be a symbol defined in the same
+section. The fill operand is required: ``nop`` fills the padding with
+target-appropriate NOP instructions, while an integer in ``[0, 255]``
+fills the padding with that byte value.
+
+The alignment is determined by the *body_size* (the number of bytes between
+the padded start and ``end_sym``):
+
+- If *body_size* < *pref_align*: align to the smallest power of 2
+  greater than or equal to *body_size*.
+- If *body_size* ≥ *pref_align*: align to *pref_align*.
+
+To also enforce a minimum alignment, emit a ``.p2align`` before ``.prefalign``.
 
 Machine-specific Assembly Syntax
 ================================
diff --git a/llvm/include/llvm/MC/MCAssembler.h b/llvm/include/llvm/MC/MCAssembler.h
index dbae271a1c198..a7b865cb16b81 100644
--- a/llvm/include/llvm/MC/MCAssembler.h
+++ b/llvm/include/llvm/MC/MCAssembler.h
@@ -112,6 +112,7 @@ class MCAssembler {
   void relaxInstruction(MCFragment &F);
   void relaxLEB(MCFragment &F);
   void relaxBoundaryAlign(MCBoundaryAlignFragment &BF);
+  void relaxPrefAlign(MCFragment &F);
   void relaxDwarfLineAddr(MCFragment &F);
   void relaxDwarfCallFrameFragment(MCFragment &F);
   void relaxSFrameFragment(MCFragment &DF);
diff --git a/llvm/include/llvm/MC/MCObjectStreamer.h b/llvm/include/llvm/MC/MCObjectStreamer.h
index 5fc17b2b383b1..cb2694b231d5b 100644
--- a/llvm/include/llvm/MC/MCObjectStreamer.h
+++ b/llvm/include/llvm/MC/MCObjectStreamer.h
@@ -139,7 +139,8 @@ class LLVM_ABI MCObjectStreamer : public MCStreamer {
                             unsigned MaxBytesToEmit = 0) override;
   void emitCodeAlignment(Align ByteAlignment, const MCSubtargetInfo *STI,
                          unsigned MaxBytesToEmit = 0) override;
-  void emitPrefAlign(Align Alignment) override;
+  void emitPrefAlign(Align Alignment, const MCSymbol &End, bool EmitNops,
+                     uint8_t Fill, const MCSubtargetInfo &STI) override;
   void emitValueToOffset(const MCExpr *Offset, unsigned char Value,
                          SMLoc Loc) override;
   void emitDwarfLocDirective(unsigned FileNo, unsigned Line, unsigned Column,
diff --git a/llvm/include/llvm/MC/MCSection.h b/llvm/include/llvm/MC/MCSection.h
index 4c36ed567de62..8dc6a62dc77eb 100644
--- a/llvm/include/llvm/MC/MCSection.h
+++ b/llvm/include/llvm/MC/MCSection.h
@@ -53,6 +53,7 @@ class MCFragment {
     FT_Data,
     FT_Relaxable,
     FT_Align,
+    FT_PrefAlign,
     FT_Fill,
     FT_LEB,
     FT_Nops,
@@ -132,6 +133,19 @@ class MCFragment {
       // Value to use for filling padding bytes.
       int64_t Fill;
     } align;
+    struct {
+      // Symbol denoting the end of the region; always non-null.
+      const MCSymbol *End;
+      // The preferred (maximum) alignment.
+      Align PreferredAlign;
+      // The alignment computed during relaxation.
+      Align ComputedAlign;
+      // If true, fill padding with target NOPs via writeNopData; the STI field
+      // holds the subtarget info needed.  If false, fill with Fill byte.
+      bool EmitNops;
+      // Fill byte used when !EmitNops.
+      uint8_t Fill;
+    } prefalign;
     struct {
       // True if this is a sleb128, false if uleb128.
       bool IsSigned;
@@ -268,6 +282,45 @@ class MCFragment {
     return u.align.EmitNops;
   }
 
+  //== FT_PrefAlign functions
+  // Initialize an FT_PrefAlign fragment. The region starts at this fragment and
+  // ends at \p End. ComputedAlign is set during relaxation:
+  //   body_size == 0             => ComputedAlign = 1
+  //   0 < body_size < PrefAlign  => ComputedAlign = NextPowerOf2(body_size-1)
+  //   body_size >= PrefAlign     => ComputedAlign = PrefAlign
+  void makePrefAlign(Align PrefAlign, const MCSymbol &End, bool EmitNops,
+                     uint8_t Fill) {
+    Kind = FT_PrefAlign;
+    u.prefalign.End = &End;
+    u.prefalign.PreferredAlign = PrefAlign;
+    u.prefalign.EmitNops = EmitNops;
+    u.prefalign.Fill = Fill;
+  }
+  const MCSymbol &getPrefAlignEnd() const {
+    assert(Kind == FT_PrefAlign);
+    return *u.prefalign.End;
+  }
+  Align getPrefAlignPreferred() const {
+    assert(Kind == FT_PrefAlign);
+    return u.prefalign.PreferredAlign;
+  }
+  Align getPrefAlignComputed() const {
+    assert(Kind == FT_PrefAlign);
+    return u.prefalign.ComputedAlign;
+  }
+  void setPrefAlignComputed(Align A) {
+    assert(Kind == FT_PrefAlign);
+    u.prefalign.ComputedAlign = A;
+  }
+  bool getPrefAlignEmitNops() const {
+    assert(Kind == FT_PrefAlign);
+    return u.prefalign.EmitNops;
+  }
+  uint8_t getPrefAlignFill() const {
+    assert(Kind == FT_PrefAlign);
+    return u.prefalign.Fill;
+  }
+
   //== FT_LEB functions
   void makeLEB(bool IsSigned, const MCExpr *Value) {
     assert(Kind == FT_Data);
@@ -538,14 +591,14 @@ class LLVM_ABI MCSection {
 private:
   // At parse time, this holds the fragment list of the current subsection. At
   // layout time, this holds the concatenated fragment lists of all subsections.
-  FragList *CurFragList;
+  // Null until the first fragment is added to this section.
+  FragList *CurFragList = nullptr;
   // In many object file formats, this denotes the section symbol. In Mach-O,
   // this denotes an optional temporary label at the section start.
   MCSymbol *Begin;
   MCSymbol *End = nullptr;
   /// The alignment requirement of this section.
   Align Alignment;
-  MaybeAlign PreferredAlignment;
   /// The section index in the assemblers section list.
   unsigned Ordinal = 0;
   // If not -1u, the first linker-relaxable fragment's order within the
@@ -606,19 +659,6 @@ class LLVM_ABI MCSection {
       Alignment = MinAlignment;
   }
 
-  Align getPreferredAlignment() const {
-    if (!PreferredAlignment || Alignment > *PreferredAlignment)
-      return Alignment;
-    return *PreferredAlignment;
-  }
-
-  void ensurePreferredAlignment(Align PrefAlign) {
-    if (!PreferredAlignment || PrefAlign > *PreferredAlignment)
-      PreferredAlignment = PrefAlign;
-  }
-
-  Align getAlignmentForObjectFile(uint64_t Size) const;
-
   unsigned getOrdinal() const { return Ordinal; }
   void setOrdinal(unsigned Value) { Ordinal = Value; }
 
diff --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h
index 148d69ae5098f..05cd0f214c025 100644
--- a/llvm/include/llvm/MC/MCStreamer.h
+++ b/llvm/include/llvm/MC/MCStreamer.h
@@ -845,7 +845,8 @@ class LLVM_ABI MCStreamer {
   virtual void emitCodeAlignment(Align Alignment, const MCSubtargetInfo *STI,
                                  unsigned MaxBytesToEmit = 0);
 
-  virtual void emitPrefAlign(Align A);
+  virtual void emitPrefAlign(Align A, const MCSymbol &End, bool EmitNops,
+                             uint8_t Fill, const MCSubtargetInfo &STI);
 
   /// Emit some number of copies of \p Value until the byte offset \p
   /// Offset is reached.
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index 083b83567e47f..1d9550c53db09 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -1046,19 +1046,20 @@ void AsmPrinter::emitFunctionHeader() {
 
   emitLinkage(&F, CurrentFnSym);
   if (MAI->hasFunctionAlignment()) {
-    // Make sure that the preferred alignment directive (.prefalign) is
-    // supported before using it. The preferred alignment directive will not
-    // have the intended effect unless function sections are enabled, so check
-    // for that as well.
+    Align PrefAlign = MF->getPreferredAlignment();
+    // Use .prefalign when the integrated assembler supports it and the target
+    // has a preferred alignment distinct from the minimum. The end symbol must
+    // be created here, before the function body, so that .prefalign can
+    // reference it; emitFunctionBody will emit the label at the function end.
     if (MAI->useIntegratedAssembler() && MAI->hasPreferredAlignment() &&
-        TM.getFunctionSections()) {
-      Align Alignment = MF->getAlignment();
-      Align PrefAlignment = MF->getPreferredAlignment();
-      emitAlignment(Alignment, &F);
-      if (Alignment != PrefAlignment)
-        OutStreamer->emitPrefAlign(PrefAlignment);
+        MF->getAlignment() != PrefAlign) {
+      emitAlignment(MF->getAlignment(), &F);
+      CurrentFnEnd = createTempSymbol("func_end");
+      OutStreamer->emitPrefAlign(PrefAlign, *CurrentFnEnd,
+                                 /*EmitNops=*/true, /*Fill=*/0,
+                                 getSubtargetInfo());
     } else {
-      emitAlignment(MF->getPreferredAlignment(), &F);
+      emitAlignment(PrefAlign, &F);
     }
   }
 
@@ -2365,9 +2366,11 @@ void AsmPrinter::emitFunctionBody() {
   // SPIR-V supports label instructions only inside a block, not after the
   // function body.
   if (TT.getObjectFormat() != Triple::SPIRV &&
-      (EmitFunctionSize || needFuncLabels(*MF, *this))) {
-    // Create a symbol for the end of function.
-    CurrentFnEnd = createTempSymbol("func_end");
+      (EmitFunctionSize || needFuncLabels(*MF, *this) || CurrentFnEnd)) {
+    // Create a symbol for the end of function, if not already pre-created
+    // (e.g. for .prefalign directive).
+    if (!CurrentFnEnd)
+      CurrentFnEnd = createTempSymbol("func_end");
     OutStreamer->emitLabel(CurrentFnEnd);
   }
 
@@ -3121,6 +3124,7 @@ void AsmPrinter::SetupMachineFunction(MachineFunction &MF) {
   CurrentFnSymForSize = CurrentFnSym;
   CurrentFnBegin = nullptr;
   CurrentFnBeginLocal = nullptr;
+  CurrentFnEnd = nullptr;
   CurrentSectionBeginSym = nullptr;
   CurrentFnCallsiteEndSymbols.clear();
   MBBSectionRanges.clear();
diff --git a/llvm/lib/MC/ELFObjectWriter.cpp b/llvm/lib/MC/ELFObjectWriter.cpp
index b23fa92ac194d..408aee4f6bc68 100644
--- a/llvm/lib/MC/ELFObjectWriter.cpp
+++ b/llvm/lib/MC/ELFObjectWriter.cpp
@@ -912,10 +912,19 @@ void ELFWriter::writeSectionHeader(uint32_t GroupSymbolIndex, uint64_t Offset,
       sh_link = Sym->getSection().getOrdinal();
   }
 
-  writeSectionHeaderEntry(
-      StrTabBuilder.getOffset(Section.getName()), Section.getType(),
-      Section.getFlags(), 0, Offset, Size, sh_link, sh_info,
-      Section.getAlignmentForObjectFile(Size), Section.getEntrySize());
+  // Compute sh_addralign as the maximum of the section's alignment and
+  // ComputedAlign over all FT_PrefAlign fragments. curFragList() can be
+  // nullptr for ELFObjectWriter-created sections like .strtab and .symtab,
+  // which never receive changeSection calls.
+  Align SHAlign = Section.getAlign();
+  if (Section.curFragList())
+    for (const MCFragment &F : Section)
+      if (F.getKind() == MCFragment::FT_PrefAlign)
+        SHAlign = std::max(SHAlign, F.getPrefAlignComputed());
+  writeSectionHeaderEntry(StrTabBuilder.getOffset(Section.getName()),
+                          Section.getType(), Section.getFlags(), 0, Offset,
+                          Size, sh_link, sh_info, SHAlign,
+                          Section.getEntrySize());
 }
 
 void ELFWriter::writeSectionHeaders() {
diff --git a/llvm/lib/MC/MCAsmStreamer.cpp b/llvm/lib/MC/MCAsmStreamer.cpp
index 1a50ae43cd9c9..78b5e8309d004 100644
--- a/llvm/lib/MC/MCAsmStreamer.cpp
+++ b/llvm/lib/MC/MCAsmStreamer.cpp
@@ -286,7 +286,8 @@ class MCAsmStreamer final : public MCStreamer {
 
   void emitCodeAlignment(Align Alignment, const MCSubtargetInfo *STI,
                          unsigned MaxBytesToEmit = 0) override;
-  void emitPrefAlign(Align Alignment) override;
+  void emitPrefAlign(Align Alignment, const MCSymbol &End, bool EmitNops,
+                     uint8_t Fill, const MCSubtargetInfo &STI) override;
 
   void emitValueToOffset(const MCExpr *Offset,
                          unsigned char Value,
@@ -1562,8 +1563,15 @@ void MCAsmStreamer::emitCodeAlignment(Align Alignment,
     emitAlignmentDirective(Alignment.value(), std::nullopt, 1, MaxBytesToEmit);
 }
 
-void MCAsmStreamer::emitPrefAlign(Align Alignment) {
-  OS << "\t.prefalign\t" << Alignment.value();
+void MCAsmStreamer::emitPrefAlign(Align Alignment, const MCSymbol &End,
+                                  bool EmitNops, uint8_t Fill,
+                                  const MCSubtargetInfo &) {
+  OS << "\t.prefalign\t" << Alignment.value() << ", ";
+  End.print(OS, MAI);
+  if (EmitNops)
+    OS << ", nop";
+  else
+    OS << ", " << static_cast<unsigned>(Fill);
   EmitEOL();
 }
 
diff --git a/llvm/lib/MC/MCAssembler.cpp b/llvm/lib/MC/MCAssembler.cpp
index e649ea7fedabe..f6f64a6f64f3d 100644
--- a/llvm/lib/MC/MCAssembler.cpp
+++ b/llvm/lib/MC/MCAssembler.cpp
@@ -219,6 +219,9 @@ uint64_t MCAssembler::computeFragmentSize(const MCFragment &F) const {
     return Size;
   }
 
+  case MCFragment::FT_PrefAlign:
+    return F.getSize();
+
   case MCFragment::FT_Nops:
     return cast<MCNopsFragment>(F).getNumBytes();
 
@@ -451,6 +454,23 @@ static void writeFragment(raw_ostream &OS, const MCAssembler &Asm,
     }
   } break;
 
+  case MCFragment::FT_PrefAlign: {
+    OS << StringRef(F.getContents().data(), F.getContents().size());
+    uint64_t PadSize = FragmentSize - F.getContents().size();
+    if (F.getPrefAlignEmitNops()) {
+      if (!Asm.getBackend().writeNopData(OS, PadSize, F.getSubtargetInfo()))
+        reportFatalInternalError("unable to write nop sequence of " +
+                                 Twine(PadSize) + " bytes");
+    } else if (F.getPrefAlignFill() == 0) {
+      OS.write_zeros(PadSize);
+    } else {
+      char B = char(F.getPrefAlignFill());
+      for (uint64_t I = 0; I < PadSize; ++I)
+        OS << B;
+    }
+    break;
+  }
+
   case MCFragment::FT_Fill: {
     ++stats::EmittedFillFragments;
     const MCFillFragment &FF = cast<MCFillFragment>(F);
@@ -584,6 +604,10 @@ void MCAssembler::writeSectionData(raw_ostream &OS,
         // 0.
         assert(F.getAlignFill() == 0 && "Invalid align in virtual section!");
         break;
+      case MCFragment::FT_PrefAlign:
+        assert(!F.getPrefAlignEmitNops() && F.getPrefAlignFill() == 0 &&
+               "Invalid align in BSS");
+        break;
       case MCFragment::FT_Fill:
         HasNonZero = cast<MCFillFragment>(F).getValue() != 0;
         break;
@@ -884,6 +908,39 @@ void MCAssembler::relaxBoundaryAlign(MCBoundaryAlignFragment &BF) {
   BF.setSize(NewSize);
 }
 
+void MCAssembler::relaxPrefAlign(MCFragment &F) {
+  const MCSymbol &End = F.getPrefAlignEnd();
+  if (!End.getFragment() || End.getFragment()->getParent() != F.getParent()) {
+    recordError(SMLoc(), "end symbol '" + End.getName() +
+                             "' must be a symbol in the current section");
+    return;
+  }
+  uint64_t EndOffset;
+  if (!getSymbolOffset(End, EndOffset))
+    return;
+  // RawStart is the start of the (variable) padding region; StartOffset is
+  // the start of the body (RawStart plus current padding). BodySize is
+  // measured from StartOffset, not RawStart, so that padding is not counted
+  // as part of the body.
+  uint64_t RawStart = F.Offset + F.getFixedSize();
+  uint64_t StartOffset = RawStart + F.getVarSize();
+  Align NewAlign;
+  if (StartOffset < EndOffset) {
+    uint64_t BodySize = EndOffset - StartOffset;
+    if (BodySize < F.getPrefAlignPreferred().value())
+      NewAlign = Align(NextPowerOf2(BodySize - 1));
+    else
+      NewAlign = F.getPrefAlignPreferred();
+  }
+  F.setPrefAlignComputed(NewAlign);
+  // Compute padding to align the body start to NewAlign.
+  uint64_t NewPadSize = offsetToAlignment(RawStart, NewAlign);
+  F.VarContentStart = F.getFixedSize();
+  F.VarContentEnd = F.VarContentStart + NewPadSize;
+  if (F.VarContentEnd > F.getParent()->ContentStorage.size())
+    F.getParent()->ContentStorage.resize(F.VarContentEnd);
+}
+
 void MCAssembler::relaxDwarfLineAddr(MCFragment &F) {
   if (getBackend().relaxDwarfLineAddr(F))
     return;
@@ -962,6 +1019,9 @@ bool MCAssembler::relaxFragment(MCFragment &F) {
   case MCFragment::FT_BoundaryAlign:
     relaxBoundaryAlign(static_cast<MCBoundaryAlignFragment &>(F));
     break;
+  case MCFragment::FT_PrefAlign:
+    relaxPrefAlign(F);
+    break;
   case MCFragment::FT_CVInlineLines:
     getContext().getCVContext().encodeInlineLineTable(
         *this, static_cast<MCCVInlineLineTableFragment &>(F));
@@ -979,6 +1039,9 @@ bool MCAssembler::relaxFragment(MCFragment &F) {
 
 void MCAssembler::layoutSection(MCSection &Sec) {
   uint64_t Offset = 0;
+  // Note: fragments are not relaxed here. Some fragments depend on
+  // downstream symbols whose offsets have not been set in this pass yet.
+  // They are instead relaxed by relaxFragment.
   for (MCFragment &F : Sec) {
     F.Offset = Offset;
     if (F.getKind() == MCFragment::FT_Align) {
diff --git a/llvm/lib/MC/MCFragment.cpp b/llvm/lib/MC/MCFragment.cpp
index 85d1c5888f1da..21a304da0bb4f 100644
--- a/llvm/lib/MC/MCFragment.cpp
+++ b/llvm/lib/MC/MCFragment.cpp
@@ -55,7 +55,8 @@ LLVM_DUMP_METHOD void MCFragment::dump() const {
   case MCFragment::FT_DwarfFrame:    OS << "DwarfCallFrame"; break;
   case MCFragment::FT_SFrame:        OS << "SFrame"; break;
   case MCFragment::FT_LEB:           OS << "LEB"; break;
-  case MCFragment::FT_BoundaryAlign: OS<<"BoundaryAlign"; break;
+  case MCFragment::FT_BoundaryAlign: OS << "BoundaryAlign"; break;
+  case MCFragment::FT_PrefAlign:     OS << "PrefAlign"; break;
   case MCFragment::FT_SymbolId:      OS << "SymbolId"; break;
   case MCFragment::FT_CVInlineLines: OS << "CVInlineLineTable"; break;
   case MCFragment::FT_CVDefRange:    OS << "CVDefRangeTable"; break;
@@ -170,6 +171,11 @@ LLVM_DUMP_METHOD void MCFragment::dump() const {
        << " Size:" << BF->getSize();
     break;
   }
+  case MCFragment::FT_PrefAlign:
+    OS << " PrefAlign:" << getPrefAlignPreferred().value()
+       << " End:" << getPrefAlignEnd().getName()
+       << " ComputedAlign:" << getPrefAlignComputed().value();
+    break;
   case MCFragment::FT_SymbolId: {
     const auto *F = cast<MCSymbolIdFragment>(this);
     OS << " Sym:" << F->getSymbol();
diff --git a/llvm/lib/MC/MCObjectStreamer.cpp b/llvm/lib/MC/MCObjectStreamer.cpp
index 58aa7945d7393..f6d1ae7e50295 100644
--- a/llvm/lib/MC/MCObjectStreamer.cpp
+++ b/llvm/lib/MC/MCObjectStreamer.cpp
@@ -690,8 +690,14 @@ void MCObjectStreamer::emitCodeAlignment(Align Alignment,
   F->STI = STI;
 }
 
-void MCObjectStreamer::emitPrefAlign(Align Alignment) {
-  getCurrentSectionOnly()->ensurePreferredAlignment(Alignment);
+void MCObjectStreamer::emitPrefAlign(Align Alignment, const MCSymbol &End,
+                                     bool EmitNops, uint8_t Fill,
+                                     const MCSubtargetInfo &STI) {
+  auto *F = getCurrentFragment();
+  F->makePrefAlign(Alignment, End, EmitNops, Fill);
+  if (EmitNops)
+    F->STI = &STI;
+  newFragment();
 }
 
 void MCObjectStreamer::emitValueToOffset(const MCExpr *Offset,
diff --git a/llvm/lib/MC/MCParser/AsmParser.cpp b/llvm/lib/MC/MCParser/AsmParser.cpp
index 3452708bcec8a..c30c4d09797f0 100644
--- a/llvm/lib/MC/MCParser/AsmParser.cpp
+++ b/llvm/lib/MC/MCParser/AsmParser.cpp
@@ -3468,13 +3468,54 @@ bool AsmParser::parseDirectivePrefAlign() {
   int64_t Alignment;
   if (checkForValidSection() || parseAbsoluteExpression(Alignment))
     return true;
-  if (parseEOL())
-    return true;
 
   if (!isPowerOf2_64(Alignment))
     return Error(AlignmentLoc, "alignment must be a power of 2");
-  getStreamer().emitPrefAlign(Align(Alignment));
 
+  // Parse end symbol: .prefalign N, sym
+  SMLoc SymLoc = getLexer().getLoc();
+  if (!getLexer().is(AsmToken::Comma))
+    return Error(SymLoc, "expected ',' and end symbol");
+  Lex();
+  StringRef Name;
+  SymLoc = getLexer().getLoc();
+  if (parseIdentifier(Name))
+    return Error(SymLoc, "expected symbol name");
+  MCSymbol *End = getContext().getOrCreateSymbol(Name);
+
+  // Parse fill operand: integer byte [0, 255] or "nop".
+  SMLoc FillLoc = getLexer().getLoc();
+  if (!getLexer().is(AsmToken::Comma))
+    return Error(FillLoc, "expected ',' followed by 'nop' or fill byte");
+  Lex();
+
+  bool EmitNops = false;
+  uint8_t Fill = 0;
+  SMLoc FillLoc2 = getLexer().getLoc();
+  if (getLexer().is(AsmToken::Integer)) {
+    int64_t FillVal = getLexer().getTok().getIntVal();
+    Lex();
+    if (FillVal < 0 || FillVal > 255)
+      return Error(FillLoc2, "fill value must be in range [0, 255]");
+    Fill = static_cast<uint8_t>(FillVal);
+  } else if (getLexer().is(AsmToken::Identifier) &&
+             getLexer().getTok().getIdentifier() == "nop") {
+    EmitNops = true;
+    Lex();
+  } else {
+    return Error(FillLoc2, "expected integer fill byte or 'nop'");
+  }
+
+  if (parseEOL())
+    return true;
+  if ((EmitNops || Fill != 0) &&
+      getStreamer().getCurrentSectionOnly()->isBssSection())
+    return Error(FillLoc, "non-zero fill in BSS section '" +
+                              getStreamer().getCurrentSectionOnly()->getName() +
+                              "'");
+
+  getStreamer().emitPrefAlign(Align(Alignment), *End, EmitNops, Fill,
+                              getTargetParser().getSTI());
   return false;
 }
 
diff --git a/llvm/lib/MC/MCSection.cpp b/llvm/lib/MC/MCSection.cpp
index 8285379eeaf81..a668e7919b7b9 100644
--- a/llvm/lib/MC/MCSection.cpp
+++ b/llvm/lib/MC/MCSection.cpp
@@ -30,16 +30,6 @@ MCSymbol *MCSection::getEndSymbol(MCContext &Ctx) {
   return End;
 }
 
-Align MCSection::getAlignmentForObjectFile(uint64_t Size) const {
-  if (Size < getAlign().value())
-    return getAlign();
-
-  if (Size < getPreferredAlignment().value())
-    return Align(NextPowerOf2(Size - 1));
-
-  return getPreferredAlignment();
-}
-
 bool MCSection::hasEnded() const { return End && End->isInSection(); }
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
diff --git a/llvm/lib/MC/MCStreamer.cpp b/llvm/lib/MC/MCStreamer.cpp
index a913528d53a70..4133ea1235227 100644
--- a/llvm/lib/MC/MCStreamer.cpp
+++ b/llvm/lib/MC/MCStreamer.cpp
@@ -1354,7 +1354,8 @@ void MCStreamer::emitFill(const MCExpr &NumBytes, uint64_t Value, SMLoc Loc) {}
 void MCStreamer::emitFill(const MCExpr &NumValues, int64_t Size, int64_t Expr,
                           SMLoc Loc) {}
 void MCStreamer::emitValueToAlignment(Align, int64_t, uint8_t, unsigned) {}
-void MCStreamer::emitPrefAlign(Align A) {}
+void MCStreamer::emitPrefAlign(Align A, const MCSymbol &End, bool EmitNops,
+                               uint8_t Fill, const MCSubtargetInfo &STI) {}
 void MCStreamer::emitCodeAlignment(Align Alignment, const MCSubtargetInfo *STI,
                                    unsigned MaxBytesToEmit) {}
 void MCStreamer::emitValueToOffset(const MCExpr *Offset, unsigned char Value,
diff --git a/llvm/test/CodeGen/AArch64/preferred-function-alignment.ll b/llvm/test/CodeGen/AArch64/preferred-function-alignment.ll
index a6cb7123e5af4..d272dae33e814 100644
--- a/llvm/test/CodeGen/AArch64/preferred-function-alignment.ll
+++ b/llvm/test/CodeGen/AArch64/preferred-function-alignment.ll
@@ -29,10 +29,11 @@ define void @test() {
 }
 
 ; CHECK-LABEL: test
-; ALIGN2: .p2align 2
-; ALIGN3: .p2align 3
-; ALIGN4: .p2align 4
-; ALIGN5: .p2align 5
+; CHECK: .p2align 2
+; ALIGN2-NOT: .prefalign
+; ALIGN3-NEXT: .prefalign 8
+; ALIGN4-NEXT: .prefalign 16
+; ALIGN5-NEXT: .prefalign 32
 
 define void @test_optsize() optsize {
   ret void
diff --git a/llvm/test/CodeGen/ARM/preferred-function-alignment.ll b/llvm/test/CodeGen/ARM/preferred-function-alignment.ll
index 2fc67905f6db7..3ae20da251df1 100644
--- a/llvm/test/CodeGen/ARM/preferred-function-alignment.ll
+++ b/llvm/test/CodeGen/ARM/preferred-function-alignment.ll
@@ -1,15 +1,18 @@
 ; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m85 < %s | FileCheck --check-prefixes=CHECK,ALIGN-64,ALIGN-CS-16 %s
 ; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m23 < %s | FileCheck --check-prefixes=CHECK,ALIGN-16,ALIGN-CS-16 %s
 
-; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-a5 < %s  | FileCheck --check-prefixes=CHECK,ALIGN-32,ALIGN-CS-32 %s
-; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m33 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32,ALIGN-CS-16 %s
-; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m55 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32,ALIGN-CS-16 %s
+; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-a5 < %s  | FileCheck --check-prefixes=CHECK,ALIGN-32A,ALIGN-CS-32 %s
+; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m33 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32T,ALIGN-CS-16 %s
+; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m55 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32T,ALIGN-CS-16 %s
 ; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m7 < %s | FileCheck --check-prefixes=CHECK,ALIGN-64,ALIGN-CS-16 %s
 
 ; CHECK-LABEL: test
 ; ALIGN-16: .p2align 1
-; ALIGN-32: .p2align 2
-; ALIGN-64: .p2align 3
+; ALIGN-32A: .p2align 2
+; ALIGN-32T: .p2align 1
+; ALIGN-32T-NEXT: .prefalign 4
+; ALIGN-64: .p2align 1
+; ALIGN-64-NEXT: .prefalign 8
 
 define void @test() {
   ret void
diff --git a/llvm/test/CodeGen/LoongArch/linker-relaxation.ll b/llvm/test/CodeGen/LoongArch/linker-relaxation.ll
index 6b197bc578919..873a1f9168323 100644
--- a/llvm/test/CodeGen/LoongArch/linker-relaxation.ll
+++ b/llvm/test/CodeGen/LoongArch/linker-relaxation.ll
@@ -77,7 +77,6 @@ declare dso_local void @callee3() nounwind
 ; RELAX-NEXT:       R_LARCH_RELAX - 0x0
 ; CHECK-RELOC-NEXT: R_LARCH_PCALA_LO12 g_i1 0x0
 ; RELAX-NEXT:       R_LARCH_RELAX - 0x0
-; RELAX-NEXT:       R_LARCH_ALIGN - 0x1C
 ; CHECK-RELOC-NEXT: R_LARCH_CALL36 callee1 0x0
 ; RELAX-NEXT:       R_LARCH_RELAX - 0x0
 ; CHECK-RELOC-NEXT: R_LARCH_CALL36 callee2 0x0
diff --git a/llvm/test/CodeGen/PowerPC/code-align.ll b/llvm/test/CodeGen/PowerPC/code-align.ll
index 805873816c4d9..841636d65d87e 100644
--- a/llvm/test/CodeGen/PowerPC/code-align.ll
+++ b/llvm/test/CodeGen/PowerPC/code-align.ll
@@ -20,9 +20,7 @@ entry:
   ret i32 %mul
 
 ; CHECK-LABEL: .globl  foo
-; GENERIC: .p2align  2
-; BASIC: .p2align  4
-; PWR: .p2align  4
+; CHECK: .p2align  2
 ; CHECK: @foo
 }
 
diff --git a/llvm/test/CodeGen/PowerPC/ppc64-calls.ll b/llvm/test/CodeGen/PowerPC/ppc64-calls.ll
index 2c2743f5400d9..67ff626b4f680 100644
--- a/llvm/test/CodeGen/PowerPC/ppc64-calls.ll
+++ b/llvm/test/CodeGen/PowerPC/ppc64-calls.ll
@@ -19,7 +19,7 @@ define dso_local void @test_direct() nounwind readnone {
   tail call void @foo() nounwind
 ; Because of tail call optimization, it can be 'b' instruction.
 ; CHECK: [[BR:b[l]?]] foo
-; CHECK-NOT: nop
+; CHECK-NOT: {{^[[:space:]]+}}nop
   ret void
 }
 
diff --git a/llvm/test/CodeGen/SystemZ/vec-perm-14.ll b/llvm/test/CodeGen/SystemZ/vec-perm-14.ll
index 0b392676fa3ec..5d437ce8b091d 100644
--- a/llvm/test/CodeGen/SystemZ/vec-perm-14.ll
+++ b/llvm/test/CodeGen/SystemZ/vec-perm-14.ll
@@ -61,7 +61,8 @@ define <4 x i8> @fun1(<2 x i8> %arg) {
 ; CHECK-NEXT:        .space  1
 ; CHECK-NEXT:        .text
 ; CHECK-NEXT:        .globl  fun1
-; CHECK-NEXT:        .p2align        4
+; CHECK-NEXT:        .p2align        1
+; CHECK-NEXT:        .prefalign      16, .Lfunc_end1, nop
 ; CHECK-NEXT:        .type   fun1,@function
 ; CHECK-NEXT: fun1:                                  # @fun1
 ; CHECK-NEXT:        .cfi_startproc
@@ -96,7 +97,8 @@ define <4 x i8> @fun2(<2 x i8> %arg) {
 ; CHECK-NEXT:        .space  1
 ; CHECK-NEXT:        .text
 ; CHECK-NEXT:        .globl  fun2
-; CHECK-NEXT:        .p2align        4
+; CHECK-NEXT:        .p2align        1
+; CHECK-NEXT:        .prefalign      16, .Lfunc_end2, nop
 ; CHECK-NEXT:        .type   fun2,@function
 ; CHECK-NEXT:fun2:                                   # @fun2
 ; CHECK-NEXT:        .cfi_startproc
diff --git a/llvm/test/CodeGen/X86/eh-label.ll b/llvm/test/CodeGen/X86/eh-label.ll
index 78611000e18dd..b3954700463eb 100644
--- a/llvm/test/CodeGen/X86/eh-label.ll
+++ b/llvm/test/CodeGen/X86/eh-label.ll
@@ -7,7 +7,7 @@ define void @f() personality ptr @g {
 bb0:
   call void asm ".Lfunc_end0:", ""()
 ; CHECK: #APP
-; CHECK-NEXT: .Lfunc_end0:
+; CHECK-NEXT: .Lfunc_end0{{.*}}:
 ; CHECK-NEXT: #NO_APP
 
   invoke void @g() to label %bb2 unwind label %bb1
diff --git a/llvm/test/CodeGen/X86/empty-function.ll b/llvm/test/CodeGen/X86/empty-function.ll
index 7d908311ec8dc..bf05c8e359130 100644
--- a/llvm/test/CodeGen/X86/empty-function.ll
+++ b/llvm/test/CodeGen/X86/empty-function.ll
@@ -16,7 +16,7 @@ entry:
 ; CHECK-LABEL: f:
 ; WIN32: nop
 ; WIN64: nop
-; LINUX-NOT: nop
+; LINUX-NOT: {{^[[:space:]]+}}nop
 ; LINUX-NOT: ud2
 
 }
diff --git a/llvm/test/CodeGen/X86/kcfi-arity.ll b/llvm/test/CodeGen/X86/kcfi-arity.ll
index 5a19bcd7835ea..d84e7aae9a07c 100644
--- a/llvm/test/CodeGen/X86/kcfi-arity.ll
+++ b/llvm/test/CodeGen/X86/kcfi-arity.ll
@@ -3,7 +3,8 @@
 ; RUN: llc -mtriple=x86_64-unknown-linux-gnu -verify-machineinstrs -stop-after=finalize-isel < %s | FileCheck %s --check-prefixes=MIR,ISEL
 ; RUN: llc -mtriple=x86_64-unknown-linux-gnu -verify-machineinstrs -stop-after=kcfi < %s | FileCheck %s --check-prefixes=MIR,KCFI
 
-; ASM:       .p2align 4
+; ASM:       .p2align 2
+; ASM:       .prefalign 16
 ; ASM:       .type __cfi_f1,@function
 ; ASM-LABEL: __cfi_f1:
 ; ASM-NEXT:    nop
diff --git a/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll b/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll
index 1b7bd7835e890..cc99739febe41 100644
--- a/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll
+++ b/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll
@@ -1,6 +1,6 @@
 ; RUN: llc -mtriple=x86_64-unknown-linux-gnu -verify-machineinstrs < %s | FileCheck %s
 
-; CHECK:          .p2align 4
+; CHECK:          .prefalign 16
 ; CHECK-LABEL:    __cfi_f1:
 ; CHECK-COUNT-11:   nop
 ; CHECK-NEXT:       movl $12345678, %eax
@@ -13,9 +13,9 @@ define void @f1(ptr noundef %x) !kcfi_type !1 {
   ret void
 }
 
-; CHECK:          .p2align 4
+; CHECK:          .prefalign 16
 ; CHECK-NOT:      __cfi_f2:
-; CHECK-NOT:        nop
+; CHECK-NOT:        {{^[[:space:]]+}}nop
 ; CHECK-LABEL:    f2:
 define void @f2(ptr noundef %x) {
 ; CHECK:            addl -4(%r{{..}}), %r10d
@@ -23,9 +23,9 @@ define void @f2(ptr noundef %x) {
   ret void
 }
 
-; CHECK:          .p2align 4
+; CHECK:          .prefalign 16
 ; CHECK-LABEL:    __cfi_f3:
-; CHECK-NOT:        nop
+; CHECK-NOT:        {{^[[:space:]]+}}nop
 ; CHECK-NEXT:       movl $12345678, %eax
 ; CHECK-COUNT-11:   nop
 ; CHECK-LABEL:    f3:
@@ -35,9 +35,9 @@ define void @f3(ptr noundef %x) #0 !kcfi_type !1 {
   ret void
 }
 
-; CHECK:          .p2align 4
+; CHECK:          .prefalign 16
 ; CHECK-NOT:      __cfi_f4:
-; CHECK-COUNT-16:   nop
+; CHECK-COUNT-16:   {{^[[:space:]]+}}nop
 ; CHECK-LABEL:    f4:
 define void @f4(ptr noundef %x) #0 {
 ; CHECK:            addl -15(%r{{..}}), %r10d
diff --git a/llvm/test/CodeGen/X86/kcfi.ll b/llvm/test/CodeGen/X86/kcfi.ll
index fd93b8e3d4188..62cb78e770d6c 100644
--- a/llvm/test/CodeGen/X86/kcfi.ll
+++ b/llvm/test/CodeGen/X86/kcfi.ll
@@ -2,7 +2,8 @@
 ; RUN: llc -mtriple=x86_64-unknown-linux-gnu -verify-machineinstrs -stop-after=finalize-isel < %s | FileCheck %s --check-prefixes=MIR,ISEL
 ; RUN: llc -mtriple=x86_64-unknown-linux-gnu -verify-machineinstrs -stop-after=kcfi < %s | FileCheck %s --check-prefixes=MIR,KCFI
 
-; ASM:       .p2align 4
+; ASM:       .p2align 2
+; ASM:       .prefalign 16
 ; ASM:       .type __cfi_f1,@function
 ; ASM-LABEL: __cfi_f1:
 ; ASM-NEXT:    nop
diff --git a/llvm/test/CodeGen/X86/prefalign.ll b/llvm/test/CodeGen/X86/prefalign.ll
index 062cf740eabeb..45b700e611fa0 100644
--- a/llvm/test/CodeGen/X86/prefalign.ll
+++ b/llvm/test/CodeGen/X86/prefalign.ll
@@ -1,12 +1,11 @@
-; RUN: llc < %s | FileCheck --check-prefixes=CHECK,NOFS %s
-; RUN: llc -function-sections < %s | FileCheck --check-prefixes=CHECK,FS %s
+; RUN: llc < %s | FileCheck %s
+; RUN: llc -function-sections < %s | FileCheck %s
 
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"
 
 ; CHECK: .globl f1
-; NOFS-NEXT: .p2align 4
-; FS-NEXT: .prefalign 16
+; CHECK-NEXT: .prefalign 16
 define void @f1() {
   ret void
 }
@@ -19,9 +18,8 @@ define void @f2() prefalign(1) {
 }
 
 ; CHECK: .globl f3
-; NOFS-NEXT: .p2align 2
-; FS-NEXT: .p2align 1
-; FS-NEXT: .prefalign 4
+; CHECK-NEXT: .p2align 1
+; CHECK-NEXT: .prefalign 4
 define void @f3() align 2 prefalign(4) {
   ret void
 }
diff --git a/llvm/test/CodeGen/X86/statepoint-invoke.ll b/llvm/test/CodeGen/X86/statepoint-invoke.ll
index 34dbc21a8a8cb..b9400974b4136 100644
--- a/llvm/test/CodeGen/X86/statepoint-invoke.ll
+++ b/llvm/test/CodeGen/X86/statepoint-invoke.ll
@@ -56,7 +56,7 @@ exceptional_return:
 ; CHECK: .uleb128  .Ltmp{{[0-9]+}}-.Ltmp{{[0-9]+}}
 ; CHECK: .uleb128  .Ltmp{{[0-9]+}}-.Lfunc_begin{{[0-9]+}}
 ; CHECK: .byte  0
-; CHECK: .p2align 4
+; CHECK: .prefalign 16
 
 define ptr addrspace(1) @test_result(ptr addrspace(1) %obj,
 ; CHECK-LABEL: test_result:
@@ -99,7 +99,7 @@ exceptional_return:
 ; CHECK: .uleb128 .Ltmp{{[0-9]+}}-.Ltmp{{[0-9]+}}
 ; CHECK: .uleb128 .Ltmp{{[0-9]+}}-.Lfunc_begin{{[0-9]+}}
 ; CHECK: .byte 0
-; CHECK: .p2align 4
+; CHECK: .prefalign 16
 
 define ptr addrspace(1) @test_same_val(i1 %cond, ptr addrspace(1) %val1, ptr addrspace(1) %val2, ptr addrspace(1) %val3)
 ; CHECK-LABEL: test_same_val:
diff --git a/llvm/test/MC/ELF/prefalign-errors.s b/llvm/test/MC/ELF/prefalign-errors.s
index 802a78fde7c44..35f35834e308e 100644
--- a/llvm/test/MC/ELF/prefalign-errors.s
+++ b/llvm/test/MC/ELF/prefalign-errors.s
@@ -1,5 +1,47 @@
-// RUN: not llvm-mc -filetype=asm -triple x86_64-pc-linux-gnu %s -o - 2>&1 | FileCheck %s
+# RUN: rm -fr %t && split-file %s %t && cd %t
+# RUN: not llvm-mc -triple=x86_64 a.s 2>&1 | FileCheck a.s
+# RUN: not llvm-mc -triple=x86_64 -filetype=obj b.s 2>&1 | FileCheck b.s
+# RUN: not llvm-mc -triple=x86_64 -filetype=obj c.s 2>&1 | FileCheck c.s
 
+#--- a.s
 .section .text.f1,"ax",@progbits
-// CHECK: {{.*}}.s:[[# @LINE+1]]:12: error: alignment must be a power of 2
+# CHECK: [[#@LINE+1]]:12: error: alignment must be a power of 2
 .prefalign 3
+
+# CHECK: [[#@LINE+1]]:13: error: expected ',' and end symbol
+.prefalign 4
+
+# CHECK: [[#@LINE+1]]:14: error: expected symbol name
+.prefalign 4,
+
+# CHECK: [[#@LINE+1]]:23: error: expected integer fill byte or 'nop'
+.prefalign 4,.text.f1,trap
+
+# CHECK: [[#@LINE+1]]:23: error: fill value must be in range [0, 255]
+.prefalign 4,.text.f1,256
+
+# CHECK: [[#@LINE+1]]:23: error: expected integer fill byte or 'nop'
+.prefalign 4,.text.f1,-1
+
+## Non-zero fill in a BSS section.
+.bss
+# CHECK: [[#@LINE+1]]:19: error: non-zero fill in BSS section '.bss'
+.prefalign 4,.Lend,1
+# CHECK: [[#@LINE+1]]:19: error: non-zero fill in BSS section '.bss'
+.prefalign 4,.Lend,nop
+.space 1
+.Lend:
+
+#--- b.s
+## End symbol is undefined.
+.section .text.f1,"ax",@progbits
+# CHECK: <unknown>:0: error: end symbol 'undef' must be a symbol in the current section
+.prefalign 4,undef,0
+
+#--- c.s
+## End symbol is defined in a different section.
+.section .text.f1,"ax",@progbits
+.prefalign 4,.Lend,0
+# CHECK: <unknown>:0: error: end symbol '.Lend' must be a symbol in the current section
+.section .text.f2,"ax",@progbits
+.Lend:
diff --git a/llvm/test/MC/ELF/prefalign.s b/llvm/test/MC/ELF/prefalign.s
index 803bb5d730340..7629d45657df6 100644
--- a/llvm/test/MC/ELF/prefalign.s
+++ b/llvm/test/MC/ELF/prefalign.s
@@ -1,104 +1,109 @@
-// RUN: llvm-mc -triple x86_64 %s -o - | FileCheck --check-prefix=ASM %s
-// RUN: llvm-mc -filetype=obj -triple x86_64 %s -o - | llvm-readelf -SW - | FileCheck --check-prefix=OBJ %s
+# RUN: llvm-mc -triple x86_64 %s -o - | FileCheck --check-prefix=ASM %s
+# RUN: llvm-mc -filetype=obj -triple x86_64 %s -o %t
+# RUN: llvm-readelf -SW %t | FileCheck --check-prefix=OBJ %s
+# RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck --check-prefix=DIS %s
 
-// Minimum alignment >= preferred alignment, no effect on sh_addralign.
-// ASM: .section .text.f1lt
-// ASM: .p2align 2
-// ASM: .prefalign 2 
-// OBJ: .text.f1lt        PROGBITS        0000000000000000 000040 000003 00  AX  0   0  4
-.section .text.f1lt,"ax",@progbits
+## MinAlign >= PrefAlign: the three-way rule is bounded by MinAlign regardless
+## of body size, so sh_addralign stays at MinAlign.
+# ASM: .section .text.f1
+# ASM: .p2align 2
+# ASM: .prefalign 2, .Lf1_end, 0
+# OBJ: .text.f1          PROGBITS        0000000000000000 {{[0-9a-f]+}} 000003 00  AX  0   0  4
+.section .text.f1,"ax",@progbits
 .p2align 2
-.prefalign 2
+.prefalign 2, .Lf1_end, 0
 .rept 3
-nop
+clc
 .endr
+.Lf1_end:
 
-// ASM: .section .text.f1eq
-// ASM: .p2align 2
-// ASM: .prefalign 2 
-// OBJ: .text.f1eq        PROGBITS        0000000000000000 000044 000004 00  AX  0   0  4
-.section .text.f1eq,"ax",@progbits
+## Multiple .prefalign on the same end symbol: effective PrefAlign is the maximum.
+# ASM: .section .text.f2
+# ASM: .prefalign 8, .Lf2_end, 0
+# ASM: .prefalign 16, .Lf2_end, 0
+# ASM: .prefalign 8, .Lf2_end, 0
+# OBJ: .text.f2          PROGBITS        0000000000000000 {{[0-9a-f]+}} 000009 00  AX  0   0 16
+.section .text.f2,"ax",@progbits
 .p2align 2
-.prefalign 2
-.rept 4
-nop
-.endr
-
-// ASM: .section .text.f1gt
-// ASM: .p2align 2
-// ASM: .prefalign 2 
-// OBJ: .text.f1gt        PROGBITS        0000000000000000 000048 000005 00  AX  0   0  4
-.section .text.f1gt,"ax",@progbits
-.p2align 2
-.prefalign 2
-.rept 5
-nop
+.prefalign 8, .Lf2_end, 0
+.prefalign 16, .Lf2_end, 0
+.prefalign 8, .Lf2_end, 0
+.rept 9
+clc
 .endr
+.Lf2_end:
 
-// Minimum alignment < preferred alignment, sh_addralign influenced by section size.
-// Use maximum of all .prefalign directives.
-// ASM: .section .text.f2lt
-// ASM: .p2align 2
-// ASM: .prefalign 8
-// ASM: .prefalign 16 
-// ASM: .prefalign 8
-// OBJ: .text.f2lt        PROGBITS        0000000000000000 000050 000003 00  AX  0   0  4
-.section .text.f2lt,"ax",@progbits
+## Multiple functions in a section, each with its own .prefalign.
+## nop fill; f3b's 5-byte padding is a NOP.
+## f3b: ComputedAlign=8,  padding=5
+## f3c: ComputedAlign=16, padding=0
+# ASM: .prefalign 16, .Lf3a_end, nop
+# ASM: .prefalign 16, .Lf3b_end, nop
+# ASM: .prefalign 16, .Lf3c_end, 204
+# OBJ: .text.f3          PROGBITS        0000000000000000 {{[0-9a-f]+}} 000020 00  AX  0   0 16
+# DIS: Disassembly of section .text.f3:
+# DIS:       0: clc
+# DIS-NEXT:  1: clc
+# DIS-NEXT:  2: clc
+# DIS-NEXT:  3: nopl
+# DIS-NEXT:  8: stc
+# DIS:       f: stc
+# DIS-NEXT: 10: clc
+# DIS:      1f: clc
+# DIS-EMPTY:
+.section .text.f3,"ax",@progbits
 .p2align 2
-.prefalign 8
-.prefalign 16
-.prefalign 8
+.prefalign 16, .Lf3a_end, nop
 .rept 3
-nop
+clc
 .endr
-
-// ASM: .section .text.f2between1
-// OBJ: .text.f2between1  PROGBITS        0000000000000000 000054 000008 00  AX  0   0  8
-.section .text.f2between1,"ax",@progbits
-.p2align 2
-.prefalign 8
-.prefalign 16
-.prefalign 8
+.Lf3a_end:
+.prefalign 16, .Lf3b_end, nop
 .rept 8
-nop
-.endr
-
-// OBJ: .text.f2between2  PROGBITS        0000000000000000 00005c 000009 00  AX  0   0 16
-.section .text.f2between2,"ax",@progbits
-.p2align 2
-.prefalign 8
-.prefalign 16
-.prefalign 8
-.rept 9
-nop
+stc
 .endr
-
-// OBJ: .text.f2between3  PROGBITS        0000000000000000 000068 000010 00  AX  0   0 16
-.section .text.f2between3,"ax",@progbits
-.p2align 2
-.prefalign 8
-.prefalign 16
-.prefalign 8
+.Lf3b_end:
+.prefalign 16, .Lf3c_end, 0xcc
 .rept 16
-nop
+clc
 .endr
+.Lf3c_end:
+## No-op prefalign
+.prefalign 16, .Lf3d_end, 0xcc
+.Lf3d_end:
+.prefalign 16, .Lf3a_end, 0xcc
 
-// OBJ: .text.f2gt1       PROGBITS        0000000000000000 000078 000011 00  AX  0   0 16
-.section .text.f2gt1,"ax",@progbits
+## Two functions in one section where the second function's padding depends on
+## the first function's size.
+# OBJ: .text.f4          PROGBITS        0000000000000000 {{[0-9a-f]+}} 00001e 00  AX  0   0 16
+# DIS: Disassembly of section .text.f4:
+# DIS:       0: pushq
+# DIS:       7: retq
+# DIS-NEXT:  8: nopl
+# DIS-NEXT: 10: movl
+# DIS:      1d: retq
+# DIS-EMPTY:
+.section .text.f4,"ax",@progbits
 .p2align 2
-.prefalign 8
-.prefalign 16
-.prefalign 8
-.rept 17
-nop
-.endr
+.prefalign 16, .Lf4a_end, nop
+pushq %rbp
+movq %rsp, %rbp
+xorl %eax, %eax
+popq %rbp
+retq
+.Lf4a_end:
+.prefalign 16, .Lf4b_end, nop
+movl $0, 0
+xorl %eax, %eax
+retq
+.Lf4b_end:
 
-// OBJ: .text.f2gt2       PROGBITS        0000000000000000 00008c 000021 00  AX  0   0 16
-.section .text.f2gt2,"ax",@progbits
+## .prefalign in a BSS section with zero fill.
+# ASM: .bss
+# ASM: .prefalign 16, .Lbss_end, 0
+# OBJ: .bss              NOBITS          0000000000000000 {{[0-9a-f]+}} 000004 00  WA  0   0  4
+.bss
 .p2align 2
-.prefalign 8
-.prefalign 16
-.prefalign 8
-.rept 33
-nop
-.endr
+.prefalign 16, .Lbss_end, 0
+.space 4
+.Lbss_end:
diff --git a/llvm/test/MC/RISCV/prefalign.s b/llvm/test/MC/RISCV/prefalign.s
new file mode 100644
index 0000000000000..0e48a953707fb
--- /dev/null
+++ b/llvm/test/MC/RISCV/prefalign.s
@@ -0,0 +1,34 @@
+# RUN: llvm-mc -filetype=obj -triple riscv64 -mattr=+relax %s -o %t
+# RUN: llvm-readelf -SW %t | FileCheck --check-prefix=OBJ %s
+# RUN: llvm-objdump -d -M no-aliases --no-show-raw-insn %t | FileCheck --check-prefix=DIS %s
+# RUN: llvm-readobj -r %t | FileCheck --check-prefix=RELOC %s
+
+## Two functions in one section with nop fill.
+## f1: body = 12 bytes < 16, ComputedAlign=16, but section start is 16-aligned
+##     so pad = 0
+## f2: body = 32 bytes >= 16, ComputedAlign=16, pad = 4 (one nop at 0xc)
+# OBJ: .text.f1 PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000030 00 AX 0 0 16
+# DIS:       0: addi a0, zero, 0x1
+# DIS-NEXT:  4: addi a0, zero, 0x2
+# DIS-NEXT:  8: add a0, a0, a1
+## Padding nop for f2
+# DIS-NEXT:  c: addi zero, zero, 0x0
+## f2 starts at 0x10, aligned to 16
+# DIS-NEXT: 10: add a0, a0, a1
+.section .text.f1,"ax",@progbits
+.p2align 2
+.prefalign 16, .Lf1_end, nop
+addi a0, zero, 1
+addi a0, zero, 2
+add a0, a0, a1
+.Lf1_end:
+.prefalign 16, .Lf2_end, nop
+.rept 8
+add a0, a0, a1
+.endr
+.Lf2_end:
+
+## .prefalign does not emit R_RISCV_ALIGN relocations. The padding is fully
+## resolved at assembly time, so no linker adjustment is needed.
+# RELOC: Relocations [
+# RELOC-NEXT: ]
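
For readers checking the expected offsets in the tests above, the alignment
rule from the commit message can be sketched as standalone C++. This is only
an illustration and not code from the patch: `ceilPowerOf2`, `computedAlign`
and `paddingFor` are hypothetical names, and returning 1 for an empty body is
an assumption based on a default-constructed llvm::Align being 1.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two >= N for N >= 1. For N >= 1 this gives the same
// value as NextPowerOf2(N - 1) in the rule quoted in the commit message.
inline uint64_t ceilPowerOf2(uint64_t N) {
  uint64_t P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Hypothetical helper mirroring the rule:
//   0 < body_size < pref_align => NextPowerOf2(body_size - 1)
//   body_size >= pref_align    => pref_align
// An empty body is assumed to impose no alignment (1).
inline uint64_t computedAlign(uint64_t BodySize, uint64_t PrefAlign) {
  // PrefAlign must be a non-zero power of two.
  assert(PrefAlign != 0 && (PrefAlign & (PrefAlign - 1)) == 0);
  if (BodySize == 0)
    return 1;
  if (BodySize < PrefAlign)
    return ceilPowerOf2(BodySize);
  return PrefAlign;
}

// Number of padding bytes needed so that RawStart + padding is a multiple
// of Alignment (Alignment must be a power of two).
inline uint64_t paddingFor(uint64_t RawStart, uint64_t Alignment) {
  return (Alignment - RawStart % Alignment) % Alignment;
}
```

With these helpers, the f3b case in llvm/test/MC/ELF/prefalign.s (8-byte body
following a 3-byte function) works out to computedAlign(8, 16) == 8 and
paddingFor(3, 8) == 5, which agrees with the "ComputedAlign=8, padding=5"
comment in that test.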

>From c05329dad2b8568cee018bdf1d4218826ec4cfc8 Mon Sep 17 00:00:00 2001
From: Fangrui Song <i at maskray.me>
Date: Sun, 1 Mar 2026 12:18:33 -0800
Subject: [PATCH 2/3] coverage

---
 llvm/test/MC/ELF/prefalign.s | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/llvm/test/MC/ELF/prefalign.s b/llvm/test/MC/ELF/prefalign.s
index 7629d45657df6..57ecc64781646 100644
--- a/llvm/test/MC/ELF/prefalign.s
+++ b/llvm/test/MC/ELF/prefalign.s
@@ -98,6 +98,16 @@ xorl %eax, %eax
 retq
 .Lf4b_end:
 
+## sh_addralign stays at 32, not downgraded by .prefalign.
+# OBJ: .text.f5          PROGBITS        0000000000000000 {{[0-9a-f]+}} 000003 00  AX  0   0 32
+.section .text.f5,"ax",@progbits
+.p2align 5
+.prefalign 16, .Lf5_end, 0
+.rept 3
+clc
+.endr
+.Lf5_end:
+
 ## .prefalign in a BSS section with zero fill.
 # ASM: .bss
 # ASM: .prefalign 16, .Lbss_end, 0

>From 66d5257d5598b5dd7939af917b6d65f88a49736e Mon Sep 17 00:00:00 2001
From: Fangrui Song <i at maskray.me>
Date: Mon, 2 Mar 2026 22:05:16 -0800
Subject: [PATCH 3/3] fix quadratic convergence issue

---
 llvm/include/llvm/MC/MCAssembler.h       |  2 +-
 llvm/lib/MC/MCAssembler.cpp              | 53 +++++++++------
 llvm/test/MC/ELF/prefalign-convergence.s | 86 ++++++++++++++++++++++++
 3 files changed, 121 insertions(+), 20 deletions(-)
 create mode 100644 llvm/test/MC/ELF/prefalign-convergence.s

diff --git a/llvm/include/llvm/MC/MCAssembler.h b/llvm/include/llvm/MC/MCAssembler.h
index a7b865cb16b81..dad3bf01c9feb 100644
--- a/llvm/include/llvm/MC/MCAssembler.h
+++ b/llvm/include/llvm/MC/MCAssembler.h
@@ -112,7 +112,7 @@ class MCAssembler {
   void relaxInstruction(MCFragment &F);
   void relaxLEB(MCFragment &F);
   void relaxBoundaryAlign(MCBoundaryAlignFragment &BF);
-  void relaxPrefAlign(MCFragment &F);
+  void layoutPrefAlign(MCFragment &F, uint64_t RawStart);
   void relaxDwarfLineAddr(MCFragment &F);
   void relaxDwarfCallFrameFragment(MCFragment &F);
   void relaxSFrameFragment(MCFragment &DF);
diff --git a/llvm/lib/MC/MCAssembler.cpp b/llvm/lib/MC/MCAssembler.cpp
index f6f64a6f64f3d..7a98616d3fb5e 100644
--- a/llvm/lib/MC/MCAssembler.cpp
+++ b/llvm/lib/MC/MCAssembler.cpp
@@ -908,32 +908,34 @@ void MCAssembler::relaxBoundaryAlign(MCBoundaryAlignFragment &BF) {
   BF.setSize(NewSize);
 }
 
-void MCAssembler::relaxPrefAlign(MCFragment &F) {
+// Compute the body size by walking forward from F to the End symbol and
+// summing fragment sizes. This avoids depending on stale layout offsets.
+void MCAssembler::layoutPrefAlign(MCFragment &F, uint64_t RawStart) {
   const MCSymbol &End = F.getPrefAlignEnd();
   if (!End.getFragment() || End.getFragment()->getParent() != F.getParent()) {
     recordError(SMLoc(), "end symbol '" + End.getName() +
                              "' must be a symbol in the current section");
     return;
   }
-  uint64_t EndOffset;
-  if (!getSymbolOffset(End, EndOffset))
+  const MCFragment *EndFrag = End.getFragment();
+  if (EndFrag->getLayoutOrder() <= F.getLayoutOrder())
     return;
-  // RawStart is the start of the (variable) padding region; StartOffset is
-  // the start of the body (RawStart plus current padding). BodySize is
-  // measured from StartOffset, not RawStart, so that padding is not counted
-  // as part of the body.
-  uint64_t RawStart = F.Offset + F.getFixedSize();
-  uint64_t StartOffset = RawStart + F.getVarSize();
+  uint64_t BodySize = 0;
+  for (const MCFragment *Cur = F.getNext();; Cur = Cur->getNext()) {
+    if (Cur == EndFrag) {
+      BodySize += End.getOffset();
+      break;
+    }
+    BodySize += computeFragmentSize(*Cur);
+  }
   Align NewAlign;
-  if (StartOffset < EndOffset) {
-    uint64_t BodySize = EndOffset - StartOffset;
+  if (BodySize) {
     if (BodySize < F.getPrefAlignPreferred().value())
       NewAlign = Align(NextPowerOf2(BodySize - 1));
     else
       NewAlign = F.getPrefAlignPreferred();
   }
   F.setPrefAlignComputed(NewAlign);
-  // Compute padding to align the body start to NewAlign.
   uint64_t NewPadSize = offsetToAlignment(RawStart, NewAlign);
   F.VarContentStart = F.getFixedSize();
   F.VarContentEnd = F.VarContentStart + NewPadSize;
@@ -1020,7 +1022,7 @@ bool MCAssembler::relaxFragment(MCFragment &F) {
     relaxBoundaryAlign(static_cast<MCBoundaryAlignFragment &>(F));
     break;
   case MCFragment::FT_PrefAlign:
-    relaxPrefAlign(F);
+    layoutPrefAlign(F, F.Offset + F.getFixedSize());
     break;
   case MCFragment::FT_CVInlineLines:
     getContext().getCVContext().encodeInlineLineTable(
@@ -1037,11 +1039,16 @@ bool MCAssembler::relaxFragment(MCFragment &F) {
   return computeFragmentSize(F) != Size;
 }
 
+// Assign offsets to fragments. While most fragments are relaxed by
+// relaxFragment, alignment fragments are exceptions: their padding
+// depends on the current offset. If computed in relaxFragment,
+// the offset comes from F.Offset set by the previous layoutSection call.
+// When an upstream alignment fragment changes padding, F.Offset becomes
+// stale, causing each relaxOnce iteration to fix only one more fragment
+// — O(N) iterations for N alignment fragments. Computing them here with
+// the tracked Offset avoids this.
 void MCAssembler::layoutSection(MCSection &Sec) {
   uint64_t Offset = 0;
-  // Note: fragments are not relaxed here. Some fragments depend on
-  // downstream symbols whose offsets have not been set in this pass yet.
-  // They are instead relaxed by relaxFragment.
   for (MCFragment &F : Sec) {
     F.Offset = Offset;
     if (F.getKind() == MCFragment::FT_Align) {
@@ -1067,6 +1074,10 @@ void MCAssembler::layoutSection(MCSection &Sec) {
       if (F.VarContentEnd > F.getParent()->ContentStorage.size())
         F.getParent()->ContentStorage.resize(F.VarContentEnd);
       Offset += Size;
+    } else if (F.getKind() == MCFragment::FT_PrefAlign) {
+      Offset += F.getFixedSize();
+      layoutPrefAlign(F, Offset);
+      Offset += F.getVarSize();
     } else {
       Offset += computeFragmentSize(F);
     }
@@ -1074,7 +1085,7 @@ void MCAssembler::layoutSection(MCSection &Sec) {
 }
 
 unsigned MCAssembler::relaxOnce(unsigned FirstStable) {
-  ++stats::RelaxationSteps;
+  uint64_t MaxIterations = 0;
   PendingErrors.clear();
 
   unsigned Res = 0;
@@ -1082,8 +1093,10 @@ unsigned MCAssembler::relaxOnce(unsigned FirstStable) {
     // Assume each iteration finalizes at least one extra fragment. If the
     // layout does not converge after N+1 iterations, bail out.
     auto &Sec = *Sections[I];
-    auto MaxIter = Sec.curFragList()->Tail->getLayoutOrder() + 1;
+    auto Limit = Sec.curFragList()->Tail->getLayoutOrder() + 1;
+    auto MaxIter = Limit;
     for (;;) {
+      --MaxIter;
       bool Changed = false;
       for (MCFragment &F : Sec)
         if (F.getKind() != MCFragment::FT_Data && relaxFragment(F))
@@ -1095,11 +1108,13 @@ unsigned MCAssembler::relaxOnce(unsigned FirstStable) {
       // sections. Therefore, we must re-evaluate all sections.
       FirstStable = Sections.size();
       Res = I;
-      if (--MaxIter == 0)
+      if (MaxIter == 0)
         break;
       layoutSection(Sec);
     }
+    MaxIterations = std::max(MaxIterations, uint64_t(Limit - MaxIter));
   }
+  stats::RelaxationSteps += MaxIterations;
   // The subsequent relaxOnce call only needs to visit Sections [0,Res) if no
   // change occurred.
   return Res;
diff --git a/llvm/test/MC/ELF/prefalign-convergence.s b/llvm/test/MC/ELF/prefalign-convergence.s
new file mode 100644
index 0000000000000..3debc4210a0ef
--- /dev/null
+++ b/llvm/test/MC/ELF/prefalign-convergence.s
@@ -0,0 +1,86 @@
+// REQUIRES: asserts
+// Test that sections with many .prefalign fragments converge in a small
+// number of relaxation steps (not O(N) steps). Without the layoutSection
+// fix, each relaxOnce inner iteration would only correctly resolve one
+// PrefAlign fragment (because subsequent fragments see stale offsets),
+// leading to O(N) iterations. With the fix, layoutSection recomputes all
+// PrefAlign fragments using the tracked offset, converging in 1 iteration.
+
+// RUN: llvm-mc -filetype=obj -triple x86_64 --stats %s -o %t 2>&1 \
+// RUN:   | FileCheck %s
+// CHECK: 1 assembler - Number of assembler layout and relaxation steps
+
+// RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck --check-prefix=DIS %s
+
+.section .text,"ax",@progbits
+.byte 0
+
+// DIS:       8: nop
+.prefalign 16, .Lend0, nop
+.rept 5
+nop
+.endr
+.Lend0:
+
+// DIS:      10: nop
+.prefalign 16, .Lend1, nop
+.rept 5
+nop
+.endr
+.Lend1:
+
+// DIS:      18: nop
+.prefalign 16, .Lend2, nop
+.rept 5
+nop
+.endr
+.Lend2:
+
+// DIS:      20: nop
+.prefalign 16, .Lend3, nop
+.rept 5
+nop
+.endr
+.Lend3:
+
+// DIS:      28: nop
+.prefalign 16, .Lend4, nop
+.rept 5
+nop
+.endr
+.Lend4:
+
+// DIS:      30: nop
+.prefalign 16, .Lend5, nop
+.rept 5
+nop
+.endr
+.Lend5:
+
+// DIS:      38: nop
+.prefalign 16, .Lend6, nop
+.rept 5
+nop
+.endr
+.Lend6:
+
+// DIS:      40: nop
+.prefalign 16, .Lend7, nop
+.rept 5
+nop
+.endr
+.Lend7:
+
+// DIS:      48: nop
+.prefalign 16, .Lend8, nop
+.rept 5
+nop
+.endr
+.Lend8:
+
+// DIS:      50: nop
+.prefalign 16, .Lend9, nop
+.rept 5
+nop
+.endr
+.Lend9:
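
To make the single-pass argument in PATCH 3/3 concrete, here is a rough
standalone model of laying out consecutive .prefalign regions with a tracked
running offset. It is a sketch under simplifying assumptions, not the
MCAssembler code: `PrefAlignRegion`, `computedAlign` and `layoutRegions` are
hypothetical names, every region is assumed to have a fixed body size, and
nothing else sits between regions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one ".prefalign <PrefAlign>, <end>, <fill>"
// region whose body is BodySize bytes long.
struct PrefAlignRegion {
  uint64_t PrefAlign;
  uint64_t BodySize;
};

// Alignment rule from the commit message; an empty body is assumed to
// impose no alignment.
inline uint64_t computedAlign(uint64_t BodySize, uint64_t PrefAlign) {
  assert(PrefAlign != 0 && (PrefAlign & (PrefAlign - 1)) == 0);
  if (BodySize == 0)
    return 1;
  if (BodySize >= PrefAlign)
    return PrefAlign;
  uint64_t P = 1; // Round BodySize up to a power of two.
  while (P < BodySize)
    P <<= 1;
  return P;
}

// Lay out the regions back to back, starting at Offset, and return the
// offset at which each body starts. Because the running Offset already
// includes all earlier padding, a single forward pass is enough.
inline std::vector<uint64_t>
layoutRegions(uint64_t Offset, const std::vector<PrefAlignRegion> &Regions) {
  std::vector<uint64_t> Starts;
  for (const PrefAlignRegion &R : Regions) {
    uint64_t A = computedAlign(R.BodySize, R.PrefAlign);
    Offset += (A - Offset % A) % A; // Padding before the body.
    Starts.push_back(Offset);
    Offset += R.BodySize;
  }
  return Starts;
}
```

Starting at offset 1 (after the `.byte 0`) with ten regions of PrefAlign 16
and a 5-byte body, this model places the bodies at 0x8, 0x10, ..., 0x50,
which lines up with the DIS lines in prefalign-convergence.s.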