[flang-commits] [flang] 193d7a6 - [MC, CodeGen] Update .prefalign for symbol-based preferred alignment (#184032)

Fri Apr 10 23:16:51 PDT 2026

Author: Fangrui Song
Date: 2026-04-11T06:16:43Z
New Revision: 193d7a6ace9f7016e03c90c63626fdb7fdf99bc0

URL: https://github.com/llvm/llvm-project/commit/193d7a6ace9f7016e03c90c63626fdb7fdf99bc0
DIFF: https://github.com/llvm/llvm-project/commit/193d7a6ace9f7016e03c90c63626fdb7fdf99bc0.diff

LOG: [MC,CodeGen] Update .prefalign for symbol-based preferred alignment (#184032)

https://discourse.llvm.org/t/rfc-enhancing-function-alignment-attributes/88019/17
The recently introduced .prefalign only worked when each function was in
its own section (-ffunction-sections), because the section size gave the
function body size needed for the alignment rule.

As a result, AsmPrinter output differed between -ffunction-sections and
-fno-function-sections (#155529), which is rather unusual.

This patch fixes this AsmPrinter difference by extending .prefalign to
accept an end symbol and a required fill operand:

    .prefalign <log2_align>, <end_sym>, nop
    .prefalign <log2_align>, <end_sym>, <fill_byte>

The first operand is a log2 alignment value (e.g. 4 means 16-byte
alignment). The body size (end_sym_offset - start_offset) determines the
alignment:

    body_size < pref_align   => ComputedAlign = std::bit_ceil(body_size)
    body_size >= pref_align  => ComputedAlign = pref_align

To also enforce a minimum alignment, emit a .p2align before .prefalign.
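
As an illustration of the rule above, here is a minimal standalone sketch.
It is not the LLVM implementation: bitCeil, computedAlign and paddingBytes
are hypothetical helper names, and bitCeil is a hand-written stand-in for
std::bit_ceil so that the sketch does not require C++20.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): smallest power of two >= V.
// Returns 1 for V == 0, matching std::bit_ceil.
uint64_t bitCeil(uint64_t V) {
  assert(V <= (uint64_t{1} << 63) && "result would not fit in 64 bits");
  uint64_t P = 1;
  while (P < V)
    P <<= 1;
  return P;
}

// Alignment in bytes chosen for a body of BodySize bytes when the directive
// requests a preferred alignment of 2^Log2Align bytes.
uint64_t computedAlign(uint64_t BodySize, unsigned Log2Align) {
  assert(Log2Align <= 63 && "log2 alignment must be in [0, 63]");
  uint64_t PrefAlign = uint64_t{1} << Log2Align;
  return BodySize < PrefAlign ? bitCeil(BodySize) : PrefAlign;
}

// Number of padding bytes needed to bring Offset up to a multiple of
// Alignment (Alignment must be a non-zero power of two).
uint64_t paddingBytes(uint64_t Offset, uint64_t Alignment) {
  return (Alignment - Offset % Alignment) % Alignment;
}
```

For example, with `.prefalign 4, ...` a 5-byte body gets 8-byte alignment,
while a body of 16 bytes or more gets the full 16-byte alignment. A start
offset of 10 with a computed alignment of 8 needs 6 bytes of padding.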

The fill operand is required: `nop` generates target-appropriate NOP
instructions via writeNopData, while an integer in [0,255] fills the
padding with that byte value.
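
The two fill modes can be sketched as follows. This is a deliberately
simplified model, not the code in writeFragment: makePadding is a
hypothetical name, and the `nop` case assumes a target with a single-byte
NOP (0x90, the x86 one-byte NOP), whereas the real writeNopData is
target-specific and may emit multi-byte NOP sequences.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplifying assumption: every NOP is the single byte 0x90.
constexpr uint8_t AssumedNopByte = 0x90;

// Build PadSize bytes of padding. With EmitNops the padding is NOPs and
// Fill is ignored; otherwise every byte is Fill.
std::vector<uint8_t> makePadding(size_t PadSize, bool EmitNops, uint8_t Fill) {
  return std::vector<uint8_t>(PadSize, EmitNops ? AssumedNopByte : Fill);
}
```

Under these assumptions, `makePadding(3, false, 0xCC)` models
`.prefalign 4, end, 0xCC` needing three bytes of padding, and
`makePadding(2, true, 0)` models the `nop` form needing two.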

Initialize MCSection::CurFragList to nullptr and add a null check
to skip ELFObjectWriter-created sections like .strtab/.symtab
that never receive changeSection calls.

relaxPrefAlign is called in both layoutSection and relaxFragment.
The layoutSection call ensures correct initial padding before
relaxOnce, and is also needed for the post-finishLayout re-layout
where relaxOnce is not used. relaxPrefAlign walks forward to the
end symbol to compute BodySize (summing fragment sizes), avoiding
dependence on stale downstream symbol offsets.
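
The forward walk can be modelled with a small standalone sketch. The
Fragment struct and bodySize function below are hypothetical stand-ins for
MCFragment and the loop in relaxPrefAlign; the sketch assumes the end
symbol's offset is measured from the start of the fragment that holds it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MCFragment: only the current size matters here.
struct Fragment {
  uint64_t Size;
};

// Sum the sizes of the fragments strictly between the .prefalign fragment
// (index PrefAlignIdx) and the fragment holding the end symbol (index
// EndFragIdx), then add the end symbol's offset within its own fragment.
// Only fragment sizes are read, never absolute offsets, so offsets left
// stale by earlier padding changes cannot affect the result.
uint64_t bodySize(const std::vector<Fragment> &Frags, size_t PrefAlignIdx,
                  size_t EndFragIdx, uint64_t EndOffsetInFrag) {
  assert(PrefAlignIdx < EndFragIdx && EndFragIdx < Frags.size());
  uint64_t Size = 0;
  for (size_t I = PrefAlignIdx + 1; I < EndFragIdx; ++I)
    Size += Frags[I].Size;
  return Size + EndOffsetInFrag;
}
```

With fragments of sizes 3, 10, 7 and 4, where the first is the .prefalign
fragment and the end symbol sits 2 bytes into the last one, the body size
is 10 + 7 + 2 = 19 regardless of how much padding the first fragment holds.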

Added: 
    llvm/test/MC/ELF/prefalign-convergence.s
    llvm/test/MC/RISCV/prefalign.s

Modified: 
    clang/test/Misc/noexecstack.c
    flang/test/Driver/save-mlir-temps.f90
    llvm/docs/Extensions.rst
    llvm/include/llvm/MC/MCAssembler.h
    llvm/include/llvm/MC/MCObjectStreamer.h
    llvm/include/llvm/MC/MCSection.h
    llvm/include/llvm/MC/MCStreamer.h
    llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
    llvm/lib/MC/ELFObjectWriter.cpp
    llvm/lib/MC/MCAsmStreamer.cpp
    llvm/lib/MC/MCAssembler.cpp
    llvm/lib/MC/MCFragment.cpp
    llvm/lib/MC/MCObjectStreamer.cpp
    llvm/lib/MC/MCParser/AsmParser.cpp
    llvm/lib/MC/MCSection.cpp
    llvm/lib/MC/MCStreamer.cpp
    llvm/test/CodeGen/AArch64/preferred-function-alignment.ll
    llvm/test/CodeGen/ARM/preferred-function-alignment.ll
    llvm/test/CodeGen/LoongArch/linker-relaxation.ll
    llvm/test/CodeGen/PowerPC/code-align.ll
    llvm/test/CodeGen/PowerPC/ppc64-calls.ll
    llvm/test/CodeGen/SystemZ/vec-perm-14.ll
    llvm/test/CodeGen/X86/eh-label.ll
    llvm/test/CodeGen/X86/empty-function.ll
    llvm/test/CodeGen/X86/kcfi-arity.ll
    llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll
    llvm/test/CodeGen/X86/kcfi.ll
    llvm/test/CodeGen/X86/prefalign.ll
    llvm/test/CodeGen/X86/statepoint-invoke.ll
    llvm/test/DebugInfo/KeyInstructions/X86/dwarf-basic.ll
    llvm/test/DebugInfo/LoongArch/relax_dwo_ranges.ll
    llvm/test/DebugInfo/X86/header.ll
    llvm/test/DebugInfo/X86/ranges_always.ll
    llvm/test/MC/ELF/prefalign-errors.s
    llvm/test/MC/ELF/prefalign.s
    llvm/test/tools/llvm-nm/X86/demangle.ll
    llvm/test/tools/llvm-objdump/X86/source-interleave-function-from-debug.test

Removed: 
    


################################################################################
diff  --git a/clang/test/Misc/noexecstack.c b/clang/test/Misc/noexecstack.c
index 1af0c76d0207e..9a3ec98b75d2e 100644

--- a/clang/test/Misc/noexecstack.c
+++ b/clang/test/Misc/noexecstack.c
@@ -10,7 +10,7 @@
 // RUN: echo "nop" | %clang -cc1as -triple x86_64 - -filetype obj -o %t.o
 // RUN: llvm-readelf -S %t.o | FileCheck --check-prefix=NOSTACK %s
 
-// CHECK: .text             PROGBITS        0000000000000000 {{[0-9a-f]+}} 000001 00  AX  0   0 16
+// CHECK: .text             PROGBITS        0000000000000000 {{[0-9a-f]+}} 000001 00  AX  0   0  4
 // CHECK: .note.GNU-stack   PROGBITS        0000000000000000 {{[0-9a-f]+}} 000000 00      0   0  1
 
 // NOSTACK-NOT: .note.GNU-stack

diff  --git a/flang/test/Driver/save-mlir-temps.f90 b/flang/test/Driver/save-mlir-temps.f90
index e9478a6c521b2..69e5af79a7b1c 100644
--- a/flang/test/Driver/save-mlir-temps.f90
+++ b/flang/test/Driver/save-mlir-temps.f90
@@ -9,6 +9,8 @@
 ! However, calling an external assembler on arm64 Macs fails, because it's
 ! currently being invoked with the `-Q` flag, that is not supported on arm64.
 ! UNSUPPORTED: system-windows, system-darwin
+! TODO Remove after -fno-integrated-as properly sets DisableIntegratedAS
+! XFAIL: *
 
 !--------------------------
 ! Invalid output directory

diff  --git a/llvm/docs/Extensions.rst b/llvm/docs/Extensions.rst
index c8de7f59de5c0..b7da10d887035 100644
--- a/llvm/docs/Extensions.rst
+++ b/llvm/docs/Extensions.rst
@@ -31,17 +31,28 @@ hexadecimal format instead of decimal if desired.
 ``.prefalign`` directive
 ------------------------
 
-The ``.prefalign`` directive sets the preferred alignment for a section,
-and enables the section's final alignment to be set in a way that is
-dependent on the section size (currently only supported with ELF).
-
-If the section size is less than the section's minimum alignment as
-determined using ``.align`` family directives, the section's alignment
-will be equal to its minimum alignment. Otherwise, if the section size is
-between the minimum alignment and the preferred alignment, the section's
-alignment will be equal to the power of 2 greater than or equal to the
-section size. Otherwise, the section's alignment will be equal to the
-preferred alignment.
+.. code-block:: gas
+
+  .prefalign <log2_align>, <end_sym>, nop
+  .prefalign <log2_align>, <end_sym>, <fill_byte>
+
+The ``.prefalign`` directive pads the current location so that the code
+between the directive and ``end_sym`` starts at an alignment that depends
+on the size of that code (currently only supported with ELF).
+``log2_align`` specifies the preferred alignment as a power-of-2 exponent
+(e.g. 4 means 16-byte alignment). ``end_sym`` must be a symbol defined in
+the same section. The fill operand is required: ``nop`` fills the padding
+with target-appropriate NOP instructions, while an integer in ``[0, 255]``
+fills the padding with that byte value.
+
+The alignment is determined by the *body_size* (the number of bytes between
+the padded start and ``end_sym``), with *pref_align* = 2^\ *log2_align*:
+
+- If *body_size* < *pref_align*: align to the smallest power of 2
+  greater than or equal to *body_size*.
+- If *body_size* ≥ *pref_align*: align to *pref_align*.
+
+To also enforce a minimum alignment, emit a ``.p2align`` before ``.prefalign``.
 
 Machine-specific Assembly Syntax
 ================================

diff  --git a/llvm/include/llvm/MC/MCAssembler.h b/llvm/include/llvm/MC/MCAssembler.h
index 7f4a1433e1c2a..22f8ebde88756 100644
--- a/llvm/include/llvm/MC/MCAssembler.h
+++ b/llvm/include/llvm/MC/MCAssembler.h
@@ -114,6 +114,7 @@ class MCAssembler {
   /// Perform relaxation on a single fragment.
   void relaxFragment(MCFragment &F);
   void relaxAlign(MCFragment &F);
+  void relaxPrefAlign(MCFragment &F);
   void relaxInstruction(MCFragment &F);
   void relaxLEB(MCFragment &F);
   void relaxBoundaryAlign(MCBoundaryAlignFragment &BF);

diff  --git a/llvm/include/llvm/MC/MCObjectStreamer.h b/llvm/include/llvm/MC/MCObjectStreamer.h
index 5fc17b2b383b1..cb2694b231d5b 100644
--- a/llvm/include/llvm/MC/MCObjectStreamer.h
+++ b/llvm/include/llvm/MC/MCObjectStreamer.h
@@ -139,7 +139,8 @@ class LLVM_ABI MCObjectStreamer : public MCStreamer {
                             unsigned MaxBytesToEmit = 0) override;
   void emitCodeAlignment(Align ByteAlignment, const MCSubtargetInfo *STI,
                          unsigned MaxBytesToEmit = 0) override;
-  void emitPrefAlign(Align Alignment) override;
+  void emitPrefAlign(Align Alignment, const MCSymbol &End, bool EmitNops,
+                     uint8_t Fill, const MCSubtargetInfo &STI) override;
   void emitValueToOffset(const MCExpr *Offset, unsigned char Value,
                          SMLoc Loc) override;
   void emitDwarfLocDirective(unsigned FileNo, unsigned Line, unsigned Column,

diff  --git a/llvm/include/llvm/MC/MCSection.h b/llvm/include/llvm/MC/MCSection.h
index 4c36ed567de62..82bfa41c9215b 100644
--- a/llvm/include/llvm/MC/MCSection.h
+++ b/llvm/include/llvm/MC/MCSection.h
@@ -53,6 +53,7 @@ class MCFragment {
     FT_Data,
     FT_Relaxable,
     FT_Align,
+    FT_PrefAlign,
     FT_Fill,
     FT_LEB,
     FT_Nops,
@@ -132,6 +133,19 @@ class MCFragment {
       // Value to use for filling padding bytes.
       int64_t Fill;
     } align;
+    struct {
+      // Symbol denoting the end of the region; always non-null.
+      const MCSymbol *End;
+      // The preferred (maximum) alignment.
+      Align PreferredAlign;
+      // The alignment computed during relaxation.
+      Align ComputedAlign;
+      // If true, fill padding with target NOPs via writeNopData; the STI field
+      // holds the subtarget info needed.  If false, fill with Fill byte.
+      bool EmitNops;
+      // Fill byte used when !EmitNops.
+      uint8_t Fill;
+    } prefalign;
     struct {
       // True if this is a sleb128, false if uleb128.
       bool IsSigned;
@@ -268,6 +282,45 @@ class MCFragment {
     return u.align.EmitNops;
   }
 
+  //== FT_PrefAlign functions
+  // Initialize an FT_PrefAlign fragment. The region starts at this fragment and
+  // ends at \p End. ComputedAlign is set during relaxation:
+  //   body_size < PrefAlign  => ComputedAlign = std::bit_ceil(body_size)
+  //   body_size >= PrefAlign => ComputedAlign = PrefAlign
+  void makePrefAlign(Align PrefAlign, const MCSymbol &End, bool EmitNops,
+                     uint8_t Fill) {
+    Kind = FT_PrefAlign;
+    u.prefalign.End = &End;
+    u.prefalign.PreferredAlign = PrefAlign;
+    u.prefalign.ComputedAlign = Align();
+    u.prefalign.EmitNops = EmitNops;
+    u.prefalign.Fill = Fill;
+  }
+  const MCSymbol &getPrefAlignEnd() const {
+    assert(Kind == FT_PrefAlign);
+    return *u.prefalign.End;
+  }
+  Align getPrefAlignPreferred() const {
+    assert(Kind == FT_PrefAlign);
+    return u.prefalign.PreferredAlign;
+  }
+  Align getPrefAlignComputed() const {
+    assert(Kind == FT_PrefAlign);
+    return u.prefalign.ComputedAlign;
+  }
+  void setPrefAlignComputed(Align A) {
+    assert(Kind == FT_PrefAlign);
+    u.prefalign.ComputedAlign = A;
+  }
+  bool getPrefAlignEmitNops() const {
+    assert(Kind == FT_PrefAlign);
+    return u.prefalign.EmitNops;
+  }
+  uint8_t getPrefAlignFill() const {
+    assert(Kind == FT_PrefAlign);
+    return u.prefalign.Fill;
+  }
+
   //== FT_LEB functions
   void makeLEB(bool IsSigned, const MCExpr *Value) {
     assert(Kind == FT_Data);
@@ -538,14 +591,14 @@ class LLVM_ABI MCSection {
 private:
   // At parse time, this holds the fragment list of the current subsection. At
   // layout time, this holds the concatenated fragment lists of all subsections.
-  FragList *CurFragList;
+  // Null until the first fragment is added to this section.
+  FragList *CurFragList = nullptr;
   // In many object file formats, this denotes the section symbol. In Mach-O,
   // this denotes an optional temporary label at the section start.
   MCSymbol *Begin;
   MCSymbol *End = nullptr;
   /// The alignment requirement of this section.
   Align Alignment;
-  MaybeAlign PreferredAlignment;
   /// The section index in the assemblers section list.
   unsigned Ordinal = 0;
   // If not -1u, the first linker-relaxable fragment's order within the
@@ -606,19 +659,6 @@ class LLVM_ABI MCSection {
       Alignment = MinAlignment;
   }
 
-  Align getPreferredAlignment() const {
-    if (!PreferredAlignment || Alignment > *PreferredAlignment)
-      return Alignment;
-    return *PreferredAlignment;
-  }
-
-  void ensurePreferredAlignment(Align PrefAlign) {
-    if (!PreferredAlignment || PrefAlign > *PreferredAlignment)
-      PreferredAlignment = PrefAlign;
-  }
-
-  Align getAlignmentForObjectFile(uint64_t Size) const;
-
   unsigned getOrdinal() const { return Ordinal; }
   void setOrdinal(unsigned Value) { Ordinal = Value; }
 

diff  --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h
index 0a27eb7b104d4..535e9788d56dd 100644
--- a/llvm/include/llvm/MC/MCStreamer.h
+++ b/llvm/include/llvm/MC/MCStreamer.h
@@ -846,7 +846,8 @@ class LLVM_ABI MCStreamer {
   virtual void emitCodeAlignment(Align Alignment, const MCSubtargetInfo *STI,
                                  unsigned MaxBytesToEmit = 0);
 
-  virtual void emitPrefAlign(Align A);
+  virtual void emitPrefAlign(Align A, const MCSymbol &End, bool EmitNops,
+                             uint8_t Fill, const MCSubtargetInfo &STI);
 
   /// Emit some number of copies of \p Value until the byte offset \p
   /// Offset is reached.

diff  --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index cb4e4ea8a3177..af67d04abef2b 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -1047,19 +1047,20 @@ void AsmPrinter::emitFunctionHeader() {
 
   emitLinkage(&F, CurrentFnSym);
   if (MAI->hasFunctionAlignment()) {
-    // Make sure that the preferred alignment directive (.prefalign) is
-    // supported before using it. The preferred alignment directive will not
-    // have the intended effect unless function sections are enabled, so check
-    // for that as well.
+    Align PrefAlign = MF->getPreferredAlignment();
+    // Use .prefalign when the integrated assembler supports it and the target
+    // has a preferred alignment distinct from the minimum. The end symbol must
+    // be created here, before the function body, so that .prefalign can
+    // reference it; emitFunctionBody will emit the label at the function end.
     if (MAI->useIntegratedAssembler() && MAI->hasPreferredAlignment() &&
-        TM.getFunctionSections()) {
-      Align Alignment = MF->getAlignment();
-      Align PrefAlignment = MF->getPreferredAlignment();
-      emitAlignment(Alignment, &F);
-      if (Alignment != PrefAlignment)
-        OutStreamer->emitPrefAlign(PrefAlignment);
+        MF->getAlignment() != PrefAlign) {
+      emitAlignment(MF->getAlignment(), &F);
+      CurrentFnEnd = createTempSymbol("func_end");
+      OutStreamer->emitPrefAlign(PrefAlign, *CurrentFnEnd,
+                                 /*EmitNops=*/true, /*Fill=*/0,
+                                 getSubtargetInfo());
     } else {
-      emitAlignment(MF->getPreferredAlignment(), &F);
+      emitAlignment(PrefAlign, &F);
     }
   }
 
@@ -2466,9 +2467,11 @@ void AsmPrinter::emitFunctionBody() {
   // SPIR-V supports label instructions only inside a block, not after the
   // function body.
   if (TT.getObjectFormat() != Triple::SPIRV &&
-      (EmitFunctionSize || needFuncLabels(*MF, *this))) {
-    // Create a symbol for the end of function.
-    CurrentFnEnd = createTempSymbol("func_end");
+      (EmitFunctionSize || needFuncLabels(*MF, *this) || CurrentFnEnd)) {
+    // Create a symbol for the end of function, if not already pre-created
+    // (e.g. for .prefalign directive).
+    if (!CurrentFnEnd)
+      CurrentFnEnd = createTempSymbol("func_end");
     OutStreamer->emitLabel(CurrentFnEnd);
   }
 
@@ -3222,6 +3225,7 @@ void AsmPrinter::SetupMachineFunction(MachineFunction &MF) {
   CurrentFnSymForSize = CurrentFnSym;
   CurrentFnBegin = nullptr;
   CurrentFnBeginLocal = nullptr;
+  CurrentFnEnd = nullptr;
   CurrentSectionBeginSym = nullptr;
   CurrentFnCallsiteEndSymbols.clear();
   MBBSectionRanges.clear();

diff  --git a/llvm/lib/MC/ELFObjectWriter.cpp b/llvm/lib/MC/ELFObjectWriter.cpp
index b0c38797e4b34..c389ec9502510 100644
--- a/llvm/lib/MC/ELFObjectWriter.cpp
+++ b/llvm/lib/MC/ELFObjectWriter.cpp
@@ -911,10 +911,10 @@ void ELFWriter::writeSectionHeader(uint32_t GroupSymbolIndex, uint64_t Offset,
       sh_link = Sym->getSection().getOrdinal();
   }
 
-  writeSectionHeaderEntry(
-      StrTabBuilder.getOffset(Section.getName()), Section.getType(),
-      Section.getFlags(), 0, Offset, Size, sh_link, sh_info,
-      Section.getAlignmentForObjectFile(Size), Section.getEntrySize());
+  writeSectionHeaderEntry(StrTabBuilder.getOffset(Section.getName()),
+                          Section.getType(), Section.getFlags(), 0, Offset,
+                          Size, sh_link, sh_info, Section.getAlign(),
+                          Section.getEntrySize());
 }
 
 void ELFWriter::writeSectionHeaders() {

diff  --git a/llvm/lib/MC/MCAsmStreamer.cpp b/llvm/lib/MC/MCAsmStreamer.cpp
index 76510ffe4717b..5c72a883c062e 100644
--- a/llvm/lib/MC/MCAsmStreamer.cpp
+++ b/llvm/lib/MC/MCAsmStreamer.cpp
@@ -286,7 +286,8 @@ class MCAsmStreamer final : public MCAsmBaseStreamer {
 
   void emitCodeAlignment(Align Alignment, const MCSubtargetInfo *STI,
                          unsigned MaxBytesToEmit = 0) override;
-  void emitPrefAlign(Align Alignment) override;
+  void emitPrefAlign(Align Alignment, const MCSymbol &End, bool EmitNops,
+                     uint8_t Fill, const MCSubtargetInfo &STI) override;
 
   void emitValueToOffset(const MCExpr *Offset,
                          unsigned char Value,
@@ -1566,8 +1567,15 @@ void MCAsmStreamer::emitCodeAlignment(Align Alignment,
     emitAlignmentDirective(Alignment.value(), std::nullopt, 1, MaxBytesToEmit);
 }
 
-void MCAsmStreamer::emitPrefAlign(Align Alignment) {
-  OS << "\t.prefalign\t" << Alignment.value();
+void MCAsmStreamer::emitPrefAlign(Align Alignment, const MCSymbol &End,
+                                  bool EmitNops, uint8_t Fill,
+                                  const MCSubtargetInfo &) {
+  OS << "\t.prefalign\t" << Log2(Alignment) << ", ";
+  End.print(OS, MAI);
+  if (EmitNops)
+    OS << ", nop";
+  else
+    OS << ", " << static_cast<unsigned>(Fill);
   EmitEOL();
 }
 

diff  --git a/llvm/lib/MC/MCAssembler.cpp b/llvm/lib/MC/MCAssembler.cpp
index 671fb14908a71..62f248dd2e481 100644
--- a/llvm/lib/MC/MCAssembler.cpp
+++ b/llvm/lib/MC/MCAssembler.cpp
@@ -232,6 +232,9 @@ uint64_t MCAssembler::computeFragmentSize(const MCFragment &F) const {
     return Size;
   }
 
+  case MCFragment::FT_PrefAlign:
+    return F.getSize();
+
   case MCFragment::FT_Nops:
     return cast<MCNopsFragment>(F).getNumBytes();
 
@@ -464,6 +467,23 @@ static void writeFragment(raw_ostream &OS, const MCAssembler &Asm,
     }
   } break;
 
+  case MCFragment::FT_PrefAlign: {
+    OS << StringRef(F.getContents().data(), F.getContents().size());
+    uint64_t PadSize = FragmentSize - F.getContents().size();
+    if (F.getPrefAlignEmitNops()) {
+      if (!Asm.getBackend().writeNopData(OS, PadSize, F.getSubtargetInfo()))
+        reportFatalInternalError("unable to write nop sequence of " +
+                                 Twine(PadSize) + " bytes");
+    } else if (F.getPrefAlignFill() == 0) {
+      OS.write_zeros(PadSize);
+    } else {
+      char B = char(F.getPrefAlignFill());
+      for (uint64_t I = 0; I < PadSize; ++I)
+        OS << B;
+    }
+    break;
+  }
+
   case MCFragment::FT_Fill: {
     ++stats::EmittedFillFragments;
     const MCFillFragment &FF = cast<MCFillFragment>(F);
@@ -597,6 +617,10 @@ void MCAssembler::writeSectionData(raw_ostream &OS,
         // 0.
         assert(F.getAlignFill() == 0 && "Invalid align in virtual section!");
         break;
+      case MCFragment::FT_PrefAlign:
+        assert(!F.getPrefAlignEmitNops() && F.getPrefAlignFill() == 0 &&
+               "Invalid align in BSS");
+        break;
       case MCFragment::FT_Fill:
         HasNonZero = cast<MCFillFragment>(F).getValue() != 0;
         break;
@@ -774,6 +798,40 @@ void MCAssembler::relaxAlign(MCFragment &F) {
     F.getParent()->ContentStorage.resize(F.VarContentEnd);
 }
 
+// Compute the body size by walking forward from F to the End symbol and
+// summing fragment sizes. This avoids depending on stale layout offsets.
+void MCAssembler::relaxPrefAlign(MCFragment &F) {
+  uint64_t RawStart = F.Offset + F.getFixedSize();
+  const MCSymbol &End = F.getPrefAlignEnd();
+  if (!End.getFragment() || End.getFragment()->getParent() != F.getParent()) {
+    recordError(SMLoc(), ".prefalign end symbol '" + End.getName() +
+                             "' must be in the current section");
+    return;
+  }
+  const MCFragment *EndFrag = End.getFragment();
+  if (EndFrag->getLayoutOrder() <= F.getLayoutOrder())
+    return;
+  uint64_t BodySize = 0;
+  for (const MCFragment *Cur = F.getNext();; Cur = Cur->getNext()) {
+    if (Cur == EndFrag) {
+      BodySize += End.getOffset();
+      break;
+    }
+    BodySize += computeFragmentSize(*Cur);
+  }
+  Align NewAlign =
+      std::min(Align(llvm::bit_ceil(BodySize)), F.getPrefAlignPreferred());
+  F.setPrefAlignComputed(NewAlign);
+  uint64_t NewPadSize = offsetToAlignment(RawStart, NewAlign);
+  F.VarContentStart = F.getFixedSize();
+  F.VarContentEnd = F.VarContentStart + NewPadSize;
+  if (F.VarContentEnd > F.getParent()->ContentStorage.size())
+    F.getParent()->ContentStorage.resize(F.VarContentEnd);
+  // Update the maximum alignment on the current section if necessary, similar
+  // to MCObjectStreamer::emitValueToAlignment.
+  F.getParent()->ensureMinAlignment(NewAlign);
+}
+
 bool MCAssembler::fixupNeedsRelaxation(const MCFragment &F,
                                        const MCFixup &Fixup) const {
   ++stats::FixupEvalForRelax;
@@ -995,6 +1053,9 @@ void MCAssembler::relaxFragment(MCFragment &F) {
   case MCFragment::FT_BoundaryAlign:
     relaxBoundaryAlign(static_cast<MCBoundaryAlignFragment &>(F));
     break;
+  case MCFragment::FT_PrefAlign:
+    relaxPrefAlign(F);
+    break;
   case MCFragment::FT_CVInlineLines:
     getContext().getCVContext().encodeInlineLineTable(
         *this, static_cast<MCCVInlineLineTableFragment &>(F));

diff  --git a/llvm/lib/MC/MCFragment.cpp b/llvm/lib/MC/MCFragment.cpp
index 85d1c5888f1da..21a304da0bb4f 100644
--- a/llvm/lib/MC/MCFragment.cpp
+++ b/llvm/lib/MC/MCFragment.cpp
@@ -55,7 +55,8 @@ LLVM_DUMP_METHOD void MCFragment::dump() const {
   case MCFragment::FT_DwarfFrame:    OS << "DwarfCallFrame"; break;
   case MCFragment::FT_SFrame:        OS << "SFrame"; break;
   case MCFragment::FT_LEB:           OS << "LEB"; break;
-  case MCFragment::FT_BoundaryAlign: OS<<"BoundaryAlign"; break;
+  case MCFragment::FT_BoundaryAlign: OS << "BoundaryAlign"; break;
+  case MCFragment::FT_PrefAlign:     OS << "PrefAlign"; break;
   case MCFragment::FT_SymbolId:      OS << "SymbolId"; break;
   case MCFragment::FT_CVInlineLines: OS << "CVInlineLineTable"; break;
   case MCFragment::FT_CVDefRange:    OS << "CVDefRangeTable"; break;
@@ -170,6 +171,11 @@ LLVM_DUMP_METHOD void MCFragment::dump() const {
        << " Size:" << BF->getSize();
     break;
   }
+  case MCFragment::FT_PrefAlign:
+    OS << " PrefAlign:" << getPrefAlignPreferred().value()
+       << " End:" << getPrefAlignEnd().getName()
+       << " ComputedAlign:" << getPrefAlignComputed().value();
+    break;
   case MCFragment::FT_SymbolId: {
     const auto *F = cast<MCSymbolIdFragment>(this);
     OS << " Sym:" << F->getSymbol();

diff  --git a/llvm/lib/MC/MCObjectStreamer.cpp b/llvm/lib/MC/MCObjectStreamer.cpp
index 86290521d8266..6cbf208fad20d 100644
--- a/llvm/lib/MC/MCObjectStreamer.cpp
+++ b/llvm/lib/MC/MCObjectStreamer.cpp
@@ -688,8 +688,14 @@ void MCObjectStreamer::emitCodeAlignment(Align Alignment,
   F->STI = STI;
 }
 
-void MCObjectStreamer::emitPrefAlign(Align Alignment) {
-  getCurrentSectionOnly()->ensurePreferredAlignment(Alignment);
+void MCObjectStreamer::emitPrefAlign(Align Alignment, const MCSymbol &End,
+                                     bool EmitNops, uint8_t Fill,
+                                     const MCSubtargetInfo &STI) {
+  auto *F = getCurrentFragment();
+  F->makePrefAlign(Alignment, End, EmitNops, Fill);
+  if (EmitNops)
+    F->STI = &STI;
+  newFragment();
 }
 
 void MCObjectStreamer::emitValueToOffset(const MCExpr *Offset,

diff  --git a/llvm/lib/MC/MCParser/AsmParser.cpp b/llvm/lib/MC/MCParser/AsmParser.cpp
index e2b70c3e7dd35..4e95bf47bb7ee 100644
--- a/llvm/lib/MC/MCParser/AsmParser.cpp
+++ b/llvm/lib/MC/MCParser/AsmParser.cpp
@@ -3470,16 +3470,54 @@ bool AsmParser::parseDirectiveAlign(bool IsPow2, uint8_t ValueSize) {
 
 bool AsmParser::parseDirectivePrefAlign() {
   SMLoc AlignmentLoc = getLexer().getLoc();
-  int64_t Alignment;
-  if (checkForValidSection() || parseAbsoluteExpression(Alignment))
+  int64_t Log2Alignment;
+  if (checkForValidSection() || parseAbsoluteExpression(Log2Alignment))
     return true;
-  if (parseEOL())
+
+  if (Log2Alignment < 0 || Log2Alignment > 63)
+    return Error(AlignmentLoc, "log2 alignment must be in the range [0, 63]");
+
+  // Parse end symbol: .prefalign N, sym
+  SMLoc SymLoc = getLexer().getLoc();
+  if (parseComma())
+    return true;
+  StringRef Name;
+  SymLoc = getLexer().getLoc();
+  if (parseIdentifier(Name))
+    return Error(SymLoc, "expected symbol name");
+  MCSymbol *End = getContext().getOrCreateSymbol(Name);
+
+  // Parse fill operand: integer byte [0, 255] or "nop".
+  SMLoc FillLoc = getLexer().getLoc();
+  if (parseComma())
     return true;
 
-  if (!isPowerOf2_64(Alignment))
-    return Error(AlignmentLoc, "alignment must be a power of 2");
-  getStreamer().emitPrefAlign(Align(Alignment));
+  bool EmitNops = false;
+  uint8_t Fill = 0;
+  SMLoc FillLoc2 = getLexer().getLoc();
+  if (getLexer().is(AsmToken::Identifier) &&
+      getLexer().getTok().getIdentifier() == "nop") {
+    EmitNops = true;
+    Lex();
+  } else {
+    int64_t FillVal;
+    if (parseAbsoluteExpression(FillVal))
+      return true;
+    if (FillVal < 0 || FillVal > 255)
+      return Error(FillLoc2, "fill value must be in range [0, 255]");
+    Fill = static_cast<uint8_t>(FillVal);
+  }
 
+  if (parseEOL())
+    return true;
+  if ((EmitNops || Fill != 0) &&
+      getStreamer().getCurrentSectionOnly()->isBssSection())
+    return Error(FillLoc, "non-zero fill in BSS section '" +
+                              getStreamer().getCurrentSectionOnly()->getName() +
+                              "'");
+
+  getStreamer().emitPrefAlign(Align(1ULL << Log2Alignment), *End, EmitNops,
+                              Fill, getTargetParser().getSTI());
   return false;
 }
 

diff  --git a/llvm/lib/MC/MCSection.cpp b/llvm/lib/MC/MCSection.cpp
index 8285379eeaf81..a668e7919b7b9 100644
--- a/llvm/lib/MC/MCSection.cpp
+++ b/llvm/lib/MC/MCSection.cpp
@@ -30,16 +30,6 @@ MCSymbol *MCSection::getEndSymbol(MCContext &Ctx) {
   return End;
 }
 
-Align MCSection::getAlignmentForObjectFile(uint64_t Size) const {
-  if (Size < getAlign().value())
-    return getAlign();
-
-  if (Size < getPreferredAlignment().value())
-    return Align(NextPowerOf2(Size - 1));
-
-  return getPreferredAlignment();
-}
-
 bool MCSection::hasEnded() const { return End && End->isInSection(); }
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)

diff  --git a/llvm/lib/MC/MCStreamer.cpp b/llvm/lib/MC/MCStreamer.cpp
index 4131d027f63f9..b52e3e5b90bf2 100644
--- a/llvm/lib/MC/MCStreamer.cpp
+++ b/llvm/lib/MC/MCStreamer.cpp
@@ -1363,7 +1363,8 @@ void MCStreamer::emitFill(const MCExpr &NumBytes, uint64_t Value, SMLoc Loc) {}
 void MCStreamer::emitFill(const MCExpr &NumValues, int64_t Size, int64_t Expr,
                           SMLoc Loc) {}
 void MCStreamer::emitValueToAlignment(Align, int64_t, uint8_t, unsigned) {}
-void MCStreamer::emitPrefAlign(Align A) {}
+void MCStreamer::emitPrefAlign(Align A, const MCSymbol &End, bool EmitNops,
+                               uint8_t Fill, const MCSubtargetInfo &STI) {}
 void MCStreamer::emitCodeAlignment(Align Alignment, const MCSubtargetInfo *STI,
                                    unsigned MaxBytesToEmit) {}
 void MCStreamer::emitValueToOffset(const MCExpr *Offset, unsigned char Value,

diff  --git a/llvm/test/CodeGen/AArch64/preferred-function-alignment.ll b/llvm/test/CodeGen/AArch64/preferred-function-alignment.ll
index a6cb7123e5af4..fa14d94e856dc 100644
--- a/llvm/test/CodeGen/AArch64/preferred-function-alignment.ll
+++ b/llvm/test/CodeGen/AArch64/preferred-function-alignment.ll
@@ -29,10 +29,11 @@ define void @test() {
 }
 
 ; CHECK-LABEL: test
-; ALIGN2: .p2align 2
-; ALIGN3: .p2align 3
-; ALIGN4: .p2align 4
-; ALIGN5: .p2align 5
+; CHECK: .p2align 2
+; ALIGN2-NOT: .prefalign
+; ALIGN3-NEXT: .prefalign 3
+; ALIGN4-NEXT: .prefalign 4
+; ALIGN5-NEXT: .prefalign 5
 
 define void @test_optsize() optsize {
   ret void

diff  --git a/llvm/test/CodeGen/ARM/preferred-function-alignment.ll b/llvm/test/CodeGen/ARM/preferred-function-alignment.ll
index 2fc67905f6db7..4fd9ed345d9ec 100644
--- a/llvm/test/CodeGen/ARM/preferred-function-alignment.ll
+++ b/llvm/test/CodeGen/ARM/preferred-function-alignment.ll
@@ -1,15 +1,18 @@
 ; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m85 < %s | FileCheck --check-prefixes=CHECK,ALIGN-64,ALIGN-CS-16 %s
 ; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m23 < %s | FileCheck --check-prefixes=CHECK,ALIGN-16,ALIGN-CS-16 %s
 
-; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-a5 < %s  | FileCheck --check-prefixes=CHECK,ALIGN-32,ALIGN-CS-32 %s
-; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m33 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32,ALIGN-CS-16 %s
-; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m55 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32,ALIGN-CS-16 %s
+; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-a5 < %s  | FileCheck --check-prefixes=CHECK,ALIGN-32A,ALIGN-CS-32 %s
+; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m33 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32T,ALIGN-CS-16 %s
+; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m55 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32T,ALIGN-CS-16 %s
 ; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m7 < %s | FileCheck --check-prefixes=CHECK,ALIGN-64,ALIGN-CS-16 %s
 
 ; CHECK-LABEL: test
 ; ALIGN-16: .p2align 1
-; ALIGN-32: .p2align 2
-; ALIGN-64: .p2align 3
+; ALIGN-32A: .p2align 2
+; ALIGN-32T: .p2align 1
+; ALIGN-32T-NEXT: .prefalign 2
+; ALIGN-64: .p2align 1
+; ALIGN-64-NEXT: .prefalign 3
 
 define void @test() {
   ret void

diff  --git a/llvm/test/CodeGen/LoongArch/linker-relaxation.ll b/llvm/test/CodeGen/LoongArch/linker-relaxation.ll
index 6b197bc578919..873a1f9168323 100644
--- a/llvm/test/CodeGen/LoongArch/linker-relaxation.ll
+++ b/llvm/test/CodeGen/LoongArch/linker-relaxation.ll
@@ -77,7 +77,6 @@ declare dso_local void @callee3() nounwind
 ; RELAX-NEXT:       R_LARCH_RELAX - 0x0
 ; CHECK-RELOC-NEXT: R_LARCH_PCALA_LO12 g_i1 0x0
 ; RELAX-NEXT:       R_LARCH_RELAX - 0x0
-; RELAX-NEXT:       R_LARCH_ALIGN - 0x1C
 ; CHECK-RELOC-NEXT: R_LARCH_CALL36 callee1 0x0
 ; RELAX-NEXT:       R_LARCH_RELAX - 0x0
 ; CHECK-RELOC-NEXT: R_LARCH_CALL36 callee2 0x0

diff  --git a/llvm/test/CodeGen/PowerPC/code-align.ll b/llvm/test/CodeGen/PowerPC/code-align.ll
index 805873816c4d9..841636d65d87e 100644
--- a/llvm/test/CodeGen/PowerPC/code-align.ll
+++ b/llvm/test/CodeGen/PowerPC/code-align.ll
@@ -20,9 +20,7 @@ entry:
   ret i32 %mul
 
 ; CHECK-LABEL: .globl  foo
-; GENERIC: .p2align  2
-; BASIC: .p2align  4
-; PWR: .p2align  4
+; CHECK: .p2align  2
 ; CHECK: @foo
 }
 

diff  --git a/llvm/test/CodeGen/PowerPC/ppc64-calls.ll b/llvm/test/CodeGen/PowerPC/ppc64-calls.ll
index 2c2743f5400d9..67ff626b4f680 100644
--- a/llvm/test/CodeGen/PowerPC/ppc64-calls.ll
+++ b/llvm/test/CodeGen/PowerPC/ppc64-calls.ll
@@ -19,7 +19,7 @@ define dso_local void @test_direct() nounwind readnone {
   tail call void @foo() nounwind
 ; Because of tail call optimization, it can be 'b' instruction.
 ; CHECK: [[BR:b[l]?]] foo
-; CHECK-NOT: nop
+; CHECK-NOT: {{^[[:space:]]+}}nop
   ret void
 }
 

diff  --git a/llvm/test/CodeGen/SystemZ/vec-perm-14.ll b/llvm/test/CodeGen/SystemZ/vec-perm-14.ll
index 0b392676fa3ec..3c946f72935fc 100644
--- a/llvm/test/CodeGen/SystemZ/vec-perm-14.ll
+++ b/llvm/test/CodeGen/SystemZ/vec-perm-14.ll
@@ -61,7 +61,8 @@ define <4 x i8> @fun1(<2 x i8> %arg) {
 ; CHECK-NEXT:        .space  1
 ; CHECK-NEXT:        .text
 ; CHECK-NEXT:        .globl  fun1
-; CHECK-NEXT:        .p2align        4
+; CHECK-NEXT:        .p2align        1
+; CHECK-NEXT:        .prefalign      4, .Lfunc_end1, nop
 ; CHECK-NEXT:        .type   fun1,@function
 ; CHECK-NEXT: fun1:                                  # @fun1
 ; CHECK-NEXT:        .cfi_startproc
@@ -96,7 +97,8 @@ define <4 x i8> @fun2(<2 x i8> %arg) {
 ; CHECK-NEXT:        .space  1
 ; CHECK-NEXT:        .text
 ; CHECK-NEXT:        .globl  fun2
-; CHECK-NEXT:        .p2align        4
+; CHECK-NEXT:        .p2align        1
+; CHECK-NEXT:        .prefalign      4, .Lfunc_end2, nop
 ; CHECK-NEXT:        .type   fun2,@function
 ; CHECK-NEXT:fun2:                                   # @fun2
 ; CHECK-NEXT:        .cfi_startproc

diff  --git a/llvm/test/CodeGen/X86/eh-label.ll b/llvm/test/CodeGen/X86/eh-label.ll
index 78611000e18dd..b3954700463eb 100644
--- a/llvm/test/CodeGen/X86/eh-label.ll
+++ b/llvm/test/CodeGen/X86/eh-label.ll
@@ -7,7 +7,7 @@ define void @f() personality ptr @g {
 bb0:
   call void asm ".Lfunc_end0:", ""()
 ; CHECK: #APP
-; CHECK-NEXT: .Lfunc_end0:
+; CHECK-NEXT: .Lfunc_end0{{.*}}:
 ; CHECK-NEXT: #NO_APP
 
   invoke void @g() to label %bb2 unwind label %bb1

diff  --git a/llvm/test/CodeGen/X86/empty-function.ll b/llvm/test/CodeGen/X86/empty-function.ll
index 7d908311ec8dc..bf05c8e359130 100644
--- a/llvm/test/CodeGen/X86/empty-function.ll
+++ b/llvm/test/CodeGen/X86/empty-function.ll
@@ -16,7 +16,7 @@ entry:
 ; CHECK-LABEL: f:
 ; WIN32: nop
 ; WIN64: nop
-; LINUX-NOT: nop
+; LINUX-NOT: {{^[[:space:]]+}}nop
 ; LINUX-NOT: ud2
 
 }

diff  --git a/llvm/test/CodeGen/X86/kcfi-arity.ll b/llvm/test/CodeGen/X86/kcfi-arity.ll
index 5a19bcd7835ea..ef859ef65fb63 100644
--- a/llvm/test/CodeGen/X86/kcfi-arity.ll
+++ b/llvm/test/CodeGen/X86/kcfi-arity.ll
@@ -3,7 +3,8 @@
 ; RUN: llc -mtriple=x86_64-unknown-linux-gnu -verify-machineinstrs -stop-after=finalize-isel < %s | FileCheck %s --check-prefixes=MIR,ISEL
 ; RUN: llc -mtriple=x86_64-unknown-linux-gnu -verify-machineinstrs -stop-after=kcfi < %s | FileCheck %s --check-prefixes=MIR,KCFI
 
-; ASM:       .p2align 4
+; ASM:       .p2align 2
+; ASM:       .prefalign 4
 ; ASM:       .type __cfi_f1,@function
 ; ASM-LABEL: __cfi_f1:
 ; ASM-NEXT:    nop

diff  --git a/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll b/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll
index 1b7bd7835e890..63018b75c908f 100644
--- a/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll
+++ b/llvm/test/CodeGen/X86/kcfi-patchable-function-prefix.ll
@@ -1,6 +1,6 @@
 ; RUN: llc -mtriple=x86_64-unknown-linux-gnu -verify-machineinstrs < %s | FileCheck %s
 
-; CHECK:          .p2align 4
+; CHECK:          .prefalign 4, .Lfunc_end0, nop
 ; CHECK-LABEL:    __cfi_f1:
 ; CHECK-COUNT-11:   nop
 ; CHECK-NEXT:       movl $12345678, %eax
@@ -12,10 +12,11 @@ define void @f1(ptr noundef %x) !kcfi_type !1 {
   call void %x() [ "kcfi"(i32 12345678) ]
   ret void
 }
+; CHECK:          .Lfunc_end0:
 
-; CHECK:          .p2align 4
+; CHECK:          .prefalign 4
 ; CHECK-NOT:      __cfi_f2:
-; CHECK-NOT:        nop
+; CHECK-NOT:        {{^[[:space:]]+}}nop
 ; CHECK-LABEL:    f2:
 define void @f2(ptr noundef %x) {
 ; CHECK:            addl -4(%r{{..}}), %r10d
@@ -23,9 +24,9 @@ define void @f2(ptr noundef %x) {
   ret void
 }
 
-; CHECK:          .p2align 4
+; CHECK:          .prefalign 4
 ; CHECK-LABEL:    __cfi_f3:
-; CHECK-NOT:        nop
+; CHECK-NOT:        {{^[[:space:]]+}}nop
 ; CHECK-NEXT:       movl $12345678, %eax
 ; CHECK-COUNT-11:   nop
 ; CHECK-LABEL:    f3:
@@ -35,9 +36,9 @@ define void @f3(ptr noundef %x) #0 !kcfi_type !1 {
   ret void
 }
 
-; CHECK:          .p2align 4
+; CHECK:          .prefalign 4
 ; CHECK-NOT:      __cfi_f4:
-; CHECK-COUNT-16:   nop
+; CHECK-COUNT-16:   {{^[[:space:]]+}}nop
 ; CHECK-LABEL:    f4:
 define void @f4(ptr noundef %x) #0 {
 ; CHECK:            addl -15(%r{{..}}), %r10d

diff  --git a/llvm/test/CodeGen/X86/kcfi.ll b/llvm/test/CodeGen/X86/kcfi.ll
index fd93b8e3d4188..4a01f65e92721 100644
--- a/llvm/test/CodeGen/X86/kcfi.ll
+++ b/llvm/test/CodeGen/X86/kcfi.ll
@@ -2,7 +2,8 @@
 ; RUN: llc -mtriple=x86_64-unknown-linux-gnu -verify-machineinstrs -stop-after=finalize-isel < %s | FileCheck %s --check-prefixes=MIR,ISEL
 ; RUN: llc -mtriple=x86_64-unknown-linux-gnu -verify-machineinstrs -stop-after=kcfi < %s | FileCheck %s --check-prefixes=MIR,KCFI
 
-; ASM:       .p2align 4
+; ASM:       .p2align 2
+; ASM:       .prefalign 4
 ; ASM:       .type __cfi_f1,@function
 ; ASM-LABEL: __cfi_f1:
 ; ASM-NEXT:    nop

diff  --git a/llvm/test/CodeGen/X86/prefalign.ll b/llvm/test/CodeGen/X86/prefalign.ll
index 062cf740eabeb..c5d5a9223c2ba 100644
--- a/llvm/test/CodeGen/X86/prefalign.ll
+++ b/llvm/test/CodeGen/X86/prefalign.ll
@@ -1,12 +1,11 @@
-; RUN: llc < %s | FileCheck --check-prefixes=CHECK,NOFS %s
-; RUN: llc -function-sections < %s | FileCheck --check-prefixes=CHECK,FS %s
+; RUN: llc < %s | FileCheck %s
+; RUN: llc -function-sections < %s | FileCheck %s
 
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"
 
 ; CHECK: .globl f1
-; NOFS-NEXT: .p2align 4
-; FS-NEXT: .prefalign 16
+; CHECK-NEXT: .prefalign 4
 define void @f1() {
   ret void
 }
@@ -19,9 +18,8 @@ define void @f2() prefalign(1) {
 }
 
 ; CHECK: .globl f3
-; NOFS-NEXT: .p2align 2
-; FS-NEXT: .p2align 1
-; FS-NEXT: .prefalign 4
+; CHECK-NEXT: .p2align 1
+; CHECK-NEXT: .prefalign 2
 define void @f3() align 2 prefalign(4) {
   ret void
 }

diff  --git a/llvm/test/CodeGen/X86/statepoint-invoke.ll b/llvm/test/CodeGen/X86/statepoint-invoke.ll
index 34dbc21a8a8cb..add4041a28c73 100644
--- a/llvm/test/CodeGen/X86/statepoint-invoke.ll
+++ b/llvm/test/CodeGen/X86/statepoint-invoke.ll
@@ -56,7 +56,7 @@ exceptional_return:
 ; CHECK: .uleb128  .Ltmp{{[0-9]+}}-.Ltmp{{[0-9]+}}
 ; CHECK: .uleb128  .Ltmp{{[0-9]+}}-.Lfunc_begin{{[0-9]+}}
 ; CHECK: .byte  0
-; CHECK: .p2align 4
+; CHECK: .prefalign 4
 
 define ptr addrspace(1) @test_result(ptr addrspace(1) %obj,
 ; CHECK-LABEL: test_result:
@@ -99,7 +99,7 @@ exceptional_return:
 ; CHECK: .uleb128 .Ltmp{{[0-9]+}}-.Ltmp{{[0-9]+}}
 ; CHECK: .uleb128 .Ltmp{{[0-9]+}}-.Lfunc_begin{{[0-9]+}}
 ; CHECK: .byte 0
-; CHECK: .p2align 4
+; CHECK: .prefalign 4
 
 define ptr addrspace(1) @test_same_val(i1 %cond, ptr addrspace(1) %val1, ptr addrspace(1) %val2, ptr addrspace(1) %val3)
 ; CHECK-LABEL: test_same_val:

diff  --git a/llvm/test/DebugInfo/KeyInstructions/X86/dwarf-basic.ll b/llvm/test/DebugInfo/KeyInstructions/X86/dwarf-basic.ll
index b97b436ffc573..f2acb802b5295 100644
--- a/llvm/test/DebugInfo/KeyInstructions/X86/dwarf-basic.ll
+++ b/llvm/test/DebugInfo/KeyInstructions/X86/dwarf-basic.ll
@@ -21,16 +21,16 @@
 ; OBJ: 0000000000000000 <_Z1fi>:
 ; OBJ-NEXT: 0: leal    0x1(%rdi), %eax
 ; OBJ-NEXT: 3: retq
-; OBJ: 0000000000000010 <_Z1gi>:
-; OBJ-NEXT: 10: leal    0x1(%rdi), %eax
-; OBJ-NEXT: 13: retq
+; OBJ: 0000000000000004 <_Z1gi>:
+; OBJ-NEXT: 4: leal    0x1(%rdi), %eax
+; OBJ-NEXT: 7: retq
 
 ; DBG:      Address            Line   Column File   ISA Discriminator OpIndex Flags
 ; DBG-NEXT: ------------------ ------ ------ ------ --- ------------- ------- -------------
 ; DBG-NEXT: 0x0000000000000000      2      0      0   0             0       0  is_stmt prologue_end
 ; DBG-NEXT: 0x0000000000000003      3      0      0   0             0       0  is_stmt
-; DBG-NEXT: 0x0000000000000010      2      0      0   0             0       0  is_stmt prologue_end
-; DBG-NEXT: 0x0000000000000013      6      0      0   0             0       0  is_stmt
+; DBG-NEXT: 0x0000000000000004      2      0      0   0             0       0  is_stmt prologue_end
+; DBG-NEXT: 0x0000000000000007      6      0      0   0             0       0  is_stmt
 
 target triple = "x86_64-unknown-linux-gnu"
 

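The address shift in dwarf-basic.ll (second function moving from 0x10 to 0x4) is what the size-based rule in the commit message predicts for a 4-byte body. Below is a rough Python sketch of that comparison, not the MC implementation. The helper names are hypothetical, and it assumes the x86 output uses `.prefalign 4` with no `.p2align` minimum, as the header.ll change suggests.

```python
def bit_ceil(n: int) -> int:
    """Smallest power of two >= n (1 for n <= 1), like C++ std::bit_ceil."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def align_up(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align (a power of two)."""
    return offset + (-offset % align)


def start_with_p2align(prev_end: int, log2_align: int) -> int:
    """Old output: an unconditional .p2align before the function."""
    return align_up(prev_end, 1 << log2_align)


def start_with_prefalign(prev_end: int, body_size: int, log2_pref: int) -> int:
    """New output: alignment depends on the function body size."""
    pref = 1 << log2_pref
    align = bit_ceil(body_size) if body_size < pref else pref
    return align_up(prev_end, align)


# _Z1fi occupies [0, 4); _Z1gi is 4 bytes (leal at +0, retq at +3).
old_start = start_with_p2align(4, 4)       # 0x10, as in the removed OBJ lines
new_start = start_with_prefalign(4, 4, 4)  # 0x4, as in the added OBJ lines
print(hex(old_start), hex(new_start))
```

The same arithmetic accounts for the 6-byte functions in source-interleave-function-from-debug.test landing at 0x8 and 0x10.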
diff  --git a/llvm/test/DebugInfo/LoongArch/relax_dwo_ranges.ll b/llvm/test/DebugInfo/LoongArch/relax_dwo_ranges.ll
index 073ab562df57b..458d7e79c67cb 100644
--- a/llvm/test/DebugInfo/LoongArch/relax_dwo_ranges.ll
+++ b/llvm/test/DebugInfo/LoongArch/relax_dwo_ranges.ll
@@ -46,9 +46,9 @@
 ; DWARF5: Addrs: [
 ; DWARF5-NEXT: 0x0000000000000000
 ; DWARF5-NEXT: 0x0000000000000040
-; DWARF5-NEXT: 0x000000000000005c
-; DWARF5-NEXT: 0x000000000000009c
-; DWARF5-NEXT: 0x00000000000000e0
+; DWARF5-NEXT: 0x0000000000000040
+; DWARF5-NEXT: 0x0000000000000080
+; DWARF5-NEXT: 0x00000000000000c4
 ; DWARF5-NEXT: ]
 
 ; HDR-NOT: .rela.{{.*}}.dwo
@@ -57,9 +57,9 @@
 ; entries respectively
 ; DWARF5: .debug_rnglists.dwo contents:
 ; DWARF5: ranges:
-; DWARF5-NEXT: 0x00000014: [DW_RLE_startx_length]:  0x0000000000000002, 0x0000000000000024 => [0x000000000000005c, 0x0000000000000080)
+; DWARF5-NEXT: 0x00000014: [DW_RLE_startx_length]:  0x0000000000000002, 0x0000000000000024 => [0x0000000000000040, 0x0000000000000064)
 ; DWARF5-NEXT: 0x00000017: [DW_RLE_end_of_list  ]
-; DWARF5-NEXT: 0x00000018: [DW_RLE_startx_endx  ]:  0x0000000000000003, 0x0000000000000004 => [0x000000000000009c, 0x00000000000000e0)
+; DWARF5-NEXT: 0x00000018: [DW_RLE_startx_endx  ]:  0x0000000000000003, 0x0000000000000004 => [0x0000000000000080, 0x00000000000000c4)
 ; DWARF5-NEXT: 0x0000001b: [DW_RLE_end_of_list  ]
 ; DWARF5-EMPTY:
 
@@ -72,13 +72,13 @@
 ; DWARF4: DW_AT_name {{.*}} "square") 
 
 ; DWARF4: DW_TAG_subprogram
-; DWARF4-NEXT: DW_AT_low_pc [DW_FORM_GNU_addr_index]	(indexed (00000002) address = 0x000000000000005c ".text")
+; DWARF4-NEXT: DW_AT_low_pc [DW_FORM_GNU_addr_index]	(indexed (00000002) address = 0x0000000000000040 ".text")
 ; DWARF4-NEXT: DW_AT_high_pc [DW_FORM_data4]	(0x00000024)
 ; DWARF4: DW_AT_name {{.*}} "boo") 
 
 ; DWARF4: DW_TAG_subprogram
-; DWARF4-NEXT: DW_AT_low_pc  [DW_FORM_GNU_addr_index] (indexed (00000003) address = 0x000000000000009c ".text")
-; DWARF4-NEXT: DW_AT_high_pc [DW_FORM_GNU_addr_index] (indexed (00000004) address = 0x00000000000000e0 ".text")
+; DWARF4-NEXT: DW_AT_low_pc  [DW_FORM_GNU_addr_index] (indexed (00000003) address = 0x0000000000000080 ".text")
+; DWARF4-NEXT: DW_AT_high_pc [DW_FORM_GNU_addr_index] (indexed (00000004) address = 0x00000000000000c4 ".text")
 ; DWARF4: DW_AT_name {{.*}} "main") 
 
 ; HDR-NOT: .rela.{{.*}}.dwo
@@ -88,9 +88,9 @@
 ; DWARF4: Addrs: [
 ; DWARF4-NEXT: 0x0000000000000000
 ; DWARF4-NEXT: 0x0000000000000040
-; DWARF4-NEXT: 0x000000000000005c
-; DWARF4-NEXT: 0x000000000000009c
-; DWARF4-NEXT: 0x00000000000000e0
+; DWARF4-NEXT: 0x0000000000000040
+; DWARF4-NEXT: 0x0000000000000080
+; DWARF4-NEXT: 0x00000000000000c4
 ; DWARF4-NEXT: ]
 
 ; HDR-NOT: .rela.{{.*}}.dwo

diff  --git a/llvm/test/DebugInfo/X86/header.ll b/llvm/test/DebugInfo/X86/header.ll
index 0c730252701c5..70efb946a2f0c 100644
--- a/llvm/test/DebugInfo/X86/header.ll
+++ b/llvm/test/DebugInfo/X86/header.ll
@@ -5,7 +5,7 @@
 ; CHECK:      .file	"<stdin>"
 ; CHECK-NEXT:  .text
 ; CHECK-NEXT: .globl	f
-; CHECK-NEXT: .p2align	4
+; CHECK-NEXT: .prefalign	4, .Lfunc_end0, nop
 ; CHECK-NEXT: .type	f,@function
 ; CHECK-NEXT: f:                                      # @f
 

diff  --git a/llvm/test/DebugInfo/X86/ranges_always.ll b/llvm/test/DebugInfo/X86/ranges_always.ll
index 76f846e51d2fb..7c07b464f3af4 100644
--- a/llvm/test/DebugInfo/X86/ranges_always.ll
+++ b/llvm/test/DebugInfo/X86/ranges_always.ll
@@ -102,9 +102,9 @@
 ; CHECK:     NULL
 ; CHECK:   DW_TAG_subprogram
 ; EXPR:      DW_AT_low_pc
-; EXPR-SAME:   [DW_FORM_exprloc] (DW_OP_addrx 0x0, DW_OP_const4u 0x30, DW_OP_plus)
+; EXPR-SAME:   [DW_FORM_exprloc] (DW_OP_addrx 0x0, DW_OP_const4u 0x1e, DW_OP_plus)
 ; FORM:      DW_AT_low_pc
-; FORM-SAME:   [DW_FORM_LLVM_addrx_offset] (indexed (00000000) + 0x30 address = 0x0000000000000030 ".text")
+; FORM-SAME:   [DW_FORM_LLVM_addrx_offset] (indexed (00000000) + 0x1e address = 0x000000000000001e ".text")
 ; EXPRORFORM: DW_AT_high_pc
 ; EXPRORFORM-SAME: (0x00000001)
 ; RNG:       DW_AT_ranges

diff  --git a/llvm/test/MC/ELF/prefalign-convergence.s b/llvm/test/MC/ELF/prefalign-convergence.s
new file mode 100644
index 0000000000000..263764d4b8f29
--- /dev/null
+++ b/llvm/test/MC/ELF/prefalign-convergence.s
@@ -0,0 +1,47 @@
+# REQUIRES: asserts
+## Test that sections with many .prefalign fragments converge quickly.
+## PrefAlign fragments see fresh offsets and converge in 1 iteration.
+
+# RUN: llvm-mc -filetype=obj -triple x86_64 --stats %s -o %t 2>&1 | FileCheck %s --check-prefix=STATS
+# RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck %s
+
+# STATS: 2 assembler - Number of assembler layout and relaxation steps
+
+# CHECK:       8: int3
+# CHECK-NEXT:  9: int3
+# CHECK-NEXT:  a: int3
+# CHECK-NEXT:  b: int3
+# CHECK-NEXT:  c: int3
+# CHECK-NEXT:  d: nopl
+# CHECK-NEXT: 10: int3
+# CHECK:      15: nopl
+# CHECK-NEXT: 18: int3
+# CHECK:      1d: nopl
+# CHECK-NEXT: 20: int3
+# CHECK:      25: nopl
+# CHECK-NEXT: 28: int3
+# CHECK:      2d: nopl
+# CHECK-NEXT: 30: int3
+# CHECK:      35: nopl
+# CHECK-NEXT: 38: int3
+# CHECK:      3d: nopl
+# CHECK-NEXT: 40: int3
+# CHECK:      45: nopl
+# CHECK-NEXT: 48: int3
+# CHECK:      4d: nopl
+# CHECK-NEXT: 50: int3
+# CHECK-NEXT: 51: int3
+# CHECK-NEXT: 52: int3
+# CHECK-NEXT: 53: int3
+# CHECK-NEXT: 54: int3
+
+.section .text,"ax",@progbits
+.byte 0
+
+.rept 10
+.prefalign 4, .Lend\+, nop
+.rept 5
+int3
+.endr
+.Lend\+:
+.endr

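The offsets checked in prefalign-convergence.s can be reproduced by hand from the two-case rule in the commit message. The Python sketch below is an illustration under that assumption only; the variable and function names are made up here and nothing in it models the assembler's fragment relaxation.

```python
def bit_ceil(n: int) -> int:
    """Smallest power of two >= n (1 for n <= 1), like C++ std::bit_ceil."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


BODY_SIZE = 5        # five one-byte int3 instructions per repetition
PREF_ALIGN = 1 << 4  # .prefalign 4 means a 16-byte preferred alignment

offset = 1           # the leading ".byte 0" occupies offset 0
starts = []
for _ in range(10):
    # body_size < pref_align  => bit_ceil(body_size); otherwise pref_align
    align = bit_ceil(BODY_SIZE) if BODY_SIZE < PREF_ALIGN else PREF_ALIGN
    offset += -offset % align  # padding filled with NOPs in the test
    starts.append(offset)
    offset += BODY_SIZE

print([hex(s) for s in starts])  # first int3 of each repetition
print(hex(offset))               # end of the section contents
```

Each 5-byte body gets an 8-byte computed alignment, so the first int3 of every repetition lands on a multiple of 8 (0x8, 0x10, ..., 0x50), matching the CHECK lines.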
diff  --git a/llvm/test/MC/ELF/prefalign-errors.s b/llvm/test/MC/ELF/prefalign-errors.s
index 802a78fde7c44..4325e507577e7 100644
--- a/llvm/test/MC/ELF/prefalign-errors.s
+++ b/llvm/test/MC/ELF/prefalign-errors.s
@@ -1,5 +1,50 @@
-// RUN: not llvm-mc -filetype=asm -triple x86_64-pc-linux-gnu %s -o - 2>&1 | FileCheck %s
+# RUN: rm -fr %t && split-file %s %t && cd %t
+# RUN: not llvm-mc -triple=x86_64 a.s 2>&1 | FileCheck a.s
+# RUN: not llvm-mc -triple=x86_64 -filetype=obj b.s 2>&1 | FileCheck b.s
+# RUN: not llvm-mc -triple=x86_64 -filetype=obj c.s 2>&1 | FileCheck c.s
 
+#--- a.s
 .section .text.f1,"ax",@progbits
-// CHECK: {{.*}}.s:[[# @LINE+1]]:12: error: alignment must be a power of 2
-.prefalign 3
+# CHECK: [[#@LINE+1]]:12: error: log2 alignment must be in the range [0, 63]
+.prefalign 64
+
+# CHECK: [[#@LINE+1]]:13: error: expected comma
+.prefalign 4
+
+# CHECK: [[#@LINE+1]]:14: error: expected symbol name
+.prefalign 4,
+
+# CHECK: [[#@LINE+1]]:22: error: expected comma
+.prefalign 4,.text.f1
+
+# CHECK: [[#@LINE+1]]:23: error: expected absolute expression
+.prefalign 4,.text.f1,trap
+
+# CHECK: [[#@LINE+1]]:23: error: fill value must be in range [0, 255]
+.prefalign 4,.text.f1,256
+
+# CHECK: [[#@LINE+1]]:23: error: fill value must be in range [0, 255]
+.prefalign 4,.text.f1,-1
+
+## Non-zero fill in a BSS section.
+.bss
+# CHECK: [[#@LINE+1]]:19: error: non-zero fill in BSS section '.bss'
+.prefalign 4,.Lend,1
+# CHECK: [[#@LINE+1]]:19: error: non-zero fill in BSS section '.bss'
+.prefalign 4,.Lend,nop
+.space 1
+.Lend:
+
+#--- b.s
+## End symbol is undefined.
+.section .text.f1,"ax",@progbits
+# CHECK: <unknown>:0: error: .prefalign end symbol 'undef' must be in the current section
+.prefalign 4,undef,0
+
+#--- c.s
+## End symbol is defined in a different section.
+.section .text.f1,"ax",@progbits
+.prefalign 4,.Lend,0
+# CHECK: <unknown>:0: error: .prefalign end symbol '.Lend' must be in the current section
+.section .text.f2,"ax",@progbits
+.Lend:

diff  --git a/llvm/test/MC/ELF/prefalign.s b/llvm/test/MC/ELF/prefalign.s
index 803bb5d730340..032cd114d3cc3 100644
--- a/llvm/test/MC/ELF/prefalign.s
+++ b/llvm/test/MC/ELF/prefalign.s
@@ -1,104 +1,138 @@
-// RUN: llvm-mc -triple x86_64 %s -o - | FileCheck --check-prefix=ASM %s
-// RUN: llvm-mc -filetype=obj -triple x86_64 %s -o - | llvm-readelf -SW - | FileCheck --check-prefix=OBJ %s
+# RUN: llvm-mc -triple x86_64 %s -o - | FileCheck --check-prefix=ASM %s
+# RUN: llvm-mc -filetype=obj -triple x86_64 %s -o %t
+# RUN: llvm-readelf -SW %t | FileCheck --check-prefix=OBJ %s
+# RUN: llvm-objdump -d --no-show-raw-insn %t | FileCheck --check-prefix=DIS %s
+# RUN: llvm-objdump -s -j .text.f1 -j .text.f2 -j .text.f6 %t | FileCheck --check-prefix=HEX %s
 
-// Minimum alignment >= preferred alignment, no effect on sh_addralign.
-// ASM: .section .text.f1lt
-// ASM: .p2align 2
-// ASM: .prefalign 2 
-// OBJ: .text.f1lt        PROGBITS        0000000000000000 000040 000003 00  AX  0   0  4
-.section .text.f1lt,"ax",@progbits
+## MinAlign >= PrefAlign: the three-way rule is bounded by MinAlign regardless
+## of body size, so sh_addralign stays at MinAlign.
+# ASM: .section .text.f1
+# ASM: .p2align 2
+# ASM: .prefalign 1, .Lf1_end, 0
+# OBJ: .text.f1          PROGBITS        0000000000000000 {{[0-9a-f]+}} 000003 00  AX  0   0  4
+# HEX:      Contents of section .text.f1:
+# HEX-NEXT:  0000 f8f8f8 ...
+.section .text.f1,"ax",@progbits
 .p2align 2
-.prefalign 2
+.prefalign 1, .Lf1_end, 0
 .rept 3
-nop
+clc
 .endr
+.Lf1_end:
 
-// ASM: .section .text.f1eq
-// ASM: .p2align 2
-// ASM: .prefalign 2 
-// OBJ: .text.f1eq        PROGBITS        0000000000000000 000044 000004 00  AX  0   0  4
-.section .text.f1eq,"ax",@progbits
+## Multiple .prefalign on the same end symbol: effective PrefAlign is the maximum.
+# ASM: .section .text.f2
+# ASM: .prefalign 3, .Lf2_end, 0
+# ASM: .prefalign 4, .Lf2_end, 0
+# ASM: .prefalign 3, .Lf2_end, 0
+# OBJ: .text.f2          PROGBITS        0000000000000000 {{[0-9a-f]+}} 000009 00  AX  0   0 16
+# HEX-NEXT: Contents of section .text.f2:
+# HEX-NEXT:  0000 f8f8f8f8 f8f8f8f8 f8 .........
+.section .text.f2,"ax",@progbits
 .p2align 2
-.prefalign 2
-.rept 4
-nop
-.endr
-
-// ASM: .section .text.f1gt
-// ASM: .p2align 2
-// ASM: .prefalign 2 
-// OBJ: .text.f1gt        PROGBITS        0000000000000000 000048 000005 00  AX  0   0  4
-.section .text.f1gt,"ax",@progbits
-.p2align 2
-.prefalign 2
-.rept 5
-nop
+.prefalign 3, .Lf2_end, 0
+.prefalign 4, .Lf2_end, 1-1
+.prefalign 3, .Lf2_end, 0
+.rept 9
+clc
 .endr
+.Lf2_end:
 
-// Minimum alignment < preferred alignment, sh_addralign influenced by section size.
-// Use maximum of all .prefalign directives.
-// ASM: .section .text.f2lt
-// ASM: .p2align 2
-// ASM: .prefalign 8
-// ASM: .prefalign 16 
-// ASM: .prefalign 8
-// OBJ: .text.f2lt        PROGBITS        0000000000000000 000050 000003 00  AX  0   0  4
-.section .text.f2lt,"ax",@progbits
+## Multiple functions in a section, each with its own .prefalign.
+## nop fill; f3b's 5-byte padding is a NOP.
+## f3b: ComputedAlign=8,  padding=5
+## f3c: ComputedAlign=16, padding=0
+# ASM: .prefalign 4, .Lf3a_end, nop
+# ASM: .prefalign 4, .Lf3b_end, nop
+# ASM: .prefalign 4, .Lf3c_end, 204
+# OBJ: .text.f3          PROGBITS        0000000000000000 {{[0-9a-f]+}} 000020 00  AX  0   0 16
+# DIS: Disassembly of section .text.f3:
+# DIS:       0: clc
+# DIS-NEXT:  1: clc
+# DIS-NEXT:  2: clc
+# DIS-NEXT:  3: nopl
+# DIS-NEXT:  8: stc
+# DIS:       f: stc
+# DIS-NEXT: 10: clc
+# DIS:      1f: clc
+# DIS-EMPTY:
+.section .text.f3,"ax",@progbits
 .p2align 2
-.prefalign 8
-.prefalign 16
-.prefalign 8
+.prefalign 4, .Lf3a_end, nop
 .rept 3
-nop
+clc
 .endr
-
-// ASM: .section .text.f2between1
-// OBJ: .text.f2between1  PROGBITS        0000000000000000 000054 000008 00  AX  0   0  8
-.section .text.f2between1,"ax",@progbits
-.p2align 2
-.prefalign 8
-.prefalign 16
-.prefalign 8
+.Lf3a_end:
+.prefalign 4, .Lf3b_end, nop
 .rept 8
-nop
+stc
 .endr
-
-// OBJ: .text.f2between2  PROGBITS        0000000000000000 00005c 000009 00  AX  0   0 16
-.section .text.f2between2,"ax",@progbits
-.p2align 2
-.prefalign 8
-.prefalign 16
-.prefalign 8
-.rept 9
-nop
+.Lf3b_end:
+.prefalign 4, .Lf3c_end, 0xcc
+.rept 16
+clc
 .endr
+.Lf3c_end:
+## No-op prefalign
+.prefalign 4, .Lf3d_end, 0xcc
+.Lf3d_end:
+.prefalign 4, .Lf3a_end, 0xcb+1
 
-// OBJ: .text.f2between3  PROGBITS        0000000000000000 000068 000010 00  AX  0   0 16
-.section .text.f2between3,"ax",@progbits
+## Two functions in one section where the second function's padding depends on
+## the first function's size.
+# OBJ: .text.f4          PROGBITS        0000000000000000 {{[0-9a-f]+}} 00001e 00  AX  0   0 16
+# DIS: Disassembly of section .text.f4:
+# DIS:       0: pushq
+# DIS:       7: retq
+# DIS-NEXT:  8: nopl
+# DIS-NEXT: 10: movl
+# DIS:      1d: retq
+# DIS-EMPTY:
+.section .text.f4,"ax",@progbits
 .p2align 2
-.prefalign 8
-.prefalign 16
-.prefalign 8
-.rept 16
-nop
+.prefalign 4, .Lf4a_end, nop
+pushq %rbp
+movq %rsp, %rbp
+xorl %eax, %eax
+popq %rbp
+retq
+.Lf4a_end:
+.prefalign 4, .Lf4b_end, nop
+movl $0, 0
+xorl %eax, %eax
+retq
+.Lf4b_end:
+
+## sh_addralign stays at 32, not downgraded by .prefalign.
+# OBJ: .text.f5          PROGBITS        0000000000000000 {{[0-9a-f]+}} 000003 00  AX  0   0 32
+.section .text.f5,"ax",@progbits
+.p2align 5
+.prefalign 4, .Lf5_end, 0
+.rept 3
+clc
 .endr
+.Lf5_end:
 
-// OBJ: .text.f2gt1       PROGBITS        0000000000000000 000078 000011 00  AX  0   0 16
-.section .text.f2gt1,"ax",@progbits
-.p2align 2
-.prefalign 8
-.prefalign 16
-.prefalign 8
-.rept 17
-nop
+## body_size > PrefAlign: ComputedAlign is clamped to PrefAlign.
+## body=20, pref=8 => ComputedAlign=8, padding=7 bytes of fill value 3.
+# OBJ: .text.f6          PROGBITS        0000000000000000 {{[0-9a-f]+}} 00001c 00  AX  0   0  8
+# HEX-NEXT: Contents of section .text.f6:
+# HEX-NEXT:  0000 01030303 03030303 f8f8f8f8 f8f8f8f8 ................
+# HEX-NEXT:  0010 f8f8f8f8 f8f8f8f8 f8f8f8f8 ............
+.section .text.f6,"ax",@progbits
+.byte 1
+.prefalign 3, .Lf6_end, 3
+.rept 20
+clc
 .endr
+.Lf6_end:
 
-// OBJ: .text.f2gt2       PROGBITS        0000000000000000 00008c 000021 00  AX  0   0 16
-.section .text.f2gt2,"ax",@progbits
+## .prefalign in a BSS section with zero fill.
+# ASM: .bss
+# ASM: .prefalign 4, .Lbss_end, 0
+# OBJ: .bss              NOBITS          0000000000000000 {{[0-9a-f]+}} 000004 00  WA  0   0  4
+.bss
 .p2align 2
-.prefalign 8
-.prefalign 16
-.prefalign 8
-.rept 33
-nop
-.endr
+.prefalign 4, .Lbss_end, 0
+.space 4
+.Lbss_end:

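The multi-function cases in prefalign.s (.text.f3, .text.f4, .text.f6) can be replayed with a small layout model. The following is a minimal Python sketch, not the MC implementation. The names `bit_ceil`, `computed_align` and `layout` are hypothetical, it models only the two-case rule from the commit message, and it ignores `.p2align` minimums and repeated `.prefalign` directives on one end symbol. The 14-byte body of the second .text.f4 function is inferred from the DIS offsets (0x10 through 0x1d) rather than stated in the test.

```python
def bit_ceil(n: int) -> int:
    """Smallest power of two >= n (1 for n <= 1), like C++ std::bit_ceil."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def computed_align(body_size: int, log2_pref: int) -> int:
    """Two-case rule: small bodies get bit_ceil(size), others get pref_align."""
    pref = 1 << log2_pref
    return bit_ceil(body_size) if body_size < pref else pref


def layout(start: int, funcs):
    """Place (log2_pref, body_size) bodies one after another.

    Returns a list of (padding, start_offset) pairs and the final offset.
    """
    offset = start
    placed = []
    for log2_pref, body_size in funcs:
        align = computed_align(body_size, log2_pref)
        pad = -offset % align
        placed.append((pad, offset + pad))
        offset += pad + body_size
    return placed, offset


# .text.f3: bodies of 3, 8 and 16 bytes, each with ".prefalign 4".
f3 = layout(0, [(4, 3), (4, 8), (4, 16)])
# .text.f4: an 8-byte body followed by a 14-byte body.
f4 = layout(0, [(4, 8), (4, 14)])
# .text.f6: one leading byte, then a 20-byte body with ".prefalign 3".
f6 = layout(1, [(3, 20)])
print(f3, f4, f6)
```

Under these assumptions the model gives 5 bytes of padding before the second .text.f3 body, 8 bytes before the second .text.f4 body, and 7 bytes in .text.f6, with section sizes 0x20, 0x1e and 0x1c, which agrees with the OBJ and DIS lines.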
diff  --git a/llvm/test/MC/RISCV/prefalign.s b/llvm/test/MC/RISCV/prefalign.s
new file mode 100644
index 0000000000000..488234ee04ba0
--- /dev/null
+++ b/llvm/test/MC/RISCV/prefalign.s
@@ -0,0 +1,34 @@
+# RUN: llvm-mc -filetype=obj -triple riscv64 -mattr=+relax %s -o %t
+# RUN: llvm-readelf -SW %t | FileCheck --check-prefix=OBJ %s
+# RUN: llvm-objdump -d -M no-aliases --no-show-raw-insn %t | FileCheck --check-prefix=DIS %s
+# RUN: llvm-readobj -r %t | FileCheck --check-prefix=RELOC %s
+
+## Two functions in one section with nop fill.
+## f1: body = 12 bytes < 16, ComputedAlign=16, but section start is 16-aligned
+##     so pad = 0
+## f2: body = 32 bytes >= 16, ComputedAlign=16, pad = 4 (one nop at 0xc)
+# OBJ: .text.f1 PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000030 00 AX 0 0 16
+# DIS:       0: addi a0, zero, 0x1
+# DIS-NEXT:  4: addi a0, zero, 0x2
+# DIS-NEXT:  8: add a0, a0, a1
+## Padding nop for f2
+# DIS-NEXT:  c: addi zero, zero, 0x0
+## f2 starts at 0x10, aligned to 16
+# DIS-NEXT: 10: add a0, a0, a1
+.section .text.f1,"ax",@progbits
+.p2align 2
+.prefalign 4, .Lf1_end, nop
+addi a0, zero, 1
+addi a0, zero, 2
+add a0, a0, a1
+.Lf1_end:
+.prefalign 4, .Lf2_end, nop
+.rept 8
+add a0, a0, a1
+.endr
+.Lf2_end:
+
+## .prefalign does not emit R_RISCV_ALIGN relocations. The padding is fully
+## resolved at assembly time, so no linker adjustment is needed.
+# RELOC: Relocations [
+# RELOC-NEXT: ]

diff  --git a/llvm/test/tools/llvm-nm/X86/demangle.ll b/llvm/test/tools/llvm-nm/X86/demangle.ll
index cab2e09cc1d49..b0e3d75993a9f 100644
--- a/llvm/test/tools/llvm-nm/X86/demangle.ll
+++ b/llvm/test/tools/llvm-nm/X86/demangle.ll
@@ -6,7 +6,7 @@
 
 ; RUN: llc -filetype=obj -mtriple=x86_64-apple-darwin9 -o %t.macho %s
 ; RUN: llvm-nm %t.macho | FileCheck --check-prefix="MACHO-MANGLED" %s
-; RUN: llvm-nm -C %t.macho | FileCheck --check-prefix="DEMANGLED" %s
+; RUN: llvm-nm -C %t.macho | FileCheck --check-prefix="MACHO-DEMANGLED" %s
 
 ; RUN: llc -filetype=obj -mtriple=x86_64-pc-win32 -o %t.coff %s
 ; RUN: llvm-nm %t.coff | FileCheck --check-prefix="COFF-MANGLED" %s
@@ -33,8 +33,8 @@ entry:
   ret i32 1
 }
 
-; MANGLED:       0000000000000020 T _RNvC1a3baz
-; MANGLED:       0000000000000010 T _Z3barf
+; MANGLED:       0000000000000010 T _RNvC1a3baz
+; MANGLED:       0000000000000008 T _Z3barf
 ; MANGLED:       0000000000000000 T _Z3fooi
 
 ; MACHO-MANGLED: 0000000000000020 T __RNvC1a3baz
@@ -45,10 +45,14 @@ entry:
 ; COFF-MANGLED:          00000010 T _Z3barf
 ; COFF-MANGLED:          00000000 T _Z3fooi
 
-; DEMANGLED:     0000000000000020 T a::baz
-; DEMANGLED:     0000000000000010 T bar(float)
+; DEMANGLED:     0000000000000010 T a::baz
+; DEMANGLED:     0000000000000008 T bar(float)
 ; DEMANGLED:     0000000000000000 T foo(int)
 
+; MACHO-DEMANGLED: 0000000000000020 T a::baz
+; MACHO-DEMANGLED: 0000000000000010 T bar(float)
+; MACHO-DEMANGLED: 0000000000000000 T foo(int)
+
 ; COFF-DEMANGLED:        00000020 T a::baz
 ; COFF-DEMANGLED:        00000010 T bar(float)
 ; COFF-DEMANGLED:        00000000 T foo(int)

diff  --git a/llvm/test/tools/llvm-objdump/X86/source-interleave-function-from-debug.test b/llvm/test/tools/llvm-objdump/X86/source-interleave-function-from-debug.test
index edf87c02c692e..96ae656100531 100644
--- a/llvm/test/tools/llvm-objdump/X86/source-interleave-function-from-debug.test
+++ b/llvm/test/tools/llvm-objdump/X86/source-interleave-function-from-debug.test
@@ -13,23 +13,23 @@
 ; CHECK-NEXT:        0: b8 05 00 00 00                movl    $5, %eax
 ; CHECK-NEXT:        5: c3                            retq
 
-; CHECK-NO-DEMANGLE:      0000000000000010 <_ZN3xyz3barEv>:
+; CHECK-NO-DEMANGLE:      0000000000000008 <_ZN3xyz3barEv>:
 ; CHECK-NO-DEMANGLE-NEXT: ; _ZN3xyz3barEv():
-; CHECK-DEMANGLE:         0000000000000010 <xyz::bar()>:
+; CHECK-DEMANGLE:         0000000000000008 <xyz::bar()>:
 ; CHECK-DEMANGLE-NEXT:    ; xyz::bar():
 
 ; CHECK-NEXT: ; /tmp{{/|\\}}src.cc:3
-; CHECK-NEXT:       10: b8 0a 00 00 00                movl    $10, %eax
-; CHECK-NEXT:       15: c3                            retq
+; CHECK-NEXT:        8: b8 0a 00 00 00                movl    $10, %eax
+; CHECK-NEXT:        d: c3                            retq
 
-; CHECK-NO-DEMANGLE:      0000000000000020 <_ZN3xyz3bazEv>:
+; CHECK-NO-DEMANGLE:      0000000000000010 <_ZN3xyz3bazEv>:
 ; CHECK-NO-DEMANGLE-NEXT: ; _ZN3xyz3bazEv():
-; CHECK-DEMANGLE:         0000000000000020 <xyz::baz()>:
+; CHECK-DEMANGLE:         0000000000000010 <xyz::baz()>:
 ; CHECK-DEMANGLE-NEXT:    ; xyz::baz():
 
 ; CHECK-NEXT: ; /tmp{{/|\\}}src.cc:3
-; CHECK-NEXT:       20: b8 14 00 00 00                movl    $20, %eax
-; CHECK-NEXT:       25: c3                            retq
+; CHECK-NEXT:       10: b8 14 00 00 00                movl    $20, %eax
+; CHECK-NEXT:       15: c3                            retq
 
 ;; When symbol information is missing, we can get function names from debug
 ;; info. The IR is intentionally doctored to have different names in debug info
@@ -45,13 +45,13 @@
 
 ; STRIPPED:      ; xyz::bar():
 ; STRIPPED-NEXT: ; /tmp{{/|\\}}src.cc:3
-; STRIPPED-NEXT:       10: b8 0a 00 00 00                movl    $10, %eax
-; STRIPPED-NEXT:       15: c3                            retq
+; STRIPPED-NEXT:        8: b8 0a 00 00 00                movl    $10, %eax
+; STRIPPED-NEXT:        d: c3                            retq
 
 ; STRIPPED:      ; xyz::baz():
 ; STRIPPED-NEXT: ; /tmp{{/|\\}}src.cc:3
-; STRIPPED-NEXT:       20: b8 14 00 00 00                movl    $20, %eax
-; STRIPPED-NEXT:       25: c3                            retq
+; STRIPPED-NEXT:       10: b8 14 00 00 00                movl    $20, %eax
+; STRIPPED-NEXT:       15: c3                            retq
 
 ;; IR adapted from:
 ;; $ cat /tmp/src.cc