[llvm-branch-commits] [llvm] [AMDGPU] Add `.amdgpu.info` section for per-function metadata (PR #192384)

Thu Apr 16 14:53:30 PDT 2026

https://github.com/shiltian updated https://github.com/llvm/llvm-project/pull/192384

>From 49230ebef34fa82efde0e830b3cd46aea0268e8d Mon Sep 17 00:00:00 2001
From: Shilei Tian <i at tianshilei.me>
Date: Wed, 15 Apr 2026 22:27:49 -0400
Subject: [PATCH] [AMDGPU] Add `.amdgpu.info` section for per-function metadata

AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.

This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed
binary format. Each entry is encoded as:

```
[kind: u8] [len: u8] [payload: <len> bytes]
```
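
As a rough illustration of this layout (not code from the patch), the sketch
below appends entries to a byte buffer. The helper names `appendEntry` and
`appendU32Entry` are hypothetical, and the little-endian `u32` payload is an
assumption based on AMDGPU being a little-endian target.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the patch: append one
// [kind][len][payload] entry to an in-memory buffer.
void appendEntry(std::vector<uint8_t> &Buf, uint8_t Kind,
                 const std::vector<uint8_t> &Payload) {
  // len is a single byte, so a payload can hold at most 255 bytes.
  assert(Payload.size() <= 255 && "payload too large for a u8 length");
  Buf.push_back(Kind);
  Buf.push_back(static_cast<uint8_t>(Payload.size()));
  Buf.insert(Buf.end(), Payload.begin(), Payload.end());
}

// Hypothetical helper: append an entry whose payload is a u32,
// written in little-endian byte order (assumed, see above).
void appendU32Entry(std::vector<uint8_t> &Buf, uint8_t Kind, uint32_t Val) {
  appendEntry(Buf, Kind,
              {uint8_t(Val), uint8_t(Val >> 8), uint8_t(Val >> 16),
               uint8_t(Val >> 24)});
}
```

With these helpers, an `INFO_NUM_VGPR` entry (kind 2) carrying the value 32
would occupy six bytes: the kind, a length of 4, and the four payload bytes.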

A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
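
The string-table side can be pictured as a deduplicating pool of
null-terminated strings addressed by byte offset. The following is a minimal
sketch under that assumption; `StringPool` and `intern` are hypothetical names,
and any special handling of empty strings is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical string pool modelled on the idea of `.amdgpu.strtab`:
// null-terminated strings, deduplicated, referenced by byte offset.
class StringPool {
  std::string Data;
  std::unordered_map<std::string, uint32_t> Offsets;

public:
  // Return the offset of Str, appending it to the pool on first use.
  uint32_t intern(const std::string &Str) {
    // Offsets are stored as u32, so the pool must stay below 4 GiB.
    assert(Data.size() <= UINT32_MAX && "pool offset does not fit in u32");
    auto [It, Inserted] =
        Offsets.try_emplace(Str, static_cast<uint32_t>(Data.size()));
    if (Inserted) {
      Data += Str;
      Data.push_back('\0');
    }
    return It->second;
  }

  // Raw bytes that would form the section contents.
  const std::string &bytes() const { return Data; }
};
```

Interning the same type signature twice yields the same offset, so several
entries can share a single copy of the string.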

The format is forward-compatible: a consumer that encounters an unknown kind can
skip it by reading the length byte, allowing new entry kinds to be added without
breaking existing toolchains.
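
To make the skipping rule concrete, here is a hedged sketch of a reader loop.
`countKnownEntries` is a hypothetical function, the kind range 1..11 is taken
from the `InfoKind` enum added by this patch, and a real consumer would decode
payloads rather than just count entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Kind values taken from the InfoKind enum in this patch.
constexpr uint8_t INFO_FUNC = 1;
constexpr uint8_t INFO_TYPEID = 11; // highest kind this reader knows

// Count entries with a known kind, skipping unknown ones via len.
// Returns -1 if an entry runs past the end of the buffer.
int countKnownEntries(const std::vector<uint8_t> &Sec) {
  int Known = 0;
  size_t Pos = 0;
  while (Pos < Sec.size()) {
    if (Sec.size() - Pos < 2)
      return -1; // truncated [kind][len] header
    uint8_t Kind = Sec[Pos];
    uint8_t Len = Sec[Pos + 1];
    Pos += 2;
    if (Sec.size() - Pos < Len)
      return -1; // truncated payload
    if (Kind >= INFO_FUNC && Kind <= INFO_TYPEID)
      ++Known; // a real consumer would decode the payload here
    Pos += Len; // known or not, len gives the start of the next entry
  }
  return Known;
}
```

Because the loop advances by `len` regardless of the kind, an entry with an
unrecognized kind does not disturb the decoding of the entries that follow it.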
---
 llvm/docs/AMDGPUUsage.rst                     | 110 ++++++++++
 .../llvm/Support/AMDGPUObjLinkingInfo.h       |  77 +++++++
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp   | 173 ++++++++++++++-
 llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h     |  14 ++
 llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp  |   3 +
 .../AMDGPU/AsmParser/AMDGPUAsmParser.cpp      | 111 ++++++++++
 .../MCTargetDesc/AMDGPUTargetStreamer.cpp     | 198 ++++++++++++++++++
 .../MCTargetDesc/AMDGPUTargetStreamer.h       |  32 +++
 .../AMDGPU/lds-link-time-codegen-agpr.ll      |  26 +++
 .../AMDGPU/lds-link-time-codegen-callgraph.ll |  51 +++++
 ...s-link-time-codegen-cross-tu-addr-taken.ll |  47 +++++
 .../AMDGPU/lds-link-time-codegen-indirect.ll  |  59 ++++++
 .../lds-link-time-codegen-named-barrier.ll    |  30 ++-
 .../AMDGPU/lds-link-time-codegen-typeid.ll    |  83 ++++++++
 .../CodeGen/AMDGPU/lds-link-time-codegen.ll   |  49 ++++-
 llvm/test/MC/AMDGPU/amdgpu-info-err.s         |  35 ++++
 llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s   | 115 ++++++++++
 17 files changed, 1199 insertions(+), 14 deletions(-)
 create mode 100644 llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h
 create mode 100644 llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-agpr.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-cross-tu-addr-taken.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-indirect.ll
 create mode 100644 llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-typeid.ll
 create mode 100644 llvm/test/MC/AMDGPU/amdgpu-info-err.s
 create mode 100644 llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s

diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index fb440bde68b8f..8a406d1a2c990 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -2835,6 +2835,8 @@ An AMDGPU target ELF code object has the standard ELF sections which include:
      ``.strtab``        ``SHT_STRTAB``   *none*
      ``.symtab``        ``SHT_SYMTAB``   *none*
      ``.text``          ``SHT_PROGBITS`` ``SHF_ALLOC`` + ``SHF_EXECINSTR``
+     ``.amdgpu.info``   ``SHT_PROGBITS`` ``SHF_EXCLUDE``
+     ``.amdgpu.strtab`` ``SHT_STRTAB``   ``SHF_EXCLUDE``
      ================== ================ =================================
 
 These sections have their standard meanings (see [ELF]_) and are only generated
@@ -2870,6 +2872,69 @@ if needed.
 ``.amdgpu.kernel.runtime.handle``
   Symbols used for device enqueue.
 
+.. _amdgpu-info-section:
+
+``.amdgpu.info``
+  Per-function metadata for AMDGPU object linking, emitted only in relocatable
+  code objects when object linking is enabled
+  (``-amdgpu-enable-object-linking``).  The linker uses this section to
+  propagate resource usage (registers, stack, LDS) and resolve call graph
+  dependencies across translation units.
+
+  Each entry uses a tagged, length-prefixed binary encoding:
+
+  .. code-block:: none
+
+     [kind: u8] [len: u8] [payload: <len> bytes]
+
+  A function scope is opened by an ``INFO_FUNC`` entry whose payload is an
+  8-byte relocated symbol reference.  All subsequent entries until the next
+  ``INFO_FUNC`` or end of section belong to that scope.  The format is
+  forward-compatible: unknown kinds can be skipped by reading the length byte.
+
+  .. table:: AMDGPU Info Entry Kinds
+     :name: amdgpu-info-entry-kinds-table
+
+     ===== ============================== ==========================================
+     Value Name                           Payload
+     ===== ============================== ==========================================
+     1     ``INFO_FUNC``                  8B symbol ref; opens function scope
+     2     ``INFO_NUM_VGPR``              u32; architectural VGPRs used
+     3     ``INFO_NUM_AGPR``              u32; accumulator VGPRs (AGPRs) used
+     4     ``INFO_NUM_SGPR``              u32; SGPRs explicitly used
+     5     ``INFO_PRIVATE_SEGMENT_SIZE``  u32; private (scratch) segment bytes
+     6     ``INFO_OCCUPANCY``             u32; target occupancy (waves per EU)
+     7     ``INFO_FLAGS``                 u32; ``FuncInfoFlags`` bitfield
+     8     ``INFO_USE``                   8B symbol ref; resource dependency edge
+     9     ``INFO_CALL``                  8B symbol ref; direct call edge
+     10    ``INFO_INDIRECT_CALL``         u32 strtab offset; indirect call type-ID
+     11    ``INFO_TYPEID``                u32 strtab offset; function type-ID
+     ===== ============================== ==========================================
+
+  .. table:: AMDGPU Info Function Flags (``INFO_FLAGS``)
+     :name: amdgpu-info-flags-table
+
+     ===== =========================== ==========================================
+     Mask  Name                        Description
+     ===== =========================== ==========================================
+     0x1   ``FUNC_IS_KERNEL``          Function is a kernel entry point
+     0x2   ``FUNC_USES_VCC``           Function uses the VCC register
+     0x4   ``FUNC_USES_FLAT_SCRATCH``  Function uses flat scratch addressing
+     0x8   ``FUNC_HAS_DYN_STACK``      Function has dynamic stack allocation
+     ===== =========================== ==========================================
+
+  Symbol references (``INFO_FUNC``, ``INFO_USE``, ``INFO_CALL``) generate
+  ``R_AMDGPU_ABS64`` relocations in ``.rela.amdgpu.info``.  String payloads
+  (``INFO_INDIRECT_CALL``, ``INFO_TYPEID``) store a ``u32`` offset into
+  the companion ``.amdgpu.strtab`` section.
+
+  See :ref:`amdgpu-assembler-directive-amdgpu-info` for the assembly syntax.
+
+``.amdgpu.strtab``
+  Null-terminated string pool for the ``.amdgpu.info`` section.  Contains
+  type-ID strings referenced by ``INFO_INDIRECT_CALL`` and ``INFO_TYPEID``
+  entries.  Only present when ``.amdgpu.info`` requires string data.
+
 .. _amdgpu-note-records:
 
 Note Records
@@ -21763,6 +21828,51 @@ semantics described in :ref:`amdgpu-amdhsa-code-object-metadata-v3`,
 
 This directive is terminated by an ``.end_amdgpu_metadata`` directive.
 
+.. _amdgpu-assembler-directive-amdgpu-info:
+
+.amdgpu_info <symbol>
++++++++++++++++++++++
+
+Begins a per-function metadata block for ``<symbol>`` in the ``.amdgpu.info``
+section (see :ref:`amdgpu-info-section`).  Only valid when the OS is ``amdhsa``.
+The block is terminated by an ``.end_amdgpu_info`` directive.
+
+The following sub-directives may appear inside the block:
+
+  .. table:: .amdgpu_info Sub-Directives
+     :name: amdgpu-info-sub-directives-table
+
+     ====================================== ==========================================
+     Directive                              Description
+     ====================================== ==========================================
+     ``.amdgpu_flags`` *value*              ``FuncInfoFlags`` bitfield (u32)
+     ``.amdgpu_num_vgpr`` *value*           Architectural VGPRs used (u32)
+     ``.amdgpu_num_agpr`` *value*           Accumulator VGPRs used (u32)
+     ``.amdgpu_num_sgpr`` *value*           SGPRs explicitly used (u32)
+     ``.amdgpu_private_segment_size`` *n*   Private segment size in bytes (u32)
+     ``.amdgpu_occupancy`` *n*              Target occupancy in waves per EU (u32)
+     ``.amdgpu_use`` *symbol*               Resource dependency (LDS or barrier)
+     ``.amdgpu_call`` *symbol*              Direct call edge to *symbol*
+     ``.amdgpu_indirect_call`` *"type-id"*  Indirect call with given type-ID string
+     ``.amdgpu_typeid`` *"type-id"*         Type-ID for an address-taken function
+     ====================================== ==========================================
+
+Example:
+
+.. code-block:: nasm
+
+   .amdgpu_info my_kernel
+     .amdgpu_flags 7
+     .amdgpu_num_vgpr 32
+     .amdgpu_num_agpr 0
+     .amdgpu_num_sgpr 33
+     .amdgpu_private_segment_size 0
+     .amdgpu_occupancy 65536
+     .amdgpu_call helper
+     .amdgpu_use lds_var
+     .amdgpu_indirect_call "vi"
+   .end_amdgpu_info
+
 .. _amdgpu-amdhsa-assembler-example-v3-onwards:
 
 Code Object V3 and Above Example Source Code
diff --git a/llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h b/llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h
new file mode 100644
index 0000000000000..25fec0680a9dc
--- /dev/null
+++ b/llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h
@@ -0,0 +1,77 @@
+//===--- AMDGPUObjLinkingInfo.h ---------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// Enums shared between the AMDGPU backend (LLVM) and the ELF linker (LLD)
+/// for the `.amdgpu.info` object-linking metadata section.
+///
+/// Binary layout of each entry: [kind: u8] [len: u8] [payload: <len> bytes].
+/// Unknown kinds are forward-compatible: a consumer skips them by reading len.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_AMDGPUOBJECTLINKINGINFO_H
+#define LLVM_SUPPORT_AMDGPUOBJECTLINKINGINFO_H
+
+#include <cstdint>
+
+namespace llvm {
+namespace AMDGPU {
+
+/// Entry kind values for the `.amdgpu.info` section.
+///
+/// Entries that appear between an INFO_FUNC and the next INFO_FUNC (or end of
+/// section) belong to the function scope opened by that INFO_FUNC.
+enum InfoKind : uint8_t {
+  /// Opens a new function scope.  Payload is an 8-byte symbol reference
+  /// (relocated) identifying the function.  All subsequent entries until the
+  /// next INFO_FUNC belong to this function.
+  INFO_FUNC = 1,
+  /// Number of architectural VGPRs used by the function.  [u32]
+  INFO_NUM_VGPR = 2,
+  /// Number of accumulator VGPRs (AGPRs) used by the function.  [u32]
+  INFO_NUM_AGPR = 3,
+  /// Number of SGPRs explicitly used by the function.  [u32]
+  INFO_NUM_SGPR = 4,
+  /// Private (scratch) memory size in bytes required by the function.  [u32]
+  INFO_PRIVATE_SEGMENT_SIZE = 5,
+  /// Target occupancy for the function, expressed as the maximum number of
+  /// waves per EU the compiler expects.  The linker uses this to guide
+  /// resource allocation decisions (e.g. LDS partitioning) so that the
+  /// resulting program can meet the occupancy target.  [u32]
+  INFO_OCCUPANCY = 6,
+  /// Bitfield of FuncInfoFlags properties for the function.  [u32]
+  INFO_FLAGS = 7,
+  /// Dependency edge: the function uses the resource identified by the
+  /// 8-byte relocated symbol (e.g. an LDS variable or named barrier).
+  INFO_USE = 8,
+  /// Direct call edge: the function calls the callee identified by the
+  /// 8-byte relocated symbol.
+  INFO_CALL = 9,
+  /// Indirect call edge: the function contains an indirect call whose
+  /// callee is expected to match the type-ID string at the given
+  /// `.amdgpu.strtab` offset.  [u32]
+  INFO_INDIRECT_CALL = 10,
+  /// Function type ID: tags an address-taken function with a type-ID
+  /// string (at the given `.amdgpu.strtab` offset) so the linker can match
+  /// it against INFO_INDIRECT_CALL entries.  [u32]
+  INFO_TYPEID = 11,
+};
+
+/// Per-function flags packed into INFO_FLAGS entries.
+enum FuncInfoFlags : uint32_t {
+  FUNC_IS_KERNEL = 0x1,
+  FUNC_USES_VCC = 0x2,
+  FUNC_USES_FLAT_SCRATCH = 0x4,
+  FUNC_HAS_DYN_STACK = 0x8,
+};
+
+} // namespace AMDGPU
+} // namespace llvm
+
+#endif // LLVM_SUPPORT_AMDGPUOBJECTLINKINGINFO_H
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 718b2b154e251..8012edf25ebde 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -32,6 +32,7 @@
 #include "Utils/AMDGPUBaseInfo.h"
 #include "Utils/AMDKernelCodeTUtils.h"
 #include "Utils/SIDefinesUtils.h"
+#include "llvm/ADT/StringSet.h"
 #include "llvm/Analysis/OptimizationRemarkEmitter.h"
 #include "llvm/BinaryFormat/ELF.h"
 #include "llvm/CodeGen/AsmPrinterHandler.h"
@@ -537,6 +538,149 @@ void AMDGPUAsmPrinter::validateMCResourceInfo(Function &F) {
   }
 }
 
+static void appendTypeEncoding(std::string &Enc, Type *Ty,
+                               const DataLayout &DL) {
+  if (Ty->isVoidTy()) {
+    Enc += 'v';
+    return;
+  }
+  unsigned Bits = DL.getTypeSizeInBits(Ty);
+  if (Bits <= 32)
+    Enc += 'i';
+  else if (Bits <= 64)
+    Enc += 'l';
+  else
+    Enc.append(divideCeil(Bits, 32), 'i');
+}
+
+static std::string computeTypeId(const FunctionType *FTy,
+                                 const DataLayout &DL) {
+  std::string Enc;
+  appendTypeEncoding(Enc, FTy->getReturnType(), DL);
+  for (Type *ParamTy : FTy->params())
+    appendTypeEncoding(Enc, ParamTy, DL);
+  return Enc;
+}
+
+void AMDGPUAsmPrinter::collectCallEdge(const MachineInstr &MI) {
+  if (!AMDGPUTargetMachine::EnableObjectLinking)
+    return;
+  const GCNSubtarget &STI = MF->getSubtarget<GCNSubtarget>();
+  const SIInstrInfo *TII = STI.getInstrInfo();
+  const MachineOperand *CalleeOp =
+      TII->getNamedOperand(MI, AMDGPU::OpName::callee);
+  if (!CalleeOp || !CalleeOp->isGlobal())
+    return;
+  DirectCallEdges.insert(
+      {getSymbol(&MF->getFunction()), getSymbol(CalleeOp->getGlobal())});
+}
+
+void AMDGPUAsmPrinter::emitCallGraphSection(Module &M) {
+  if (!AMDGPUTargetMachine::EnableObjectLinking)
+    return;
+
+  const NamedMDNode *LDSMD = M.getNamedMetadata("amdgpu.lds.uses");
+  bool HasLdsUses = LDSMD && LDSMD->getNumOperands() > 0;
+
+  const NamedMDNode *BarMD = M.getNamedMetadata("amdgpu.named_barrier.uses");
+  bool HasNamedBarriers = BarMD && BarMD->getNumOperands() > 0;
+
+  // Collect address-taken functions (with type IDs) and indirect call sites.
+  DenseMap<const Function *, std::string> AddrTakenTypeIds;
+  using IndirectCallInfo = std::pair<const Function *, std::string>;
+  SmallVector<IndirectCallInfo, 8> IndirectCalls;
+
+  for (const Function &F : M) {
+    bool IsKernel = AMDGPU::isKernel(F.getCallingConv());
+
+    if (!IsKernel && F.hasAddressTaken(/*PutOffender=*/nullptr,
+                                       /*IgnoreCallbackUses=*/false,
+                                       /*IgnoreAssumeLikeCalls=*/true,
+                                       /*IgnoreLLVMUsed=*/true)) {
+      AddrTakenTypeIds[&F] =
+          computeTypeId(F.getFunctionType(), M.getDataLayout());
+    }
+
+    if (F.isDeclaration())
+      continue;
+
+    StringSet<> SeenTypeIds;
+    for (const BasicBlock &BB : F) {
+      for (const Instruction &I : BB) {
+        const auto *CB = dyn_cast<CallBase>(&I);
+        if (!CB || !CB->isIndirectCall())
+          continue;
+        std::string TId =
+            computeTypeId(CB->getFunctionType(), M.getDataLayout());
+        if (SeenTypeIds.insert(TId).second)
+          IndirectCalls.push_back({&F, std::move(TId)});
+      }
+    }
+  }
+
+  if (FunctionResourceInfos.empty() && DirectCallEdges.empty() && !HasLdsUses &&
+      !HasNamedBarriers && AddrTakenTypeIds.empty() && IndirectCalls.empty())
+    return;
+
+  AMDGPU::InfoSectionData Data;
+
+  DenseSet<MCSymbol *> DefinedSyms;
+
+  for (const PerFunctionResourceInfo &PFRI : FunctionResourceInfos) {
+    MCSymbol *Sym = getSymbol(PFRI.F);
+    DefinedSyms.insert(Sym);
+
+    AMDGPU::FuncInfo FI;
+    FI.Sym = Sym;
+    FI.IsKernel = AMDGPU::isKernel(PFRI.F->getCallingConv());
+    FI.NumArchVGPR = PFRI.RI.NumVGPR;
+    FI.NumAccVGPR = PFRI.RI.NumAGPR;
+    FI.NumSGPR = PFRI.RI.NumExplicitSGPR;
+    FI.PrivateSegmentSize = static_cast<uint32_t>(PFRI.RI.PrivateSegmentSize);
+    FI.Occupancy = PFRI.Occupancy;
+    FI.UsesVCC = PFRI.RI.UsesVCC;
+    FI.UsesFlatScratch = PFRI.RI.UsesFlatScratch;
+    FI.HasDynStack = PFRI.RI.HasDynamicallySizedStack;
+
+    Data.Funcs.push_back(std::move(FI));
+  }
+
+  for (auto &[F, TypeId] : AddrTakenTypeIds) {
+    MCSymbol *Sym = getSymbol(F);
+    Data.TypeIds.push_back({Sym, TypeId});
+  }
+
+  for (auto &[CallerSym, CalleeSym] : DirectCallEdges)
+    Data.Calls.push_back({CallerSym, CalleeSym});
+  DirectCallEdges.clear();
+
+  if (HasLdsUses) {
+    for (const MDNode *N : LDSMD->operands()) {
+      auto *Func = mdconst::extract<Function>(N->getOperand(0));
+      auto *LdsVar = mdconst::extract<GlobalVariable>(N->getOperand(1));
+      Data.Uses.push_back({getSymbol(Func), getSymbol(LdsVar)});
+    }
+  }
+
+  if (HasNamedBarriers) {
+    for (const MDNode *N : BarMD->operands()) {
+      auto *BarVar = mdconst::extract<GlobalVariable>(N->getOperand(0));
+      MCSymbol *BarSym = getSymbol(BarVar);
+      for (unsigned I = 1, E = N->getNumOperands(); I < E; ++I) {
+        auto *Func = mdconst::extract<Function>(N->getOperand(I));
+        Data.Uses.push_back({getSymbol(Func), BarSym});
+      }
+    }
+  }
+
+  for (auto &[Caller, Enc] : IndirectCalls) {
+    MCSymbol *CallerSym = getSymbol(Caller);
+    Data.IndirectCalls.push_back({CallerSym, Enc});
+  }
+
+  getTargetStreamer()->EmitAMDGPUInfo(Data);
+}
+
 bool AMDGPUAsmPrinter::doFinalization(Module &M) {
   // Pad with s_code_end to help tools and guard against instruction prefetch
   // causing stale data in caches. Arguably this should be done by the linker,
@@ -553,6 +697,9 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
     }
   }
 
+  // Emit the unified .amdgpu.info section (call graph + resource usage).
+  emitCallGraphSection(M);
+
   // Assign expressions which can only be resolved when all other functions are
   // known.
   RI.finalize(OutContext);
@@ -567,8 +714,10 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
       RI.getMaxSGPRSymbol(OutContext), RI.getMaxNamedBarrierSymbol(OutContext));
   OutStreamer->popSection();
 
-  for (Function &F : M.functions())
-    validateMCResourceInfo(F);
+  if (!AMDGPUTargetMachine::EnableObjectLinking) {
+    for (Function &F : M.functions())
+      validateMCResourceInfo(F);
+  }
 
   RI.reset();
 
@@ -729,6 +878,26 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
 
   RI.gatherResourceInfo(MF, *ResourceUsage, OutContext);
 
+  if (AMDGPUTargetMachine::EnableObjectLinking) {
+    PerFunctionResourceInfo PFRI = {&MF.getFunction(), *ResourceUsage};
+    if (AMDGPU::isKernel(MF.getFunction().getCallingConv())) {
+      unsigned TotalLDS = STM.getLocalMemorySize();
+      const auto [MinWEU, MaxWEU] = AMDGPU::getIntegerPairAttribute(
+          MF.getFunction(), "amdgpu-waves-per-eu", {0, 0}, true);
+      if (MinWEU > 0) {
+        const SIMachineFunctionInfo &SIMFI =
+            *MF.getInfo<SIMachineFunctionInfo>();
+        unsigned FlatWGSizeMax = SIMFI.getFlatWorkGroupSizes().second;
+        unsigned WavesPerWG = divideCeil(FlatWGSizeMax, STM.getWavefrontSize());
+        unsigned MinWGs = divideCeil(MinWEU * STM.getEUsPerCU(), WavesPerWG);
+        PFRI.Occupancy = MinWGs > 0 ? TotalLDS / MinWGs : TotalLDS;
+      } else {
+        PFRI.Occupancy = TotalLDS;
+      }
+    }
+    FunctionResourceInfos.push_back(PFRI);
+  }
+
   if (MFI->isModuleEntryFunction()) {
     getSIProgramInfo(CurrentProgramInfo, MF);
   }
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
index 31d10fe92ca26..1e3417ae67673 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
@@ -15,7 +15,9 @@
 #define LLVM_LIB_TARGET_AMDGPU_AMDGPUASMPRINTER_H
 
 #include "AMDGPUMCResourceInfo.h"
+#include "AMDGPUResourceUsageAnalysis.h"
 #include "SIProgramInfo.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/CodeGen/AsmPrinter.h"
 
 namespace llvm {
@@ -86,6 +88,18 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
 
   void initTargetStreamer(Module &M);
 
+  void emitCallGraphSection(Module &M);
+  void collectCallEdge(const MachineInstr &MI);
+
+  SetVector<std::pair<MCSymbol *, MCSymbol *>> DirectCallEdges;
+
+  struct PerFunctionResourceInfo {
+    const Function *F;
+    AMDGPUResourceUsageAnalysisImpl::SIFunctionResourceInfo RI;
+    uint32_t Occupancy = 0;
+  };
+  SmallVector<PerFunctionResourceInfo> FunctionResourceInfos;
+
   SmallString<128> getMCExprStr(const MCExpr *Value);
 
   /// Attempts to replace the validation that is missed in getSIProgramInfo due
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
index 56592bde3b1c7..3c89e3d287b32 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
@@ -320,6 +320,9 @@ static void emitVGPRBlockComment(const MachineInstr *MI, const SIInstrInfo *TII,
 }
 
 void AMDGPUAsmPrinter::emitInstruction(const MachineInstr *MI) {
+  if (MI->isCall())
+    collectCallEdge(*MI);
+
   // FIXME: Enable feature predicate checks once all the test pass.
   // AMDGPU_MC::verifyInstructionPredicates(MI->getOpcode(),
   //                                        getSubtargetInfo().getFeatureBits());
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 3777adc9790e8..80112a944f5ba 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -1382,6 +1382,9 @@ class AMDGPUAsmParser : public MCTargetAsmParser {
     return getRegBitWidth(RCID) / 8;
   }
 
+  AMDGPU::InfoSectionData InfoData;
+  bool HasInfoData = false;
+
 private:
   void createConstantSymbol(StringRef Id, int64_t Val);
 
@@ -1422,6 +1425,7 @@ class AMDGPUAsmParser : public MCTargetAsmParser {
   bool ParseDirectivePALMetadataBegin();
   bool ParseDirectivePALMetadata();
   bool ParseDirectiveAMDGPULDS();
+  bool ParseDirectiveAMDGPUInfo();
 
   /// Common code to parse out a block of text (typically YAML) between start and
   /// end directives.
@@ -1676,6 +1680,7 @@ class AMDGPUAsmParser : public MCTargetAsmParser {
                                uint64_t &ErrorInfo,
                                bool MatchingInlineAsm) override;
   bool ParseDirective(AsmToken DirectiveID) override;
+  void onEndOfFile() override;
   ParseStatus parseOperand(OperandVector &Operands, StringRef Mnemonic,
                            OperandMode Mode = OperandMode_Default);
   StringRef parseMnemonicSuffix(StringRef Name);
@@ -6741,6 +6746,109 @@ bool AMDGPUAsmParser::ParseDirectiveAMDGPULDS() {
   return false;
 }
 
+bool AMDGPUAsmParser::ParseDirectiveAMDGPUInfo() {
+  if (getParser().checkForValidSection())
+    return true;
+
+  StringRef FuncName;
+  if (getParser().parseIdentifier(FuncName))
+    return TokError("expected symbol name after .amdgpu_info");
+
+  MCSymbol *FuncSym = getContext().getOrCreateSymbol(FuncName);
+  AMDGPU::FuncInfo FI;
+  FI.Sym = FuncSym;
+  bool HasScalarAttrs = false;
+
+  while (true) {
+    while (trySkipToken(AsmToken::EndOfStatement))
+      ;
+
+    StringRef ID;
+    SMLoc IDLoc = getLoc();
+    if (!parseId(ID, "expected directive or .end_amdgpu_info"))
+      return true;
+
+    if (ID == ".end_amdgpu_info")
+      break;
+
+    if (ID == ".amdgpu_flags") {
+      int64_t Val;
+      if (getParser().parseAbsoluteExpression(Val))
+        return true;
+      uint32_t Flags = static_cast<uint32_t>(Val);
+      FI.IsKernel = !!(Flags & AMDGPU::FUNC_IS_KERNEL);
+      FI.UsesVCC = !!(Flags & AMDGPU::FUNC_USES_VCC);
+      FI.UsesFlatScratch = !!(Flags & AMDGPU::FUNC_USES_FLAT_SCRATCH);
+      FI.HasDynStack = !!(Flags & AMDGPU::FUNC_HAS_DYN_STACK);
+      HasScalarAttrs = true;
+    } else if (ID == ".amdgpu_num_vgpr") {
+      int64_t Val;
+      if (getParser().parseAbsoluteExpression(Val))
+        return true;
+      FI.NumArchVGPR = static_cast<uint32_t>(Val);
+      HasScalarAttrs = true;
+    } else if (ID == ".amdgpu_num_agpr") {
+      int64_t Val;
+      if (getParser().parseAbsoluteExpression(Val))
+        return true;
+      FI.NumAccVGPR = static_cast<uint32_t>(Val);
+      HasScalarAttrs = true;
+    } else if (ID == ".amdgpu_num_sgpr") {
+      int64_t Val;
+      if (getParser().parseAbsoluteExpression(Val))
+        return true;
+      FI.NumSGPR = static_cast<uint32_t>(Val);
+      HasScalarAttrs = true;
+    } else if (ID == ".amdgpu_private_segment_size") {
+      int64_t Val;
+      if (getParser().parseAbsoluteExpression(Val))
+        return true;
+      FI.PrivateSegmentSize = static_cast<uint32_t>(Val);
+      HasScalarAttrs = true;
+    } else if (ID == ".amdgpu_occupancy") {
+      int64_t Val;
+      if (getParser().parseAbsoluteExpression(Val))
+        return true;
+      FI.Occupancy = static_cast<uint32_t>(Val);
+      HasScalarAttrs = true;
+    } else if (ID == ".amdgpu_use") {
+      StringRef ResName;
+      if (getParser().parseIdentifier(ResName))
+        return TokError("expected resource symbol for .amdgpu_use");
+      InfoData.Uses.push_back(
+          {FuncSym, getContext().getOrCreateSymbol(ResName)});
+    } else if (ID == ".amdgpu_call") {
+      StringRef DstName;
+      if (getParser().parseIdentifier(DstName))
+        return TokError("expected callee symbol for .amdgpu_call");
+      InfoData.Calls.push_back(
+          {FuncSym, getContext().getOrCreateSymbol(DstName)});
+    } else if (ID == ".amdgpu_indirect_call") {
+      std::string TypeId;
+      if (getParser().parseEscapedString(TypeId))
+        return TokError("expected type ID string for .amdgpu_indirect_call");
+      InfoData.IndirectCalls.push_back({FuncSym, std::move(TypeId)});
+    } else if (ID == ".amdgpu_typeid") {
+      std::string TypeId;
+      if (getParser().parseEscapedString(TypeId))
+        return TokError("expected type ID string for .amdgpu_typeid");
+      InfoData.TypeIds.push_back({FuncSym, std::move(TypeId)});
+    } else {
+      return Error(IDLoc, "unknown .amdgpu_info directive '" + ID + "'");
+    }
+  }
+
+  if (HasScalarAttrs)
+    InfoData.Funcs.push_back(std::move(FI));
+  HasInfoData = true;
+  return false;
+}
+
+void AMDGPUAsmParser::onEndOfFile() {
+  if (HasInfoData)
+    getTargetStreamer().EmitAMDGPUInfo(InfoData);
+}
+
 bool AMDGPUAsmParser::ParseDirective(AsmToken DirectiveID) {
   StringRef IDVal = DirectiveID.getString();
 
@@ -6778,6 +6886,9 @@ bool AMDGPUAsmParser::ParseDirective(AsmToken DirectiveID) {
   if (IDVal == ".amdgpu_lds")
     return ParseDirectiveAMDGPULDS();
 
+  if (IDVal == ".amdgpu_info")
+    return ParseDirectiveAMDGPUInfo();
+
   if (IDVal == PALMD::AssemblerDirectiveBegin)
     return ParseDirectivePALMetadataBegin();
 
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index d276bab0ff3be..05cd78995041b 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -664,6 +664,75 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
   OS << "\t.end_amdhsa_kernel\n";
 }
 
+void AMDGPUTargetAsmStreamer::EmitAMDGPUInfo(
+    const AMDGPU::InfoSectionData &Data) {
+  // Group edges by source function symbol.
+  DenseMap<MCSymbol *, SmallVector<MCSymbol *, 2>> FuncUses;
+  DenseMap<MCSymbol *, SmallVector<MCSymbol *, 4>> FuncCalls;
+  DenseMap<MCSymbol *, SmallVector<StringRef, 2>> FuncIndirectCalls;
+  DenseMap<MCSymbol *, SmallVector<StringRef, 1>> FuncTypeIds;
+  for (const auto &[Func, Res] : Data.Uses)
+    FuncUses[Func].push_back(Res);
+  for (const auto &[Src, Dst] : Data.Calls)
+    FuncCalls[Src].push_back(Dst);
+  for (const auto &[Func, TypeId] : Data.IndirectCalls)
+    FuncIndirectCalls[Func].push_back(TypeId);
+  for (const auto &[Sym, TypeId] : Data.TypeIds)
+    FuncTypeIds[Sym].push_back(TypeId);
+
+  DenseSet<MCSymbol *> Emitted;
+  auto EmitScope = [&](MCSymbol *Sym, const AMDGPU::FuncInfo *Info) {
+    if (!Emitted.insert(Sym).second)
+      return;
+    OS << "\t.amdgpu_info " << Sym->getName() << '\n';
+    if (Info) {
+      uint32_t Flags = 0;
+      if (Info->IsKernel)
+        Flags |= AMDGPU::FUNC_IS_KERNEL;
+      if (Info->UsesVCC)
+        Flags |= AMDGPU::FUNC_USES_VCC;
+      if (Info->UsesFlatScratch)
+        Flags |= AMDGPU::FUNC_USES_FLAT_SCRATCH;
+      if (Info->HasDynStack)
+        Flags |= AMDGPU::FUNC_HAS_DYN_STACK;
+      OS << "\t\t.amdgpu_flags " << Flags << '\n';
+      OS << "\t\t.amdgpu_num_vgpr " << Info->NumArchVGPR << '\n';
+      if (Info->NumAccVGPR)
+        OS << "\t\t.amdgpu_num_agpr " << Info->NumAccVGPR << '\n';
+      OS << "\t\t.amdgpu_num_sgpr " << Info->NumSGPR << '\n';
+      OS << "\t\t.amdgpu_private_segment_size " << Info->PrivateSegmentSize
+         << '\n';
+      OS << "\t\t.amdgpu_occupancy " << Info->Occupancy << '\n';
+    }
+    if (auto It = FuncUses.find(Sym); It != FuncUses.end())
+      for (MCSymbol *Res : It->second)
+        OS << "\t\t.amdgpu_use " << Res->getName() << '\n';
+    if (auto It = FuncCalls.find(Sym); It != FuncCalls.end())
+      for (MCSymbol *Dst : It->second)
+        OS << "\t\t.amdgpu_call " << Dst->getName() << '\n';
+    if (auto It = FuncIndirectCalls.find(Sym); It != FuncIndirectCalls.end())
+      for (StringRef TypeId : It->second)
+        OS << "\t\t.amdgpu_indirect_call \"" << TypeId << "\"\n";
+    if (auto It = FuncTypeIds.find(Sym); It != FuncTypeIds.end())
+      for (StringRef TypeId : It->second)
+        OS << "\t\t.amdgpu_typeid \"" << TypeId << "\"\n";
+    OS << "\t.end_amdgpu_info\n\n";
+  };
+
+  for (const FuncInfo &Func : Data.Funcs)
+    EmitScope(Func.Sym, &Func);
+
+  // Emit scopes for functions that only appear in edges (e.g. typeid-only).
+  for (const auto &[Sym, TypeId] : Data.TypeIds)
+    EmitScope(Sym, nullptr);
+  for (const auto &[Sym, Res] : Data.Uses)
+    EmitScope(Sym, nullptr);
+  for (const auto &[Sym, Dst] : Data.Calls)
+    EmitScope(Sym, nullptr);
+  for (const auto &[Sym, TypeId] : Data.IndirectCalls)
+    EmitScope(Sym, nullptr);
+}
+
 //===----------------------------------------------------------------------===//
 // AMDGPUTargetELFStreamer
 //===----------------------------------------------------------------------===//
@@ -1065,3 +1134,132 @@ void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
   for (uint32_t i = 0; i < sizeof(amdhsa::kernel_descriptor_t::reserved3); ++i)
     Streamer.emitInt8(0u);
 }
+
+void AMDGPUTargetELFStreamer::EmitAMDGPUInfo(
+    const AMDGPU::InfoSectionData &Data) {
+  MCELFStreamer &S = getStreamer();
+  MCContext &Context = S.getContext();
+
+  StringMap<uint32_t> StrPoolOffsets;
+  SmallString<128> StrPool;
+  auto getOrAddString = [&](StringRef Str) -> uint32_t {
+    if (Str.empty())
+      return UINT32_MAX;
+    auto [It, Inserted] = StrPoolOffsets.try_emplace(Str, 0);
+    if (Inserted) {
+      It->second = StrPool.size();
+      StrPool.append(Str);
+      StrPool.push_back('\0');
+    }
+    return It->second;
+  };
+
+  // Pre-resolve string table offsets.
+  SmallVector<uint32_t, 4> ICallTypeIdOffsets;
+  for (const auto &[Func, TypeId] : Data.IndirectCalls)
+    ICallTypeIdOffsets.push_back(getOrAddString(TypeId));
+  SmallVector<uint32_t, 4> TypeIdOffsets;
+  for (const auto &[Sym, TypeId] : Data.TypeIds)
+    TypeIdOffsets.push_back(getOrAddString(TypeId));
+
+  // Group edges by source function symbol.
+  DenseMap<MCSymbol *, SmallVector<MCSymbol *, 2>> FuncUses;
+  DenseMap<MCSymbol *, SmallVector<MCSymbol *, 4>> FuncCalls;
+  DenseMap<MCSymbol *, SmallVector<uint32_t, 2>> FuncIndirectCalls;
+  DenseMap<MCSymbol *, SmallVector<uint32_t, 1>> FuncTypeIds;
+  for (const auto &[Func, Res] : Data.Uses)
+    FuncUses[Func].push_back(Res);
+  for (const auto &[Src, Dst] : Data.Calls)
+    FuncCalls[Src].push_back(Dst);
+  for (uint32_t I = 0, E = Data.IndirectCalls.size(); I < E; ++I) {
+    FuncIndirectCalls[Data.IndirectCalls[I].first].push_back(
+        ICallTypeIdOffsets[I]);
+  }
+  for (uint32_t I = 0, E = Data.TypeIds.size(); I < E; ++I)
+    FuncTypeIds[Data.TypeIds[I].first].push_back(TypeIdOffsets[I]);
+
+  // Helpers to emit kind+len tagged entries.
+  auto EmitU32Entry = [&](uint8_t Kind, uint32_t Val) {
+    S.emitInt8(Kind);
+    S.emitInt8(4);
+    S.emitInt32(Val);
+  };
+  auto EmitSymEntry = [&](uint8_t Kind, MCSymbol *Sym) {
+    S.emitInt8(Kind);
+    S.emitInt8(8);
+    S.emitValue(MCSymbolRefExpr::create(Sym, Context), 8);
+  };
+
+  S.pushSection();
+  MCSectionELF *InfoSec = Context.getELFSection(
+      ".amdgpu.info", ELF::SHT_PROGBITS, ELF::SHF_EXCLUDE);
+  S.switchSection(InfoSec);
+
+  DenseSet<MCSymbol *> Emitted;
+  auto EmitScope = [&](MCSymbol *Sym, const AMDGPU::FuncInfo *Info) {
+    if (!Emitted.insert(Sym).second)
+      return;
+
+    EmitSymEntry(AMDGPU::INFO_FUNC, Sym);
+
+    if (Info) {
+      uint32_t Flags = 0;
+      if (Info->IsKernel)
+        Flags |= AMDGPU::FUNC_IS_KERNEL;
+      if (Info->UsesVCC)
+        Flags |= AMDGPU::FUNC_USES_VCC;
+      if (Info->UsesFlatScratch)
+        Flags |= AMDGPU::FUNC_USES_FLAT_SCRATCH;
+      if (Info->HasDynStack)
+        Flags |= AMDGPU::FUNC_HAS_DYN_STACK;
+      EmitU32Entry(AMDGPU::INFO_FLAGS, Flags);
+      EmitU32Entry(AMDGPU::INFO_NUM_VGPR, Info->NumArchVGPR);
+      EmitU32Entry(AMDGPU::INFO_PRIVATE_SEGMENT_SIZE, Info->PrivateSegmentSize);
+      EmitU32Entry(AMDGPU::INFO_OCCUPANCY, Info->Occupancy);
+      // The AGPR count is only emitted if the function actually uses AGPRs
+      // because AGPRs are not present on all architectures.
+      if (Info->NumAccVGPR)
+        EmitU32Entry(AMDGPU::INFO_NUM_AGPR, Info->NumAccVGPR);
+      EmitU32Entry(AMDGPU::INFO_NUM_SGPR, Info->NumSGPR);
+    }
+
+    if (auto It = FuncUses.find(Sym); It != FuncUses.end()) {
+      for (MCSymbol *Res : It->second)
+        EmitSymEntry(AMDGPU::INFO_USE, Res);
+    }
+    if (auto It = FuncCalls.find(Sym); It != FuncCalls.end()) {
+      for (MCSymbol *Dst : It->second)
+        EmitSymEntry(AMDGPU::INFO_CALL, Dst);
+    }
+    if (auto It = FuncIndirectCalls.find(Sym); It != FuncIndirectCalls.end()) {
+      for (uint32_t Off : It->second)
+        EmitU32Entry(AMDGPU::INFO_INDIRECT_CALL, Off);
+    }
+    if (auto It = FuncTypeIds.find(Sym); It != FuncTypeIds.end()) {
+      for (uint32_t Off : It->second)
+        EmitU32Entry(AMDGPU::INFO_TYPEID, Off);
+    }
+  };
+
+  for (const auto &Func : Data.Funcs)
+    EmitScope(Func.Sym, &Func);
+
+  // Emit scopes for functions that only appear in edges.
+  for (const auto &[Sym, TypeId] : Data.TypeIds)
+    EmitScope(Sym, /*Info=*/nullptr);
+  for (const auto &[Sym, Res] : Data.Uses)
+    EmitScope(Sym, /*Info=*/nullptr);
+  for (const auto &[Sym, Dst] : Data.Calls)
+    EmitScope(Sym, /*Info=*/nullptr);
+  for (const auto &[Sym, TypeId] : Data.IndirectCalls)
+    EmitScope(Sym, /*Info=*/nullptr);
+
+  if (!StrPool.empty()) {
+    MCSectionELF *Sec = Context.getELFSection(".amdgpu.strtab", ELF::SHT_STRTAB,
+                                              ELF::SHF_EXCLUDE);
+    S.switchSection(Sec);
+    S.emitBytes(StrPool);
+  }
+
+  S.popSection();
+}
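Note (not part of the patch): the entry layout that EmitU32Entry and
EmitSymEntry write above can be illustrated with a standalone sketch. It
assumes little-endian payloads and uses made-up kind values (kKindFlags,
kKindFuture); the real enumerators are defined in AMDGPUObjLinkingInfo.h and
are not reproduced here. The function names are hypothetical as well. The
reader shows one way a consumer might skip an unknown kind via the length byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical kind values for illustration only; the real enumerators live in
// llvm/Support/AMDGPUObjLinkingInfo.h.
constexpr uint8_t kKindFlags = 1;
constexpr uint8_t kKindFuture = 200; // stands in for a kind the reader does not know

// Append one [kind: u8][len: u8][payload: 4 bytes] entry.
// Little-endian payload order is an assumption of this sketch.
void appendU32Entry(std::vector<uint8_t> &Buf, uint8_t Kind, uint32_t Val) {
  Buf.push_back(Kind);
  Buf.push_back(4);
  for (int Shift = 0; Shift < 32; Shift += 8)
    Buf.push_back(static_cast<uint8_t>(Val >> Shift));
}

// Walk the buffer and collect the payload of every kKindFlags entry. Entries
// of any other kind are skipped using the length byte. Returns false if an
// entry header or payload is truncated.
bool readFlags(const std::vector<uint8_t> &Buf, std::vector<uint32_t> &Out) {
  size_t Pos = 0;
  while (Pos < Buf.size()) {
    if (Buf.size() - Pos < 2)
      return false; // not enough bytes for kind + len
    uint8_t Kind = Buf[Pos];
    uint8_t Len = Buf[Pos + 1];
    Pos += 2;
    if (Buf.size() - Pos < Len)
      return false; // payload runs past the end of the buffer
    if (Kind == kKindFlags && Len == 4) {
      uint32_t Val = 0;
      for (int I = 0; I < 4; ++I)
        Val |= static_cast<uint32_t>(Buf[Pos + I]) << (8 * I);
      Out.push_back(Val);
    }
    Pos += Len; // known or unknown, advance by the declared length
  }
  return true;
}
```

Because the reader always advances by the declared length, appending a
kKindFuture entry ahead of a kKindFlags entry does not affect the flags it
recovers.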
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
index 3a0d8dcd2d27c..faddd3b160079 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
@@ -11,7 +11,11 @@
 
 #include "Utils/AMDGPUBaseInfo.h"
 #include "Utils/AMDGPUPALMetadata.h"
+#include "llvm/ADT/SmallVector.h"
 #include "llvm/MC/MCStreamer.h"
+#include "llvm/Support/AMDGPUObjLinkingInfo.h"
+#include <string>
+#include <utility>
 
 namespace llvm {
 
@@ -26,6 +30,28 @@ struct MCKernelDescriptor;
 namespace HSAMD {
 struct Metadata;
 }
+
+struct FuncInfo {
+  MCSymbol *Sym = nullptr;
+  bool IsKernel = false;
+  uint32_t NumArchVGPR = 0;
+  uint32_t NumAccVGPR = 0;
+  uint32_t NumSGPR = 0;
+  uint32_t PrivateSegmentSize = 0;
+  uint32_t Occupancy = 0;
+  bool UsesVCC = false;
+  bool UsesFlatScratch = false;
+  bool HasDynStack = false;
+};
+
+struct InfoSectionData {
+  SmallVector<FuncInfo, 8> Funcs;
+  SmallVector<std::pair<MCSymbol *, MCSymbol *>, 4> Uses;
+  SmallVector<std::pair<MCSymbol *, MCSymbol *>, 8> Calls;
+  SmallVector<std::pair<MCSymbol *, std::string>, 4> IndirectCalls;
+  SmallVector<std::pair<MCSymbol *, std::string>, 4> TypeIds;
+};
+
 } // namespace AMDGPU
 
 class AMDGPUTargetStreamer : public MCTargetStreamer {
@@ -104,6 +130,8 @@ class AMDGPUTargetStreamer : public MCTargetStreamer {
                              const MCExpr *ReserveVCC,
                              const MCExpr *ReserveFlatScr) {}
 
+  virtual void EmitAMDGPUInfo(const AMDGPU::InfoSectionData &Data) {}
+
   static StringRef getArchNameFromElfMach(unsigned ElfMach);
   static unsigned getElfMach(StringRef GPU);
 
@@ -168,6 +196,8 @@ class AMDGPUTargetAsmStreamer final : public AMDGPUTargetStreamer {
                              const MCExpr *NextVGPR, const MCExpr *NextSGPR,
                              const MCExpr *ReserveVCC,
                              const MCExpr *ReserveFlatScr) override;
+
+  void EmitAMDGPUInfo(const AMDGPU::InfoSectionData &Data) override;
 };
 
 class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
@@ -221,6 +251,8 @@ class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
                              const MCExpr *NextVGPR, const MCExpr *NextSGPR,
                              const MCExpr *ReserveVCC,
                              const MCExpr *ReserveFlatScr) override;
+
+  void EmitAMDGPUInfo(const AMDGPU::InfoSectionData &Data) override;
 };
 }
 #endif
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-agpr.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-agpr.ll
new file mode 100644
index 0000000000000..441b627fa1c96
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-agpr.ll
@@ -0,0 +1,26 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -amdgpu-enable-object-linking < %s | FileCheck %s
+
+; Verify that .amdgpu_num_agpr IS emitted when AGPRs are used on a target
+; that supports them (gfx908 has a separate AGPR file).
+
+declare <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float, float, <4 x float>, i32, i32, i32)
+
+define void @func_with_agpr(float %a, float %b, ptr addrspace(1) %out) {
+  %result = call <4 x float> @llvm.amdgcn.mfma.f32.4x4x1f32(float %a, float %b, <4 x float> zeroinitializer, i32 0, i32 0, i32 0)
+  store <4 x float> %result, ptr addrspace(1) %out
+  ret void
+}
+
+define amdgpu_kernel void @kern(float %a, float %b, ptr addrspace(1) %out) {
+  call void @func_with_agpr(float %a, float %b, ptr addrspace(1) %out)
+  ret void
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"amdgpu-link-time-lds", i32 1}
+
+; CHECK:      .amdgpu_info func_with_agpr
+; CHECK:        .amdgpu_num_agpr {{[1-9][0-9]*}}
+; CHECK:      .end_amdgpu_info
+; CHECK:      .amdgpu_info kern
+; CHECK:      .end_amdgpu_info
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll
new file mode 100644
index 0000000000000..03aae43cbb00f
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll
@@ -0,0 +1,51 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --sections - | FileCheck %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=asm < %s | FileCheck %s --check-prefix=ASM --implicit-check-not=.amdgpu_num_agpr
+
+; Test that the unified .amdgpu.info section (.amdgpu_info blocks in assembly) is
+; emitted with correct relocations when object linking is enabled.
+
+declare void @extern_func()
+
+; The .amdgpu.info section should exist as SHT_PROGBITS with SHF_EXCLUDE.
+; CHECK:      Section {
+; CHECK:        Name: .amdgpu.info
+; CHECK:        Type: SHT_PROGBITS
+; CHECK:        Flags [
+; CHECK:          SHF_EXCLUDE
+; CHECK:        ]
+
+; Symbol references in the binary resource metadata still use R_AMDGPU_ABS64 relocations.
+; CHECK-DAG:    R_AMDGPU_ABS64 my_kernel
+; CHECK-DAG:    R_AMDGPU_ABS64 helper
+; CHECK-DAG:    R_AMDGPU_ABS64 extern_func
+
+; Assembly: per-function .amdgpu_info blocks (target flags derived from e_flags).
+; ASM-DAG:    .amdgpu_info helper
+; ASM-DAG:    .amdgpu_flags {{[0-9]+}}
+; ASM-DAG:    .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG:    .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG:    .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG:    .amdgpu_occupancy {{[0-9]+}}
+; ASM-DAG:    .amdgpu_call extern_func
+; ASM-DAG:    .end_amdgpu_info
+; ASM-DAG:    .amdgpu_info my_kernel
+; ASM-DAG:    .amdgpu_flags {{[0-9]+}}
+; ASM-DAG:    .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG:    .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG:    .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG:    .amdgpu_occupancy {{[0-9]+}}
+; ASM-DAG:    .amdgpu_call helper
+; ASM-DAG:    .end_amdgpu_info
+
+define void @helper() {
+  call void @extern_func()
+  ret void
+}
+
+define amdgpu_kernel void @my_kernel() {
+  call void @helper()
+  ret void
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"amdgpu-link-time-lds", i32 1}
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-cross-tu-addr-taken.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-cross-tu-addr-taken.ll
new file mode 100644
index 0000000000000..a9cef04b8c03c
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-cross-tu-addr-taken.ll
@@ -0,0 +1,47 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r - | FileCheck %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=asm < %s | FileCheck %s --check-prefix=ASM --implicit-check-not=.amdgpu_num_agpr
+
+; Test that address-taken on a declaration (cross-TU scenario) emits a
+; .amdgpu_typeid directive for the external function. The function
+; is defined in another TU but its address is taken here.
+
+declare void @external_func(i32)
+
+define void @taker(ptr %p) {
+  store ptr @external_func, ptr %p
+  ret void
+}
+
+define amdgpu_kernel void @kern() {
+  %p = alloca ptr, addrspace(5)
+  call void @taker(ptr addrspace(5) %p)
+  ret void
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"amdgpu-link-time-lds", i32 1}
+
+; CHECK-DAG: R_AMDGPU_ABS64 external_func
+; CHECK-DAG: R_AMDGPU_ABS64 kern
+; CHECK-DAG: R_AMDGPU_ABS64 taker
+
+; Assembly: per-function .amdgpu_info blocks (target flags derived from e_flags).
+; ASM-DAG:    .amdgpu_info taker
+; ASM-DAG:    .amdgpu_flags 0
+; ASM-DAG:    .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG:    .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG:    .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG:    .amdgpu_occupancy {{[0-9]+}}
+; ASM-DAG:    .amdgpu_typeid "vl"
+; ASM-DAG:    .end_amdgpu_info
+; ASM-DAG:    .amdgpu_info kern
+; ASM-DAG:    .amdgpu_flags 5
+; ASM-DAG:    .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG:    .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG:    .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG:    .amdgpu_occupancy {{[0-9]+}}
+; ASM-DAG:    .amdgpu_call taker
+; ASM-DAG:    .end_amdgpu_info
+; ASM-DAG:    .amdgpu_info external_func
+; ASM-DAG:    .amdgpu_typeid "vi"
+; ASM-DAG:    .end_amdgpu_info
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-indirect.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-indirect.ll
new file mode 100644
index 0000000000000..757cdefcad724
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-indirect.ll
@@ -0,0 +1,59 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --sections - | FileCheck %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=asm < %s | FileCheck %s --check-prefix=ASM --implicit-check-not=.amdgpu_num_agpr
+
+; Test that the unified .amdgpu.info section includes address-taken metadata and
+; .amdgpu_indirect_call for functions involved in indirect calling.
+;
+; @callee_vi: void(i32) -- address taken, prototype encoding "vi"
+; @caller:    has indirect call to void(i32) -- icall encoding "vi"
+; @my_kernel: kernel that passes @callee_vi as a function pointer to @caller
+
+define void @callee_vi(i32 %x) {
+  ret void
+}
+
+define void @caller(ptr %fptr) {
+  call void %fptr(i32 1)
+  ret void
+}
+
+define amdgpu_kernel void @my_kernel() {
+  call void @caller(ptr @callee_vi)
+  ret void
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"amdgpu-link-time-lds", i32 1}
+
+; CHECK:      Section {
+; CHECK:        Name: .amdgpu.info
+; CHECK:        Type: SHT_PROGBITS
+; CHECK:        Flags [
+; CHECK:          SHF_EXCLUDE
+; CHECK:        ]
+
+; CHECK-DAG:    R_AMDGPU_ABS64 my_kernel
+; CHECK-DAG:    R_AMDGPU_ABS64 caller
+; CHECK-DAG:    R_AMDGPU_ABS64 callee_vi
+
+; ASM-DAG:    .amdgpu_info callee_vi
+; ASM-DAG:    .amdgpu_flags 0
+; ASM-DAG:    .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG:    .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG:    .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG:    .amdgpu_occupancy {{[0-9]+}}
+; ASM-DAG:    .amdgpu_typeid "vi"
+; ASM-DAG:    .end_amdgpu_info
+; ASM-DAG:    .amdgpu_info caller
+; ASM-DAG:    .amdgpu_flags 14
+; ASM-DAG:    .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG:    .amdgpu_indirect_call "vi"
+; ASM-DAG:    .end_amdgpu_info
+; ASM-DAG:    .amdgpu_info my_kernel
+; ASM-DAG:    .amdgpu_flags 7
+; ASM-DAG:    .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG:    .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG:    .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG:    .amdgpu_occupancy {{[0-9]+}}
+; ASM-DAG:    .amdgpu_call caller
+; ASM-DAG:    .end_amdgpu_info
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll
index 46d4c8db00f06..bf1b161416a72 100644
--- a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll
@@ -1,9 +1,11 @@
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -amdgpu-enable-object-linking < %s | FileCheck %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -amdgpu-enable-object-linking < %s | FileCheck %s --implicit-check-not=.amdgpu_num_agpr
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --sections - | FileCheck %s --check-prefix=ELF
 
 ; Verify object linking codegen for named barriers on GFX1250:
 ; 1. Barrier instructions use M0-based forms with relocation references
-; 2. group_segment_fixed_size = 0 (linker patches it)
-; 3. Named barrier is emitted as an SHN_AMDGPU_LDS symbol (.amdgpu_lds)
+; 2. .amdgpu.info section records the barrier as an LDS use edge
+; 3. group_segment_fixed_size = 0 (linker patches it)
+; 4. Named barrier is emitted as an SHN_AMDGPU_LDS symbol (.amdgpu_lds)
 
 @bar = internal addrspace(3) global [2 x target("amdgcn.named.barrier", 0)] poison
 
@@ -13,12 +15,30 @@
 ; CHECK: s_barrier_join m0
 ; CHECK: s_barrier_wait 1
 
-; KD: group_segment_fixed_size = 0 (linker will patch).
 ; CHECK:       .amdhsa_group_segment_fixed_size 0
 
-; LDS symbol declaration
+; CHECK:      .amdgpu_info kernel
+; CHECK:        .amdgpu_flags {{[0-9]+}}
+; CHECK:        .amdgpu_num_vgpr {{[0-9]+}}
+; CHECK:        .amdgpu_num_sgpr {{[0-9]+}}
+; CHECK:        .amdgpu_private_segment_size {{[0-9]+}}
+; CHECK:        .amdgpu_occupancy {{[0-9]+}}
+; CHECK:        .amdgpu_use __amdgpu_named_barrier.bar{{[^ ,]*}}
+; CHECK:        .amdgpu_call helper
+; CHECK:      .end_amdgpu_info
+
 ; CHECK:      .amdgpu_lds __amdgpu_named_barrier.bar{{[^ ,]*}}, 32, 4
 
+; ELF:      Section {
+; ELF:        Name: .amdgpu.info
+; ELF:        Type: SHT_PROGBITS
+; ELF:        Flags [
+; ELF:          SHF_EXCLUDE
+
+; ELF-DAG: R_AMDGPU_ABS64 kernel
+; ELF-DAG: R_AMDGPU_ABS64 __amdgpu_named_barrier.bar{{[^ ]*}}
+; ELF-DAG: R_AMDGPU_ABS64 helper
+
 define amdgpu_kernel void @kernel() {
   call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) @bar, i32 3)
   call void @llvm.amdgcn.s.barrier.join(ptr addrspace(3) @bar)
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-typeid.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-typeid.ll
new file mode 100644
index 0000000000000..c681697757aba
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-typeid.ll
@@ -0,0 +1,83 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r - | FileCheck %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=asm < %s | FileCheck %s --check-prefix=ASM --implicit-check-not=.amdgpu_num_agpr
+
+; Test ABI register-size type ID generation for various function types.
+; The type ID encodes each parameter/return by bit width: v=void, i=<=32-bit,
+; l=33-64-bit. Types with the same register footprint share an encoding
+; (e.g. float(float) and i32(i32) both produce "ii").
+
+define void @void_void() {
+  ret void
+}
+
+define i32 @i32_i32(i32 %x) {
+  ret i32 %x
+}
+
+define void @void_ptr_i32(ptr %p, i32 %x) {
+  ret void
+}
+
+define i64 @i64_i64_i64(i64 %a, i64 %b) {
+  ret i64 %a
+}
+
+define float @float_float(float %x) {
+  ret float %x
+}
+
+; Take the address of each function so they appear as resource nodes.
+define void @taker() {
+  %p0 = alloca ptr, addrspace(5)
+  store volatile ptr @void_void, ptr addrspace(5) %p0
+  store volatile ptr @i32_i32, ptr addrspace(5) %p0
+  store volatile ptr @void_ptr_i32, ptr addrspace(5) %p0
+  store volatile ptr @i64_i64_i64, ptr addrspace(5) %p0
+  store volatile ptr @float_float, ptr addrspace(5) %p0
+  ret void
+}
+
+define amdgpu_kernel void @kern() {
+  call void @taker()
+  ret void
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"amdgpu-link-time-lds", i32 1}
+
+; CHECK-DAG: R_AMDGPU_ABS64 void_void
+; CHECK-DAG: R_AMDGPU_ABS64 i32_i32
+; CHECK-DAG: R_AMDGPU_ABS64 void_ptr_i32
+; CHECK-DAG: R_AMDGPU_ABS64 i64_i64_i64
+; CHECK-DAG: R_AMDGPU_ABS64 float_float
+; CHECK-DAG: R_AMDGPU_ABS64 taker
+; CHECK-DAG: R_AMDGPU_ABS64 kern
+
+; ASM-DAG:    .amdgpu_info void_void
+; ASM-DAG:    .amdgpu_flags 0
+; ASM-DAG:    .amdgpu_typeid "v"
+; ASM-DAG:    .end_amdgpu_info
+; ASM-DAG:    .amdgpu_info i32_i32
+; ASM-DAG:    .amdgpu_flags 0
+; ASM-DAG:    .amdgpu_typeid "ii"
+; ASM-DAG:    .end_amdgpu_info
+; ASM-DAG:    .amdgpu_info void_ptr_i32
+; ASM-DAG:    .amdgpu_flags 0
+; ASM-DAG:    .amdgpu_typeid "vli"
+; ASM-DAG:    .end_amdgpu_info
+; ASM-DAG:    .amdgpu_info i64_i64_i64
+; ASM-DAG:    .amdgpu_flags 0
+; ASM-DAG:    .amdgpu_typeid "lll"
+; ASM-DAG:    .end_amdgpu_info
+; ASM-DAG:    .amdgpu_info float_float
+; ASM-DAG:    .amdgpu_flags 0
+; ASM-DAG:    .amdgpu_typeid "ii"
+; ASM-DAG:    .end_amdgpu_info
+; ASM-DAG:    .amdgpu_info taker
+; ASM-DAG:    .amdgpu_flags 0
+; ASM-DAG:    .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG:    .end_amdgpu_info
+; ASM-DAG:    .amdgpu_info kern
+; ASM-DAG:    .amdgpu_flags 5
+; ASM-DAG:    .amdgpu_call taker
+; ASM-DAG:    .end_amdgpu_info
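Note (not part of the patch): the encoding described in the test comment above
(v=void, i=<=32-bit, l=33-64-bit) can be sketched as a standalone function. The
helper names and the choice to pass bit widths as plain integers are
assumptions for illustration, not the patch's implementation. The comment does
not say how widths above 64 bits are handled, so this sketch returns an empty
string for them rather than guess.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: map a bit width to the single-character code from the
// test comment. A width of 0 stands for void. Returns '\0' for widths the
// comment does not describe (above 64 bits).
char widthCode(unsigned Bits) {
  if (Bits == 0)
    return 'v';
  if (Bits <= 32)
    return 'i';
  if (Bits <= 64)
    return 'l';
  return '\0';
}

// Build a type ID string: the return value's code first, then one code per
// parameter. Returns an empty string if any width is unsupported.
std::string encodeTypeId(unsigned RetBits,
                         const std::vector<unsigned> &ParamBits) {
  std::string Id;
  char Ret = widthCode(RetBits);
  if (Ret == '\0')
    return std::string();
  Id.push_back(Ret);
  for (unsigned Bits : ParamBits) {
    char Code = widthCode(Bits);
    if (Code == '\0')
      return std::string();
    Id.push_back(Code);
  }
  return Id;
}
```

Under these assumptions, float(float) and i32(i32) both map to "ii", and
void(ptr, i32) with a 64-bit pointer maps to "vli", matching the strings the
test expects.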
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll
index 878f3abf7ccfc..06f7c1bd58c26 100644
--- a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll
@@ -1,17 +1,18 @@
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking < %s | FileCheck -check-prefixes=ASM %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --syms - | FileCheck -check-prefixes=ELF %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking < %s | FileCheck -check-prefixes=ASM %s --implicit-check-not=.amdgpu_num_agpr
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --syms --sections - | FileCheck -check-prefixes=ELF %s
 
 ; Test that with object linking enabled, external LDS declarations produce
-; @abs32@lo relocations, SHN_AMDGPU_LDS symbols, and .amdgpu_lds directives.
-; Covers multiple LDS variables with different sizes and alignments (including
-; zero-sized dynamic LDS), usage from both kernels and device functions, and
+; @abs32@lo relocations, SHN_AMDGPU_LDS symbols, .amdgpu_lds directives,
+; and .amdgpu_use edges in the .amdgpu.info section. Covers multiple LDS
+; variables with different sizes and alignments (including zero-sized dynamic
+; LDS), usage from both kernels and device functions, and
 ; group_segment_fixed_size = 0 (linker patches via binary patching).
 
 @lds_large = external addrspace(3) global [256 x i8], align 16
 @lds_small = external addrspace(3) global [128 x i8], align 4
 @lds_dynamic = external addrspace(3) global [0 x i8], align 8
 
-; --- Assembly checks ---
+; Instruction-level relocation checks.
 ; ASM-LABEL: {{^}}device_func:
 ; ASM: v_add_u32_e32 v{{[0-9]+}}, lds_large@abs32@lo, v{{[0-9]+}}
 
@@ -19,17 +20,51 @@
 ; ASM-DAG: s_add_i32 s{{[0-9]+}}, s{{[0-9]+}}, lds_small@abs32@lo
 ; ASM-DAG: s_add_i32 s{{[0-9]+}}, s{{[0-9]+}}, lds_dynamic@abs32@lo
 
+; .amdgpu.info section with LDS use edges.
+; ASM-DAG: .amdgpu_info device_func
+; ASM-DAG:   .amdgpu_flags {{[0-9]+}}
+; ASM-DAG:   .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG:   .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG:   .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG:   .amdgpu_occupancy {{[0-9]+}}
+; ASM-DAG:   .amdgpu_use lds_large
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info test_kernel
+; ASM-DAG:   .amdgpu_flags {{[0-9]+}}
+; ASM-DAG:   .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG:   .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG:   .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG:   .amdgpu_occupancy {{[0-9]+}}
+; ASM-DAG:   .amdgpu_use lds_dynamic
+; ASM-DAG:   .amdgpu_use lds_small
+; ASM-DAG:   .amdgpu_call device_func
+; ASM-DAG: .end_amdgpu_info
+
+; SHN_AMDGPU_LDS directives.
 ; ASM-DAG: .amdgpu_lds lds_large, 256, 16
 ; ASM-DAG: .amdgpu_lds lds_small, 128, 4
 ; ASM-DAG: .amdgpu_lds lds_dynamic, 0, 8
 
 ; ASM: .group_segment_fixed_size: 0
 
-; --- ELF checks ---
+; .amdgpu.info section exists.
+; ELF:      Section {
+; ELF:        Name: .amdgpu.info
+; ELF:        Type: SHT_PROGBITS
+; ELF:        Flags [
+; ELF:          SHF_EXCLUDE
+
+; Relocations.
 ; ELF-DAG: R_AMDGPU_ABS32_LO lds_large
 ; ELF-DAG: R_AMDGPU_ABS32_LO lds_small
 ; ELF-DAG: R_AMDGPU_ABS32_LO lds_dynamic
+; ELF-DAG: R_AMDGPU_ABS64 device_func
+; ELF-DAG: R_AMDGPU_ABS64 test_kernel
+; ELF-DAG: R_AMDGPU_ABS64 lds_large
+; ELF-DAG: R_AMDGPU_ABS64 lds_small
+; ELF-DAG: R_AMDGPU_ABS64 lds_dynamic
 
+; SHN_AMDGPU_LDS symbols.
 ; ELF-DAG: Name: lds_large
 ; ELF-DAG: Name: lds_small
 ; ELF-DAG: Name: lds_dynamic
diff --git a/llvm/test/MC/AMDGPU/amdgpu-info-err.s b/llvm/test/MC/AMDGPU/amdgpu-info-err.s
new file mode 100644
index 0000000000000..c61575923c460
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/amdgpu-info-err.s
@@ -0,0 +1,35 @@
+// RUN: not llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx900 %s -filetype=null 2>&1 | FileCheck %s
+
+// Missing function symbol after .amdgpu_info.
+.amdgpu_info
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected symbol name after .amdgpu_info
+
+// Unknown directive inside a .amdgpu_info block.
+.amdgpu_info f_unknown_dir
+	.amdgpu_bogus 1
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: unknown .amdgpu_info directive '.amdgpu_bogus'
+
+// .amdgpu_use with no resource symbol.
+.amdgpu_info f_use_missing
+	.amdgpu_use
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected resource symbol for .amdgpu_use
+
+// .amdgpu_call with no callee symbol.
+.amdgpu_info f_call_missing
+	.amdgpu_call
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected callee symbol for .amdgpu_call
+
+// .amdgpu_indirect_call with no type-ID string.
+.amdgpu_info f_icall_missing
+	.amdgpu_indirect_call
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected type ID string for .amdgpu_indirect_call
+
+// .amdgpu_typeid with no type-ID string.
+.amdgpu_info f_typeid_missing
+	.amdgpu_typeid
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected type ID string for .amdgpu_typeid
+
+// Non-identifier token where a directive or .end_amdgpu_info is expected.
+.amdgpu_info f_bad_token
+	123
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected directive or .end_amdgpu_info
diff --git a/llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s b/llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s
new file mode 100644
index 0000000000000..71ae2bf27073f
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s
@@ -0,0 +1,115 @@
+// RUN: llvm-mc -triple=amdgcn-amd-amdhsa -mcpu=gfx900 -filetype=asm %s | FileCheck --check-prefix=ASM %s
+// RUN: llvm-mc -triple=amdgcn-amd-amdhsa -mcpu=gfx900 -filetype=obj %s | llvm-readobj -r --sections --section-data - | FileCheck --check-prefix=OBJ %s
+
+// Test that .amdgpu_info directives round-trip through the assembler (asm and
+// object emission) and produce the correct TLV-encoded .amdgpu.info section.
+
+	.text
+	.globl	my_kernel
+	.p2align	8
+	.type	my_kernel,@function
+my_kernel:
+	s_endpgm
+.Lfunc_end0:
+	.size	my_kernel, .Lfunc_end0-my_kernel
+
+	.globl	helper
+	.p2align	2
+	.type	helper,@function
+helper:
+	s_setpc_b64 s[30:31]
+.Lfunc_end1:
+	.size	helper, .Lfunc_end1-helper
+
+	.globl	addr_taken_func
+	.p2align	2
+	.type	addr_taken_func,@function
+addr_taken_func:
+	s_setpc_b64 s[30:31]
+.Lfunc_end2:
+	.size	addr_taken_func, .Lfunc_end2-addr_taken_func
+
+	.globl	extern_func
+
+// Kernel: flags=7 (KERNEL|VCC|FLAT_SCRATCH), resources, call edge, use edge,
+// indirect call, and signature. Non-zero AGPR to verify conditional emission.
+	.amdgpu_info my_kernel
+		.amdgpu_flags 7
+		.amdgpu_num_vgpr 32
+		.amdgpu_num_agpr 4
+		.amdgpu_num_sgpr 33
+		.amdgpu_private_segment_size 0
+		.amdgpu_occupancy 65536
+		.amdgpu_call helper
+		.amdgpu_use lds_var
+		.amdgpu_indirect_call "vi"
+	.end_amdgpu_info
+
+// Device function: flags=2 (VCC), call edge to external. Zero AGPR and named
+// barrier values are omitted from the input; the parser defaults them to 0 and
+// the emitter skips them.
+	.amdgpu_info helper
+		.amdgpu_flags 2
+		.amdgpu_num_vgpr 10
+		.amdgpu_num_sgpr 8
+		.amdgpu_private_segment_size 16
+		.amdgpu_occupancy 0
+		.amdgpu_call extern_func
+	.end_amdgpu_info
+
+// Address-taken function with type ID. Zero AGPR/named-barrier omitted.
+	.amdgpu_info addr_taken_func
+		.amdgpu_flags 0
+		.amdgpu_num_vgpr 4
+		.amdgpu_num_sgpr 2
+		.amdgpu_private_segment_size 0
+		.amdgpu_occupancy 0
+		.amdgpu_typeid "vi"
+	.end_amdgpu_info
+
+// ASM: .amdgpu_info my_kernel
+// ASM: .amdgpu_flags 7
+// ASM: .amdgpu_num_vgpr 32
+// ASM: .amdgpu_num_agpr 4
+// ASM: .amdgpu_num_sgpr 33
+// ASM: .amdgpu_private_segment_size 0
+// ASM: .amdgpu_occupancy 65536
+// ASM: .amdgpu_use lds_var
+// ASM: .amdgpu_call helper
+// ASM: .amdgpu_indirect_call "vi"
+// ASM: .end_amdgpu_info
+
+// ASM: .amdgpu_info helper
+// ASM: .amdgpu_flags 2
+// ASM: .amdgpu_num_vgpr 10
+// ASM-NOT: .amdgpu_num_agpr
+// ASM: .amdgpu_num_sgpr 8
+// ASM: .amdgpu_private_segment_size 16
+// ASM: .amdgpu_occupancy 0
+// ASM: .amdgpu_call extern_func
+// ASM: .end_amdgpu_info
+
+// ASM: .amdgpu_info addr_taken_func
+// ASM: .amdgpu_flags 0
+// ASM: .amdgpu_num_vgpr 4
+// ASM-NOT: .amdgpu_num_agpr
+// ASM: .amdgpu_num_sgpr 2
+// ASM: .amdgpu_private_segment_size 0
+// ASM: .amdgpu_occupancy 0
+// ASM: .amdgpu_typeid "vi"
+// ASM: .end_amdgpu_info
+
+// OBJ: Section {
+// OBJ:   Name: .amdgpu.info
+// OBJ:   Type: SHT_PROGBITS
+// OBJ:   Flags [
+// OBJ:     SHF_EXCLUDE
+// OBJ:   ]
+// OBJ: }
+
+// Relocations in .amdgpu.info should reference defined and external symbols.
+// OBJ-DAG: R_AMDGPU_ABS64 my_kernel
+// OBJ-DAG: R_AMDGPU_ABS64 helper
+// OBJ-DAG: R_AMDGPU_ABS64 addr_taken_func
+// OBJ-DAG: R_AMDGPU_ABS64 extern_func
+// OBJ-DAG: R_AMDGPU_ABS64 lds_var