[llvm-branch-commits] [llvm] [AMDGPU] Add `.amdgpu.info` section for per-function metadata (PR #192384)
Shilei Tian via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Wed Apr 15 20:58:19 PDT 2026
https://github.com/shiltian created https://github.com/llvm/llvm-project/pull/192384
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.
This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed
binary format. Each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
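As a rough illustration of the companion string table, the sketch below pools NUL-terminated strings and hands back byte offsets, so that repeated type signatures share one copy. It is a standalone approximation and not the code in this patch: `StringPool`, `getOrAdd`, and `at` are hypothetical names, and the patch's special case for empty strings (which returns `UINT32_MAX`) is left out. The sample strings follow the patch's type encoding, where `void(i32)` becomes `"vi"` and `i32(i64)` becomes `"il"`.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the .amdgpu.strtab contents: a flat byte buffer of
// NUL-terminated strings plus a map used to deduplicate them.
struct StringPool {
  std::string Bytes;
  std::unordered_map<std::string, uint32_t> Offsets;

  // Returns the byte offset of Str in the pool, appending it on first use.
  uint32_t getOrAdd(const std::string &Str) {
    auto [It, Inserted] =
        Offsets.try_emplace(Str, static_cast<uint32_t>(Bytes.size()));
    if (Inserted) {
      Bytes += Str;
      Bytes.push_back('\0');
    }
    return It->second;
  }

  // Reads the NUL-terminated string that starts at Offset; returns an empty
  // string if Offset is out of range.
  std::string at(uint32_t Offset) const {
    if (Offset >= Bytes.size())
      return std::string();
    return std::string(Bytes.c_str() + Offset);
  }
};
```

An `INFO_SIGNATURE` or `INFO_INDIRECT_CALL` entry would then carry the offset returned by `getOrAdd` as its `u32` payload, and a consumer would resolve it with something like `at`.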
The format is forward-compatible: a consumer that encounters an unknown kind can
skip it by reading the length byte, allowing new entry kinds to be added without
breaking existing toolchains.
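To make the skipping rule concrete, here is a minimal reader sketch that walks a raw `.amdgpu.info` byte buffer, decodes only `INFO_NUM_VGPR` entries, and steps over every other kind using its length byte. It is not the linker's implementation: `readNumVGPRs` is a hypothetical helper, relocations are ignored, and the `u32` payload is assumed to be little-endian.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Kind value taken from the InfoKind enum in this patch.
constexpr uint8_t INFO_NUM_VGPR = 2;

// Hypothetical helper: collects every INFO_NUM_VGPR payload found in Data.
// Returns false if an entry is truncated or a known kind has a bad length.
bool readNumVGPRs(const std::vector<uint8_t> &Data,
                  std::vector<uint32_t> &Out) {
  size_t Pos = 0;
  while (Pos < Data.size()) {
    // Every entry starts with a two-byte header: [kind: u8] [len: u8].
    if (Data.size() - Pos < 2)
      return false;
    uint8_t Kind = Data[Pos];
    uint8_t Len = Data[Pos + 1];
    Pos += 2;
    if (Data.size() - Pos < Len)
      return false;

    if (Kind == INFO_NUM_VGPR) {
      if (Len != 4)
        return false;
      // Assumption: the u32 payload is stored little-endian.
      uint32_t Val = static_cast<uint32_t>(Data[Pos]) |
                     (static_cast<uint32_t>(Data[Pos + 1]) << 8) |
                     (static_cast<uint32_t>(Data[Pos + 2]) << 16) |
                     (static_cast<uint32_t>(Data[Pos + 3]) << 24);
      Out.push_back(Val);
    }
    // Known or unknown, the length byte says how far to advance.
    Pos += Len;
  }
  return true;
}
```

Because the advance depends only on `len`, an entry with a kind this reader has never seen (say, 200) costs nothing beyond the skip. The single length byte also caps each payload at 255 bytes.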
From 9ae35fc7aaaaa36422a9b2cc4233cce60d1b05d1 Mon Sep 17 00:00:00 2001
From: Shilei Tian <i at tianshilei.me>
Date: Wed, 15 Apr 2026 22:27:49 -0400
Subject: [PATCH] [AMDGPU] Add `.amdgpu.info` section for per-function metadata
AMDGPU object linking requires the linker to propagate resource usage
(registers, stack, LDS) across translation units. To support this, the compiler
must emit per-function metadata and call graph edges in the relocatable object
so the linker can compute whole-program resource requirements.
This PR introduces a `.amdgpu.info` ELF section using a tagged, length-prefixed
binary format. Each entry is encoded as:
```
[kind: u8] [len: u8] [payload: <len> bytes]
```
A function scope is opened by an `INFO_FUNC` entry (containing a symbol
reference), followed by per-function attributes (register counts, flags, private
segment size) and relational edges (direct calls, LDS uses, indirect call
signatures). String data such as function type signatures is stored in a
companion `.amdgpu.strtab` section.
The format is forward-compatible: a consumer that encounters an unknown kind can
skip it by reading the length byte, allowing new entry kinds to be added without
breaking existing toolchains.
---
.../llvm/Support/AMDGPUObjLinkingInfo.h | 53 +++++
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 174 +++++++++++++++-
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h | 14 ++
llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp | 3 +
.../AMDGPU/AsmParser/AMDGPUAsmParser.cpp | 117 +++++++++++
.../MCTargetDesc/AMDGPUTargetStreamer.cpp | 192 ++++++++++++++++++
.../MCTargetDesc/AMDGPUTargetStreamer.h | 33 +++
.../AMDGPU/lds-link-time-codegen-callgraph.ll | 60 ++++++
...s-link-time-codegen-cross-tu-addr-taken.ll | 51 +++++
.../AMDGPU/lds-link-time-codegen-indirect.ll | 63 ++++++
.../lds-link-time-codegen-named-barrier.ll | 30 ++-
.../AMDGPU/lds-link-time-codegen-prototype.ll | 83 ++++++++
.../CodeGen/AMDGPU/lds-link-time-codegen.ll | 52 ++++-
llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s | 121 +++++++++++
14 files changed, 1033 insertions(+), 13 deletions(-)
create mode 100644 llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h
create mode 100644 llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-cross-tu-addr-taken.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-indirect.ll
create mode 100644 llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-prototype.ll
create mode 100644 llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s
diff --git a/llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h b/llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h
new file mode 100644
index 0000000000000..f374d4dbb37d7
--- /dev/null
+++ b/llvm/include/llvm/Support/AMDGPUObjLinkingInfo.h
@@ -0,0 +1,53 @@
+//===--- AMDGPUObjLinkingInfo.h ---------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// Enums shared between the AMDGPU backend (LLVM) and the ELF linker (LLD)
+/// for the `.amdgpu.info` object-linking metadata section.
+///
+/// Binary layout of each entry: [kind: u8] [len: u8] [payload: <len> bytes].
+/// Unknown kinds are forward-compatible: a consumer skips them by reading len.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_AMDGPUOBJECTLINKINGINFO_H
+#define LLVM_SUPPORT_AMDGPUOBJECTLINKINGINFO_H
+
+#include <cstdint>
+
+namespace llvm {
+namespace AMDGPU {
+
+/// Entry kind values for the `.amdgpu.info` section.
+enum InfoKind : uint8_t {
+ INFO_FUNC = 1, // [symbol_ref: 8B] — opens function scope
+ INFO_NUM_VGPR = 2, // [u32]
+ INFO_NUM_AGPR = 3, // [u32]
+ INFO_NUM_SGPR = 4, // [u32]
+ INFO_NUM_NAMED_BARRIER = 5, // [u32]
+ INFO_PRIVATE_SEGMENT_SIZE = 6, // [u32]
+ INFO_OCCUPANCY_LDS_LIMIT = 7, // [u32]
+ INFO_FLAGS = 8, // [u32] — FuncInfoFlags bitfield
+ INFO_USE = 9, // [dst_symbol: 8B]
+ INFO_CALL = 10, // [dst_symbol: 8B]
+ INFO_INDIRECT_CALL = 11, // [strtab_offset: u32]
+ INFO_SIGNATURE = 12, // [strtab_offset: u32]
+};
+
+/// Per-function flags packed into INFO_FLAGS entries.
+enum FuncInfoFlags : uint32_t {
+ FUNC_IS_KERNEL = 0x1,
+ FUNC_USES_VCC = 0x2,
+ FUNC_USES_FLAT_SCRATCH = 0x4,
+ FUNC_HAS_DYN_STACK = 0x8,
+};
+
+} // namespace AMDGPU
+} // namespace llvm
+
+#endif // LLVM_SUPPORT_AMDGPUOBJECTLINKINGINFO_H
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 718b2b154e251..d5c0130475cca 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -32,6 +32,7 @@
#include "Utils/AMDGPUBaseInfo.h"
#include "Utils/AMDKernelCodeTUtils.h"
#include "Utils/SIDefinesUtils.h"
+#include "llvm/ADT/StringSet.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/BinaryFormat/ELF.h"
#include "llvm/CodeGen/AsmPrinterHandler.h"
@@ -537,6 +538,150 @@ void AMDGPUAsmPrinter::validateMCResourceInfo(Function &F) {
}
}
+static void appendTypeEncoding(std::string &Enc, Type *Ty,
+ const DataLayout &DL) {
+ if (Ty->isVoidTy()) {
+ Enc += 'v';
+ return;
+ }
+ unsigned Bits = DL.getTypeSizeInBits(Ty);
+ if (Bits <= 32)
+ Enc += 'i';
+ else if (Bits <= 64)
+ Enc += 'l';
+ else
+ Enc.append(divideCeil(Bits, 32), 'i');
+}
+
+static std::string computeTypeId(const FunctionType *FTy,
+ const DataLayout &DL) {
+ std::string Enc;
+ appendTypeEncoding(Enc, FTy->getReturnType(), DL);
+ for (Type *ParamTy : FTy->params())
+ appendTypeEncoding(Enc, ParamTy, DL);
+ return Enc;
+}
+
+void AMDGPUAsmPrinter::collectCallEdge(const MachineInstr &MI) {
+ if (!AMDGPUTargetMachine::EnableObjectLinking)
+ return;
+ const GCNSubtarget &STI = MF->getSubtarget<GCNSubtarget>();
+ const SIInstrInfo *TII = STI.getInstrInfo();
+ const MachineOperand *CalleeOp =
+ TII->getNamedOperand(MI, AMDGPU::OpName::callee);
+ if (!CalleeOp || !CalleeOp->isGlobal())
+ return;
+ DirectCallEdges.insert(
+ {getSymbol(&MF->getFunction()), getSymbol(CalleeOp->getGlobal())});
+}
+
+void AMDGPUAsmPrinter::emitCallGraphSection(Module &M) {
+ if (!AMDGPUTargetMachine::EnableObjectLinking)
+ return;
+
+ const NamedMDNode *LdsMD = M.getNamedMetadata("amdgpu.lds.uses");
+ bool HasLdsUses = LdsMD && LdsMD->getNumOperands() > 0;
+
+ const NamedMDNode *BarMD = M.getNamedMetadata("amdgpu.named_barrier.uses");
+ bool HasNamedBarriers = BarMD && BarMD->getNumOperands() > 0;
+
+ // Collect address-taken functions (with type IDs) and indirect call sites.
+ DenseMap<const Function *, std::string> AddrTakenTypeIds;
+ using IndirectCallInfo = std::pair<const Function *, std::string>;
+ SmallVector<IndirectCallInfo, 8> IndirectCalls;
+
+ for (const Function &F : M) {
+ bool IsKernel = AMDGPU::isKernel(F.getCallingConv());
+
+ if (!IsKernel && F.hasAddressTaken(/*PutOffender=*/nullptr,
+ /*IgnoreCallbackUses=*/false,
+ /*IgnoreAssumeLikeCalls=*/true,
+ /*IgnoreLLVMUsed=*/true)) {
+ AddrTakenTypeIds[&F] =
+ computeTypeId(F.getFunctionType(), M.getDataLayout());
+ }
+
+ if (F.isDeclaration())
+ continue;
+
+ StringSet<> SeenTypeIds;
+ for (const BasicBlock &BB : F) {
+ for (const Instruction &I : BB) {
+ const auto *CB = dyn_cast<CallBase>(&I);
+ if (!CB || !CB->isIndirectCall())
+ continue;
+ std::string TId =
+ computeTypeId(CB->getFunctionType(), M.getDataLayout());
+ if (SeenTypeIds.insert(TId).second)
+ IndirectCalls.push_back({&F, std::move(TId)});
+ }
+ }
+ }
+
+ if (FunctionResourceInfos.empty() && DirectCallEdges.empty() && !HasLdsUses &&
+ !HasNamedBarriers && AddrTakenTypeIds.empty() && IndirectCalls.empty())
+ return;
+
+ AMDGPU::InfoSectionData Data;
+
+ DenseSet<MCSymbol *> DefinedSyms;
+
+ for (const auto &PFRI : FunctionResourceInfos) {
+ MCSymbol *Sym = getSymbol(PFRI.F);
+ DefinedSyms.insert(Sym);
+
+ AMDGPU::FuncInfo FI;
+ FI.Sym = Sym;
+ FI.IsKernel = AMDGPU::isKernel(PFRI.F->getCallingConv());
+ FI.NumArchVGPR = PFRI.RI.NumVGPR;
+ FI.NumAccVGPR = PFRI.RI.NumAGPR;
+ FI.NumSGPR = PFRI.RI.NumExplicitSGPR;
+ FI.NumNamedBarrier = PFRI.RI.NumNamedBarrier;
+ FI.PrivateSegmentSize = static_cast<uint32_t>(PFRI.RI.PrivateSegmentSize);
+ FI.OccupancyLDSLimit = PFRI.OccupancyLDSLimit;
+ FI.UsesVCC = PFRI.RI.UsesVCC;
+ FI.UsesFlatScratch = PFRI.RI.UsesFlatScratch;
+ FI.HasDynStack = PFRI.RI.HasDynamicallySizedStack;
+
+ Data.Funcs.push_back(std::move(FI));
+ }
+
+ for (auto &[F, TypeId] : AddrTakenTypeIds) {
+ MCSymbol *Sym = getSymbol(F);
+ Data.Signatures.push_back({Sym, TypeId});
+ }
+
+ for (auto &[CallerSym, CalleeSym] : DirectCallEdges)
+ Data.Calls.push_back({CallerSym, CalleeSym});
+ DirectCallEdges.clear();
+
+ if (HasLdsUses) {
+ for (const MDNode *N : LdsMD->operands()) {
+ auto *Func = mdconst::extract<Function>(N->getOperand(0));
+ auto *LdsVar = mdconst::extract<GlobalVariable>(N->getOperand(1));
+ Data.Uses.push_back({getSymbol(Func), getSymbol(LdsVar)});
+ }
+ }
+
+ if (HasNamedBarriers) {
+ for (const MDNode *N : BarMD->operands()) {
+ auto *BarVar = mdconst::extract<GlobalVariable>(N->getOperand(0));
+ MCSymbol *BarSym = getSymbol(BarVar);
+ for (unsigned I = 1, E = N->getNumOperands(); I < E; ++I) {
+ auto *Func = mdconst::extract<Function>(N->getOperand(I));
+ Data.Uses.push_back({getSymbol(Func), BarSym});
+ }
+ }
+ }
+
+ for (auto &[Caller, Enc] : IndirectCalls) {
+ MCSymbol *CallerSym = getSymbol(Caller);
+ Data.IndirectCalls.push_back({CallerSym, Enc});
+ }
+
+ getTargetStreamer()->EmitAMDGPUInfo(Data);
+}
+
bool AMDGPUAsmPrinter::doFinalization(Module &M) {
// Pad with s_code_end to help tools and guard against instruction prefetch
// causing stale data in caches. Arguably this should be done by the linker,
@@ -553,6 +698,9 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
}
}
+ // Emit unified .amdgpu.info section (call graph + resource usage).
+ emitCallGraphSection(M);
+
// Assign expressions which can only be resolved when all other functions are
// known.
RI.finalize(OutContext);
@@ -567,8 +715,10 @@ bool AMDGPUAsmPrinter::doFinalization(Module &M) {
RI.getMaxSGPRSymbol(OutContext), RI.getMaxNamedBarrierSymbol(OutContext));
OutStreamer->popSection();
- for (Function &F : M.functions())
- validateMCResourceInfo(F);
+ if (!AMDGPUTargetMachine::EnableObjectLinking) {
+ for (Function &F : M.functions())
+ validateMCResourceInfo(F);
+ }
RI.reset();
@@ -729,6 +879,26 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
RI.gatherResourceInfo(MF, *ResourceUsage, OutContext);
+ if (AMDGPUTargetMachine::EnableObjectLinking) {
+ PerFunctionResourceInfo PFRI = {&MF.getFunction(), *ResourceUsage};
+ if (AMDGPU::isKernel(MF.getFunction().getCallingConv())) {
+ unsigned TotalLDS = STM.getLocalMemorySize();
+ const auto [MinWEU, MaxWEU] = AMDGPU::getIntegerPairAttribute(
+ MF.getFunction(), "amdgpu-waves-per-eu", {0, 0}, true);
+ if (MinWEU > 0) {
+ const SIMachineFunctionInfo &SIMFI =
+ *MF.getInfo<SIMachineFunctionInfo>();
+ unsigned FlatWGSizeMax = SIMFI.getFlatWorkGroupSizes().second;
+ unsigned WavesPerWG = divideCeil(FlatWGSizeMax, STM.getWavefrontSize());
+ unsigned MinWGs = divideCeil(MinWEU * STM.getEUsPerCU(), WavesPerWG);
+ PFRI.OccupancyLDSLimit = MinWGs > 0 ? TotalLDS / MinWGs : TotalLDS;
+ } else {
+ PFRI.OccupancyLDSLimit = TotalLDS;
+ }
+ }
+ FunctionResourceInfos.push_back(PFRI);
+ }
+
if (MFI->isModuleEntryFunction()) {
getSIProgramInfo(CurrentProgramInfo, MF);
}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
index 31d10fe92ca26..073440a73c8e6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
@@ -15,7 +15,9 @@
#define LLVM_LIB_TARGET_AMDGPU_AMDGPUASMPRINTER_H
#include "AMDGPUMCResourceInfo.h"
+#include "AMDGPUResourceUsageAnalysis.h"
#include "SIProgramInfo.h"
+#include "llvm/ADT/SetVector.h"
#include "llvm/CodeGen/AsmPrinter.h"
namespace llvm {
@@ -86,6 +88,18 @@ class AMDGPUAsmPrinter final : public AsmPrinter {
void initTargetStreamer(Module &M);
+ void emitCallGraphSection(Module &M);
+ void collectCallEdge(const MachineInstr &MI);
+
+ SetVector<std::pair<MCSymbol *, MCSymbol *>> DirectCallEdges;
+
+ struct PerFunctionResourceInfo {
+ const Function *F;
+ AMDGPUResourceUsageAnalysisImpl::SIFunctionResourceInfo RI;
+ uint32_t OccupancyLDSLimit = 0;
+ };
+ SmallVector<PerFunctionResourceInfo> FunctionResourceInfos;
+
SmallString<128> getMCExprStr(const MCExpr *Value);
/// Attempts to replace the validation that is missed in getSIProgramInfo due
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
index 56592bde3b1c7..3c89e3d287b32 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMCInstLower.cpp
@@ -320,6 +320,9 @@ static void emitVGPRBlockComment(const MachineInstr *MI, const SIInstrInfo *TII,
}
void AMDGPUAsmPrinter::emitInstruction(const MachineInstr *MI) {
+ if (MI->isCall())
+ collectCallEdge(*MI);
+
// FIXME: Enable feature predicate checks once all the test pass.
// AMDGPU_MC::verifyInstructionPredicates(MI->getOpcode(),
// getSubtargetInfo().getFeatureBits());
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 3777adc9790e8..4b36f5aff53a5 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -1382,6 +1382,9 @@ class AMDGPUAsmParser : public MCTargetAsmParser {
return getRegBitWidth(RCID) / 8;
}
+ AMDGPU::InfoSectionData InfoData;
+ bool HasInfoData = false;
+
private:
void createConstantSymbol(StringRef Id, int64_t Val);
@@ -1422,6 +1425,7 @@ class AMDGPUAsmParser : public MCTargetAsmParser {
bool ParseDirectivePALMetadataBegin();
bool ParseDirectivePALMetadata();
bool ParseDirectiveAMDGPULDS();
+ bool ParseDirectiveAMDGPUInfo();
/// Common code to parse out a block of text (typically YAML) between start and
/// end directives.
@@ -1676,6 +1680,7 @@ class AMDGPUAsmParser : public MCTargetAsmParser {
uint64_t &ErrorInfo,
bool MatchingInlineAsm) override;
bool ParseDirective(AsmToken DirectiveID) override;
+ void onEndOfFile() override;
ParseStatus parseOperand(OperandVector &Operands, StringRef Mnemonic,
OperandMode Mode = OperandMode_Default);
StringRef parseMnemonicSuffix(StringRef Name);
@@ -6741,6 +6746,115 @@ bool AMDGPUAsmParser::ParseDirectiveAMDGPULDS() {
return false;
}
+bool AMDGPUAsmParser::ParseDirectiveAMDGPUInfo() {
+ if (getParser().checkForValidSection())
+ return true;
+
+ StringRef FuncName;
+ if (getParser().parseIdentifier(FuncName))
+ return TokError("expected symbol name after .amdgpu_info");
+
+ MCSymbol *FuncSym = getContext().getOrCreateSymbol(FuncName);
+ AMDGPU::FuncInfo FI;
+ FI.Sym = FuncSym;
+ bool HasScalarAttrs = false;
+
+ while (true) {
+ while (trySkipToken(AsmToken::EndOfStatement))
+ ;
+
+ StringRef ID;
+ SMLoc IDLoc = getLoc();
+ if (!parseId(ID, "expected directive or .end_amdgpu_info"))
+ return true;
+
+ if (ID == ".end_amdgpu_info")
+ break;
+
+ if (ID == ".amdgpu_flags") {
+ int64_t Val;
+ if (getParser().parseAbsoluteExpression(Val))
+ return true;
+ uint32_t Flags = static_cast<uint32_t>(Val);
+ FI.IsKernel = (Flags & AMDGPU::FUNC_IS_KERNEL) != 0;
+ FI.UsesVCC = (Flags & AMDGPU::FUNC_USES_VCC) != 0;
+ FI.UsesFlatScratch = (Flags & AMDGPU::FUNC_USES_FLAT_SCRATCH) != 0;
+ FI.HasDynStack = (Flags & AMDGPU::FUNC_HAS_DYN_STACK) != 0;
+ HasScalarAttrs = true;
+ } else if (ID == ".amdgpu_num_vgpr") {
+ int64_t Val;
+ if (getParser().parseAbsoluteExpression(Val))
+ return true;
+ FI.NumArchVGPR = static_cast<uint32_t>(Val);
+ HasScalarAttrs = true;
+ } else if (ID == ".amdgpu_num_agpr") {
+ int64_t Val;
+ if (getParser().parseAbsoluteExpression(Val))
+ return true;
+ FI.NumAccVGPR = static_cast<uint32_t>(Val);
+ HasScalarAttrs = true;
+ } else if (ID == ".amdgpu_num_sgpr") {
+ int64_t Val;
+ if (getParser().parseAbsoluteExpression(Val))
+ return true;
+ FI.NumSGPR = static_cast<uint32_t>(Val);
+ HasScalarAttrs = true;
+ } else if (ID == ".amdgpu_num_named_barrier") {
+ int64_t Val;
+ if (getParser().parseAbsoluteExpression(Val))
+ return true;
+ FI.NumNamedBarrier = static_cast<uint32_t>(Val);
+ HasScalarAttrs = true;
+ } else if (ID == ".amdgpu_private_segment_size") {
+ int64_t Val;
+ if (getParser().parseAbsoluteExpression(Val))
+ return true;
+ FI.PrivateSegmentSize = static_cast<uint32_t>(Val);
+ HasScalarAttrs = true;
+ } else if (ID == ".amdgpu_occupancy_lds_limit") {
+ int64_t Val;
+ if (getParser().parseAbsoluteExpression(Val))
+ return true;
+ FI.OccupancyLDSLimit = static_cast<uint32_t>(Val);
+ HasScalarAttrs = true;
+ } else if (ID == ".amdgpu_use") {
+ StringRef ResName;
+ if (getParser().parseIdentifier(ResName))
+ return TokError("expected resource symbol for .amdgpu_use");
+ InfoData.Uses.push_back(
+ {FuncSym, getContext().getOrCreateSymbol(ResName)});
+ } else if (ID == ".amdgpu_call") {
+ StringRef DstName;
+ if (getParser().parseIdentifier(DstName))
+ return TokError("expected callee symbol for .amdgpu_call");
+ InfoData.Calls.push_back(
+ {FuncSym, getContext().getOrCreateSymbol(DstName)});
+ } else if (ID == ".amdgpu_indirect_call") {
+ std::string TypeId;
+ if (getParser().parseEscapedString(TypeId))
+ return TokError("expected type ID string for .amdgpu_indirect_call");
+ InfoData.IndirectCalls.push_back({FuncSym, std::move(TypeId)});
+ } else if (ID == ".amdgpu_signature") {
+ std::string TypeId;
+ if (getParser().parseEscapedString(TypeId))
+ return TokError("expected type ID string for .amdgpu_signature");
+ InfoData.Signatures.push_back({FuncSym, std::move(TypeId)});
+ } else {
+ return Error(IDLoc, "unknown .amdgpu_info directive '" + ID + "'");
+ }
+ }
+
+ if (HasScalarAttrs)
+ InfoData.Funcs.push_back(std::move(FI));
+ HasInfoData = true;
+ return false;
+}
+
+void AMDGPUAsmParser::onEndOfFile() {
+ if (HasInfoData)
+ getTargetStreamer().EmitAMDGPUInfo(InfoData);
+}
+
bool AMDGPUAsmParser::ParseDirective(AsmToken DirectiveID) {
StringRef IDVal = DirectiveID.getString();
@@ -6778,6 +6892,9 @@ bool AMDGPUAsmParser::ParseDirective(AsmToken DirectiveID) {
if (IDVal == ".amdgpu_lds")
return ParseDirectiveAMDGPULDS();
+ if (IDVal == ".amdgpu_info")
+ return ParseDirectiveAMDGPUInfo();
+
if (IDVal == PALMD::AssemblerDirectiveBegin)
return ParseDirectivePALMetadataBegin();
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index d276bab0ff3be..95ff7e0aac1ea 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -664,6 +664,76 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
OS << "\t.end_amdhsa_kernel\n";
}
+void AMDGPUTargetAsmStreamer::EmitAMDGPUInfo(
+ const AMDGPU::InfoSectionData &Data) {
+ // Group edges by source function symbol.
+ DenseMap<MCSymbol *, SmallVector<MCSymbol *, 2>> FuncUses;
+ DenseMap<MCSymbol *, SmallVector<MCSymbol *, 4>> FuncCalls;
+ DenseMap<MCSymbol *, SmallVector<StringRef, 2>> FuncIndirectCalls;
+ DenseMap<MCSymbol *, SmallVector<StringRef, 1>> FuncSignatures;
+ for (const auto &[Func, Res] : Data.Uses)
+ FuncUses[Func].push_back(Res);
+ for (const auto &[Src, Dst] : Data.Calls)
+ FuncCalls[Src].push_back(Dst);
+ for (const auto &[Func, TypeId] : Data.IndirectCalls)
+ FuncIndirectCalls[Func].push_back(TypeId);
+ for (const auto &[Sym, TypeId] : Data.Signatures)
+ FuncSignatures[Sym].push_back(TypeId);
+
+ DenseSet<MCSymbol *> Emitted;
+ auto EmitScope = [&](MCSymbol *Sym, const AMDGPU::FuncInfo *Info) {
+ if (!Emitted.insert(Sym).second)
+ return;
+ OS << "\t.amdgpu_info " << Sym->getName() << '\n';
+ if (Info) {
+ uint32_t Flags = 0;
+ if (Info->IsKernel)
+ Flags |= AMDGPU::FUNC_IS_KERNEL;
+ if (Info->UsesVCC)
+ Flags |= AMDGPU::FUNC_USES_VCC;
+ if (Info->UsesFlatScratch)
+ Flags |= AMDGPU::FUNC_USES_FLAT_SCRATCH;
+ if (Info->HasDynStack)
+ Flags |= AMDGPU::FUNC_HAS_DYN_STACK;
+ OS << "\t\t.amdgpu_flags " << Flags << '\n';
+ OS << "\t\t.amdgpu_num_vgpr " << Info->NumArchVGPR << '\n';
+ OS << "\t\t.amdgpu_num_agpr " << Info->NumAccVGPR << '\n';
+ OS << "\t\t.amdgpu_num_sgpr " << Info->NumSGPR << '\n';
+ OS << "\t\t.amdgpu_num_named_barrier " << Info->NumNamedBarrier << '\n';
+ OS << "\t\t.amdgpu_private_segment_size " << Info->PrivateSegmentSize
+ << '\n';
+ OS << "\t\t.amdgpu_occupancy_lds_limit " << Info->OccupancyLDSLimit
+ << '\n';
+ }
+ if (auto It = FuncUses.find(Sym); It != FuncUses.end())
+ for (MCSymbol *Res : It->second)
+ OS << "\t\t.amdgpu_use " << Res->getName() << '\n';
+ if (auto It = FuncCalls.find(Sym); It != FuncCalls.end())
+ for (MCSymbol *Dst : It->second)
+ OS << "\t\t.amdgpu_call " << Dst->getName() << '\n';
+ if (auto It = FuncIndirectCalls.find(Sym); It != FuncIndirectCalls.end())
+ for (StringRef TypeId : It->second)
+ OS << "\t\t.amdgpu_indirect_call \"" << TypeId << "\"\n";
+ if (auto It = FuncSignatures.find(Sym); It != FuncSignatures.end())
+ for (StringRef TypeId : It->second)
+ OS << "\t\t.amdgpu_signature \"" << TypeId << "\"\n";
+ OS << "\t.end_amdgpu_info\n\n";
+ };
+
+ for (const auto &Func : Data.Funcs)
+ EmitScope(Func.Sym, &Func);
+
+ // Emit scopes for functions that only appear in edges (e.g. signature-only).
+ for (const auto &[Sym, TypeId] : Data.Signatures)
+ EmitScope(Sym, nullptr);
+ for (const auto &[Sym, Res] : Data.Uses)
+ EmitScope(Sym, nullptr);
+ for (const auto &[Sym, Dst] : Data.Calls)
+ EmitScope(Sym, nullptr);
+ for (const auto &[Sym, TypeId] : Data.IndirectCalls)
+ EmitScope(Sym, nullptr);
+}
+
//===----------------------------------------------------------------------===//
// AMDGPUTargetELFStreamer
//===----------------------------------------------------------------------===//
@@ -1065,3 +1135,125 @@ void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
for (uint32_t i = 0; i < sizeof(amdhsa::kernel_descriptor_t::reserved3); ++i)
Streamer.emitInt8(0u);
}
+
+void AMDGPUTargetELFStreamer::EmitAMDGPUInfo(
+ const AMDGPU::InfoSectionData &Data) {
+ auto &S = getStreamer();
+ auto &Context = S.getContext();
+
+ StringMap<uint32_t> StrPoolOffsets;
+ SmallString<128> StrPool;
+ auto getOrAddString = [&](StringRef Str) -> uint32_t {
+ if (Str.empty())
+ return UINT32_MAX;
+ auto [It, Inserted] = StrPoolOffsets.try_emplace(Str, 0);
+ if (Inserted) {
+ It->second = StrPool.size();
+ StrPool.append(Str);
+ StrPool.push_back('\0');
+ }
+ return It->second;
+ };
+
+ // Pre-resolve string table offsets.
+ SmallVector<uint32_t, 4> ICallTypeIdOffsets;
+ for (const auto &[Func, TypeId] : Data.IndirectCalls)
+ ICallTypeIdOffsets.push_back(getOrAddString(TypeId));
+ SmallVector<uint32_t, 4> SigTypeIdOffsets;
+ for (const auto &[Sym, TypeId] : Data.Signatures)
+ SigTypeIdOffsets.push_back(getOrAddString(TypeId));
+
+ // Group edges by source function symbol.
+ DenseMap<MCSymbol *, SmallVector<MCSymbol *, 2>> FuncUses;
+ DenseMap<MCSymbol *, SmallVector<MCSymbol *, 4>> FuncCalls;
+ DenseMap<MCSymbol *, SmallVector<uint32_t, 2>> FuncIndirectCalls;
+ DenseMap<MCSymbol *, SmallVector<uint32_t, 1>> FuncSignatures;
+ for (const auto &[Func, Res] : Data.Uses)
+ FuncUses[Func].push_back(Res);
+ for (const auto &[Src, Dst] : Data.Calls)
+ FuncCalls[Src].push_back(Dst);
+ for (uint32_t I = 0, E = Data.IndirectCalls.size(); I < E; ++I)
+ FuncIndirectCalls[Data.IndirectCalls[I].first].push_back(
+ ICallTypeIdOffsets[I]);
+ for (uint32_t I = 0, E = Data.Signatures.size(); I < E; ++I)
+ FuncSignatures[Data.Signatures[I].first].push_back(SigTypeIdOffsets[I]);
+
+ // Helpers to emit kind+len tagged entries.
+ auto EmitU32Entry = [&](uint8_t Kind, uint32_t Val) {
+ S.emitInt8(Kind);
+ S.emitInt8(4);
+ S.emitInt32(Val);
+ };
+ auto EmitSymEntry = [&](uint8_t Kind, MCSymbol *Sym) {
+ S.emitInt8(Kind);
+ S.emitInt8(8);
+ S.emitValue(MCSymbolRefExpr::create(Sym, Context), 8);
+ };
+
+ S.pushSection();
+ MCSectionELF *InfoSec = Context.getELFSection(
+ ".amdgpu.info", ELF::SHT_PROGBITS, ELF::SHF_EXCLUDE);
+ S.switchSection(InfoSec);
+
+ DenseSet<MCSymbol *> Emitted;
+ auto EmitScope = [&](MCSymbol *Sym, const AMDGPU::FuncInfo *Info) {
+ if (!Emitted.insert(Sym).second)
+ return;
+
+ EmitSymEntry(AMDGPU::INFO_FUNC, Sym);
+
+ if (Info) {
+ uint32_t Flags = 0;
+ if (Info->IsKernel)
+ Flags |= AMDGPU::FUNC_IS_KERNEL;
+ if (Info->UsesVCC)
+ Flags |= AMDGPU::FUNC_USES_VCC;
+ if (Info->UsesFlatScratch)
+ Flags |= AMDGPU::FUNC_USES_FLAT_SCRATCH;
+ if (Info->HasDynStack)
+ Flags |= AMDGPU::FUNC_HAS_DYN_STACK;
+ EmitU32Entry(AMDGPU::INFO_FLAGS, Flags);
+ EmitU32Entry(AMDGPU::INFO_NUM_VGPR, Info->NumArchVGPR);
+ EmitU32Entry(AMDGPU::INFO_NUM_AGPR, Info->NumAccVGPR);
+ EmitU32Entry(AMDGPU::INFO_NUM_SGPR, Info->NumSGPR);
+ EmitU32Entry(AMDGPU::INFO_NUM_NAMED_BARRIER, Info->NumNamedBarrier);
+ EmitU32Entry(AMDGPU::INFO_PRIVATE_SEGMENT_SIZE, Info->PrivateSegmentSize);
+ EmitU32Entry(AMDGPU::INFO_OCCUPANCY_LDS_LIMIT, Info->OccupancyLDSLimit);
+ }
+
+ if (auto It = FuncUses.find(Sym); It != FuncUses.end())
+ for (MCSymbol *Res : It->second)
+ EmitSymEntry(AMDGPU::INFO_USE, Res);
+ if (auto It = FuncCalls.find(Sym); It != FuncCalls.end())
+ for (MCSymbol *Dst : It->second)
+ EmitSymEntry(AMDGPU::INFO_CALL, Dst);
+ if (auto It = FuncIndirectCalls.find(Sym); It != FuncIndirectCalls.end())
+ for (uint32_t Off : It->second)
+ EmitU32Entry(AMDGPU::INFO_INDIRECT_CALL, Off);
+ if (auto It = FuncSignatures.find(Sym); It != FuncSignatures.end())
+ for (uint32_t Off : It->second)
+ EmitU32Entry(AMDGPU::INFO_SIGNATURE, Off);
+ };
+
+ for (const auto &Func : Data.Funcs)
+ EmitScope(Func.Sym, &Func);
+
+ // Emit scopes for functions that only appear in edges.
+ for (const auto &[Sym, TypeId] : Data.Signatures)
+ EmitScope(Sym, nullptr);
+ for (const auto &[Sym, Res] : Data.Uses)
+ EmitScope(Sym, nullptr);
+ for (const auto &[Sym, Dst] : Data.Calls)
+ EmitScope(Sym, nullptr);
+ for (const auto &[Sym, TypeId] : Data.IndirectCalls)
+ EmitScope(Sym, nullptr);
+
+ if (!StrPool.empty()) {
+ MCSectionELF *Sec = Context.getELFSection(".amdgpu.strtab", ELF::SHT_STRTAB,
+ ELF::SHF_EXCLUDE);
+ S.switchSection(Sec);
+ S.emitBytes(StrPool);
+ }
+
+ S.popSection();
+}
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
index 3a0d8dcd2d27c..31c5a5272e0b2 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
@@ -11,7 +11,11 @@
#include "Utils/AMDGPUBaseInfo.h"
#include "Utils/AMDGPUPALMetadata.h"
+#include "llvm/ADT/SmallVector.h"
#include "llvm/MC/MCStreamer.h"
+#include "llvm/Support/AMDGPUObjLinkingInfo.h"
+#include <string>
+#include <utility>
namespace llvm {
@@ -26,6 +30,29 @@ struct MCKernelDescriptor;
namespace HSAMD {
struct Metadata;
}
+
+struct FuncInfo {
+ MCSymbol *Sym = nullptr;
+ bool IsKernel = false;
+ uint32_t NumArchVGPR = 0;
+ uint32_t NumAccVGPR = 0;
+ uint32_t NumSGPR = 0;
+ uint32_t NumNamedBarrier = 0;
+ uint32_t PrivateSegmentSize = 0;
+ uint32_t OccupancyLDSLimit = 0;
+ bool UsesVCC = false;
+ bool UsesFlatScratch = false;
+ bool HasDynStack = false;
+};
+
+struct InfoSectionData {
+ SmallVector<FuncInfo, 8> Funcs;
+ SmallVector<std::pair<MCSymbol *, MCSymbol *>, 4> Uses;
+ SmallVector<std::pair<MCSymbol *, MCSymbol *>, 8> Calls;
+ SmallVector<std::pair<MCSymbol *, std::string>, 4> IndirectCalls;
+ SmallVector<std::pair<MCSymbol *, std::string>, 4> Signatures;
+};
+
} // namespace AMDGPU
class AMDGPUTargetStreamer : public MCTargetStreamer {
@@ -104,6 +131,8 @@ class AMDGPUTargetStreamer : public MCTargetStreamer {
const MCExpr *ReserveVCC,
const MCExpr *ReserveFlatScr) {}
+ virtual void EmitAMDGPUInfo(const AMDGPU::InfoSectionData &Data) {}
+
static StringRef getArchNameFromElfMach(unsigned ElfMach);
static unsigned getElfMach(StringRef GPU);
@@ -168,6 +197,8 @@ class AMDGPUTargetAsmStreamer final : public AMDGPUTargetStreamer {
const MCExpr *NextVGPR, const MCExpr *NextSGPR,
const MCExpr *ReserveVCC,
const MCExpr *ReserveFlatScr) override;
+
+ void EmitAMDGPUInfo(const AMDGPU::InfoSectionData &Data) override;
};
class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
@@ -221,6 +252,8 @@ class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
const MCExpr *NextVGPR, const MCExpr *NextSGPR,
const MCExpr *ReserveVCC,
const MCExpr *ReserveFlatScr) override;
+
+ void EmitAMDGPUInfo(const AMDGPU::InfoSectionData &Data) override;
};
}
#endif
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll
new file mode 100644
index 0000000000000..f90b66e813b0f
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-callgraph.ll
@@ -0,0 +1,60 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -amdgpu-enable-lower-module-lds=0 -filetype=obj < %s | llvm-readobj -r --sections - | FileCheck %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -amdgpu-enable-lower-module-lds=0 -filetype=asm < %s | FileCheck %s --check-prefix=ASM
+
+; Test that the unified .amdgpu.info section (.amdgpu_info blocks in assembly) is
+; emitted with correct relocations when the module has the
+; "amdgpu-link-time-lds" module flag.
+;
+; The helper function does NOT use LDS (to avoid AMDGPUAlwaysInlinePass
+; force-inlining it when -amdgpu-enable-lower-module-lds=0). It just calls
+; an external function, which is sufficient to generate a call edge.
+
+declare void @extern_func()
+
+; The .amdgpu.info section should exist as SHT_PROGBITS with SHF_EXCLUDE.
+; CHECK: Section {
+; CHECK: Name: .amdgpu.info
+; CHECK: Type: SHT_PROGBITS
+; CHECK: Flags [
+; CHECK: SHF_EXCLUDE
+; CHECK: ]
+
+; Symbol references in the binary resource metadata still use R_AMDGPU_ABS64 relocations.
+; CHECK-DAG: R_AMDGPU_ABS64 my_kernel
+; CHECK-DAG: R_AMDGPU_ABS64 helper
+; CHECK-DAG: R_AMDGPU_ABS64 extern_func
+
+; Assembly: per-function .amdgpu_info blocks (target flags derived from e_flags).
+; ASM-DAG: .amdgpu_info helper
+; ASM-DAG: .amdgpu_flags {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_agpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_named_barrier {{[0-9]+}}
+; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG: .amdgpu_occupancy_lds_limit {{[0-9]+}}
+; ASM-DAG: .amdgpu_call extern_func
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info my_kernel
+; ASM-DAG: .amdgpu_flags {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_agpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_named_barrier {{[0-9]+}}
+; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG: .amdgpu_occupancy_lds_limit {{[0-9]+}}
+; ASM-DAG: .amdgpu_call helper
+; ASM-DAG: .end_amdgpu_info
+
+define void @helper() {
+ call void @extern_func()
+ ret void
+}
+
+define amdgpu_kernel void @my_kernel() {
+ call void @helper()
+ ret void
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"amdgpu-link-time-lds", i32 1}
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-cross-tu-addr-taken.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-cross-tu-addr-taken.ll
new file mode 100644
index 0000000000000..3fc88ac27d4b6
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-cross-tu-addr-taken.ll
@@ -0,0 +1,51 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -amdgpu-enable-lower-module-lds=0 -filetype=obj < %s | llvm-readobj -r - | FileCheck %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -amdgpu-enable-lower-module-lds=0 -filetype=asm < %s | FileCheck %s --check-prefix=ASM
+
+; Test that address-taken on a declaration (cross-TU scenario) emits a
+; .amdgpu_signature directive for the external function. The function
+; is defined in another TU but its address is taken here.
+
+declare void @external_func(i32)
+
+define void @taker(ptr %p) {
+ store ptr @external_func, ptr %p
+ ret void
+}
+
+define amdgpu_kernel void @kern() {
+ %p = alloca ptr, addrspace(5)
+ call void @taker(ptr addrspace(5) %p)
+ ret void
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"amdgpu-link-time-lds", i32 1}
+
+; CHECK-DAG: R_AMDGPU_ABS64 external_func
+; CHECK-DAG: R_AMDGPU_ABS64 kern
+; CHECK-DAG: R_AMDGPU_ABS64 taker
+
+; Assembly: per-function .amdgpu_info blocks (target flags derived from e_flags).
+; ASM-DAG: .amdgpu_info taker
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_agpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_named_barrier {{[0-9]+}}
+; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG: .amdgpu_occupancy_lds_limit {{[0-9]+}}
+; ASM-DAG: .amdgpu_signature "vl"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info kern
+; ASM-DAG: .amdgpu_flags 5
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_agpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_named_barrier {{[0-9]+}}
+; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG: .amdgpu_occupancy_lds_limit {{[0-9]+}}
+; ASM-DAG: .amdgpu_call taker
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info external_func
+; ASM-DAG: .amdgpu_signature "vi"
+; ASM-DAG: .end_amdgpu_info
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-indirect.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-indirect.ll
new file mode 100644
index 0000000000000..28a8139384eb1
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-indirect.ll
@@ -0,0 +1,63 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -amdgpu-enable-lower-module-lds=0 -filetype=obj < %s | llvm-readobj -r --sections - | FileCheck %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -amdgpu-enable-lower-module-lds=0 -filetype=asm < %s | FileCheck %s --check-prefix=ASM
+
+; Test that the unified .amdgpu.info section includes address-taken metadata and
+; .amdgpu_indirect_call for functions involved in indirect calling.
+;
+; @callee_vi: void(i32) -- address taken, prototype encoding "vi"
+; @caller: has indirect call to void(i32) -- icall encoding "vi"
+; @my_kernel: kernel that passes @callee_vi as a function pointer to @caller
+
+define void @callee_vi(i32 %x) {
+ ret void
+}
+
+define void @caller(ptr %fptr) {
+ call void %fptr(i32 1)
+ ret void
+}
+
+define amdgpu_kernel void @my_kernel() {
+ call void @caller(ptr @callee_vi)
+ ret void
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"amdgpu-link-time-lds", i32 1}
+
+; CHECK: Section {
+; CHECK: Name: .amdgpu.info
+; CHECK: Type: SHT_PROGBITS
+; CHECK: Flags [
+; CHECK: SHF_EXCLUDE
+; CHECK: ]
+
+; CHECK-DAG: R_AMDGPU_ABS64 my_kernel
+; CHECK-DAG: R_AMDGPU_ABS64 caller
+; CHECK-DAG: R_AMDGPU_ABS64 callee_vi
+
+; ASM-DAG: .amdgpu_info callee_vi
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_agpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_named_barrier {{[0-9]+}}
+; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG: .amdgpu_occupancy_lds_limit {{[0-9]+}}
+; ASM-DAG: .amdgpu_signature "vi"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info caller
+; ASM-DAG: .amdgpu_flags 14
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_indirect_call "vi"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info my_kernel
+; ASM-DAG: .amdgpu_flags 7
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_agpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_named_barrier {{[0-9]+}}
+; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG: .amdgpu_occupancy_lds_limit {{[0-9]+}}
+; ASM-DAG: .amdgpu_call caller
+; ASM-DAG: .end_amdgpu_info
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll
index 46d4c8db00f06..82ac4e12dc5c4 100644
--- a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-named-barrier.ll
@@ -1,9 +1,11 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -amdgpu-enable-object-linking < %s | FileCheck %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1250 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --sections - | FileCheck %s --check-prefix=ELF
; Verify object linking codegen for named barriers on GFX1250:
; 1. Barrier instructions use M0-based forms with relocation references
-; 2. group_segment_fixed_size = 0 (linker patches it)
-; 3. Named barrier is emitted as an SHN_AMDGPU_LDS symbol (.amdgpu_lds)
+; 2. .amdgpu.info section records the barrier as an LDS use edge
+; 3. group_segment_fixed_size = 0 (linker patches it)
+; 4. Named barrier is emitted as an SHN_AMDGPU_LDS symbol (.amdgpu_lds)
@bar = internal addrspace(3) global [2 x target("amdgcn.named.barrier", 0)] poison
@@ -13,12 +15,32 @@
; CHECK: s_barrier_join m0
; CHECK: s_barrier_wait 1
-; KD: group_segment_fixed_size = 0 (linker will patch).
; CHECK: .amdhsa_group_segment_fixed_size 0
-; LDS symbol declaration
+; CHECK: .amdgpu_info kernel
+; CHECK: .amdgpu_flags {{[0-9]+}}
+; CHECK: .amdgpu_num_vgpr {{[0-9]+}}
+; CHECK: .amdgpu_num_agpr {{[0-9]+}}
+; CHECK: .amdgpu_num_sgpr {{[0-9]+}}
+; CHECK: .amdgpu_num_named_barrier {{[0-9]+}}
+; CHECK: .amdgpu_private_segment_size {{[0-9]+}}
+; CHECK: .amdgpu_occupancy_lds_limit {{[0-9]+}}
+; CHECK: .amdgpu_use __amdgpu_named_barrier.bar{{[^ ,]*}}
+; CHECK: .amdgpu_call helper
+; CHECK: .end_amdgpu_info
+
; CHECK: .amdgpu_lds __amdgpu_named_barrier.bar{{[^ ,]*}}, 32, 4
+; ELF: Section {
+; ELF: Name: .amdgpu.info
+; ELF: Type: SHT_PROGBITS
+; ELF: Flags [
+; ELF: SHF_EXCLUDE
+
+; ELF-DAG: R_AMDGPU_ABS64 kernel
+; ELF-DAG: R_AMDGPU_ABS64 __amdgpu_named_barrier.bar{{[^ ]*}}
+; ELF-DAG: R_AMDGPU_ABS64 helper
+
define amdgpu_kernel void @kernel() {
call void @llvm.amdgcn.s.barrier.signal.var(ptr addrspace(3) @bar, i32 3)
call void @llvm.amdgcn.s.barrier.join(ptr addrspace(3) @bar)
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-prototype.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-prototype.ll
new file mode 100644
index 0000000000000..27603375e978a
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen-prototype.ll
@@ -0,0 +1,83 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -amdgpu-enable-lower-module-lds=0 -filetype=obj < %s | llvm-readobj -r - | FileCheck %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -amdgpu-enable-lower-module-lds=0 -filetype=asm < %s | FileCheck %s --check-prefix=ASM
+
+; Test ABI register-size type ID generation for various function signatures.
+; The type ID encodes each parameter/return by bit width: v=void, i=<=32-bit,
+; l=33-64-bit. Types with the same register footprint share an encoding
+; (e.g. float(float) and i32(i32) both produce "ii").
+
+define void @void_void() {
+ ret void
+}
+
+define i32 @i32_i32(i32 %x) {
+ ret i32 %x
+}
+
+define void @void_ptr_i32(ptr %p, i32 %x) {
+ ret void
+}
+
+define i64 @i64_i64_i64(i64 %a, i64 %b) {
+ ret i64 %a
+}
+
+define float @float_float(float %x) {
+ ret float %x
+}
+
+; Take the address of each function so they appear as resource nodes.
+define void @taker() {
+ %p0 = alloca ptr, addrspace(5)
+ store volatile ptr @void_void, ptr addrspace(5) %p0
+ store volatile ptr @i32_i32, ptr addrspace(5) %p0
+ store volatile ptr @void_ptr_i32, ptr addrspace(5) %p0
+ store volatile ptr @i64_i64_i64, ptr addrspace(5) %p0
+ store volatile ptr @float_float, ptr addrspace(5) %p0
+ ret void
+}
+
+define amdgpu_kernel void @kern() {
+ call void @taker()
+ ret void
+}
+
+!llvm.module.flags = !{!0}
+!0 = !{i32 1, !"amdgpu-link-time-lds", i32 1}
+
+; CHECK-DAG: R_AMDGPU_ABS64 void_void
+; CHECK-DAG: R_AMDGPU_ABS64 i32_i32
+; CHECK-DAG: R_AMDGPU_ABS64 void_ptr_i32
+; CHECK-DAG: R_AMDGPU_ABS64 i64_i64_i64
+; CHECK-DAG: R_AMDGPU_ABS64 float_float
+; CHECK-DAG: R_AMDGPU_ABS64 taker
+; CHECK-DAG: R_AMDGPU_ABS64 kern
+
+; ASM-DAG: .amdgpu_info void_void
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_signature "v"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info i32_i32
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_signature "ii"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info void_ptr_i32
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_signature "vli"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info i64_i64_i64
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_signature "lll"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info float_float
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_signature "ii"
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info taker
+; ASM-DAG: .amdgpu_flags 0
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info kern
+; ASM-DAG: .amdgpu_flags 5
+; ASM-DAG: .amdgpu_call taker
+; ASM-DAG: .end_amdgpu_info
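
The type ID rule described in the lds-link-time-codegen-prototype.ll test comment (v=void, i=<=32-bit, l=33-64-bit, return type first, then each parameter) can be illustrated with a small Python sketch. This is not the compiler's implementation: `encode_type_id` and its bit-width inputs are hypothetical names chosen for illustration, and rejecting widths above 64 bits is an assumption, since the test does not cover that case.

```python
from typing import Optional, Sequence


def encode_type_id(ret_bits: Optional[int], param_bits: Sequence[int]) -> str:
    """Build a register-size type ID string.

    ret_bits is None for a void return, otherwise the return width in bits.
    param_bits holds the width in bits of each parameter, in order.
    """

    def encode_one(bits: Optional[int]) -> str:
        if bits is None:
            return "v"  # void
        if bits <= 0:
            raise ValueError(f"invalid bit width: {bits}")
        if bits <= 32:
            return "i"  # fits in one 32-bit register
        if bits <= 64:
            return "l"  # 33 to 64 bits
        # Assumption: wider types are outside what the test describes.
        raise ValueError(f"width {bits} is not covered by this sketch")

    return encode_one(ret_bits) + "".join(encode_one(b) for b in param_bits)


# Signatures from the test, with ptr taken as 64 bits wide.
print(encode_type_id(None, []))        # void()
print(encode_type_id(32, [32]))        # i32(i32) and float(float)
print(encode_type_id(None, [64, 32]))  # void(ptr, i32)
print(encode_type_id(64, [64, 64]))    # i64(i64, i64)
```

Under these assumptions the four calls yield "v", "ii", "vli" and "lll", matching the `.amdgpu_signature` strings the test checks for.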
diff --git a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll
index 19a7e86711da7..818f8401fad3b 100644
--- a/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll
+++ b/llvm/test/CodeGen/AMDGPU/lds-link-time-codegen.ll
@@ -1,36 +1,74 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking < %s | FileCheck -check-prefixes=ASM %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --syms - | FileCheck -check-prefixes=ELF %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -amdgpu-enable-object-linking -filetype=obj < %s | llvm-readobj -r --syms --sections - | FileCheck -check-prefixes=ELF %s
; Test that with object linking enabled, external LDS declarations produce
-; @abs32@lo relocations, SHN_AMDGPU_LDS symbols, and .amdgpu_lds directives.
-; Covers multiple LDS variables with different sizes and alignments (including
-; zero-sized dynamic LDS), usage from both kernels and device functions, and
+; @abs32@lo relocations, SHN_AMDGPU_LDS symbols, .amdgpu_lds directives,
+; and .amdgpu_use edges in the .amdgpu.info section. Covers multiple LDS
+; variables with different sizes and alignments (including zero-sized dynamic
+; LDS), usage from both kernels and device functions, and
; group_segment_fixed_size = 0 (linker patches via binary patching).
@lds_large = external addrspace(3) global [256 x i8], align 16
@lds_small = external addrspace(3) global [128 x i8], align 4
@lds_dynamic = external addrspace(3) global [0 x i8], align 8
-; --- Assembly checks ---
+; Instruction-level relocation checks.
; ASM-LABEL: {{^}}device_func:
; ASM: v_add_u32_e32 v{{[0-9]+}}, lds_large@abs32@lo, v{{[0-9]+}}
; ASM-LABEL: {{^}}test_kernel:
; ASM-DAG: s_add_i32 s{{[0-9]+}}, s{{[0-9]+}}, lds_small@abs32@lo
; ASM-DAG: s_add_i32 s{{[0-9]+}}, s{{[0-9]+}}, lds_dynamic@abs32@lo
-; ASM-DAG: s_add_i32 s{{[0-9]+}}, s{{[0-9]+}}, lds_large@abs32@lo
+; .amdgpu.info section with LDS use edges.
+; ASM-DAG: .amdgpu_info device_func
+; ASM-DAG: .amdgpu_flags {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_agpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_named_barrier {{[0-9]+}}
+; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG: .amdgpu_occupancy_lds_limit {{[0-9]+}}
+; ASM-DAG: .amdgpu_use lds_large
+; ASM-DAG: .end_amdgpu_info
+; ASM-DAG: .amdgpu_info test_kernel
+; ASM-DAG: .amdgpu_flags {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_vgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_agpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_sgpr {{[0-9]+}}
+; ASM-DAG: .amdgpu_num_named_barrier {{[0-9]+}}
+; ASM-DAG: .amdgpu_private_segment_size {{[0-9]+}}
+; ASM-DAG: .amdgpu_occupancy_lds_limit {{[0-9]+}}
+; ASM-DAG: .amdgpu_use lds_dynamic
+; ASM-DAG: .amdgpu_use lds_small
+; ASM-DAG: .amdgpu_call device_func
+; ASM-DAG: .end_amdgpu_info
+
+; SHN_AMDGPU_LDS directives.
; ASM-DAG: .amdgpu_lds lds_large, 256, 16
; ASM-DAG: .amdgpu_lds lds_small, 128, 4
; ASM-DAG: .amdgpu_lds lds_dynamic, 0, 8
; ASM: .group_segment_fixed_size: 0
-; --- ELF checks ---
+; .amdgpu.info section exists.
+; ELF: Section {
+; ELF: Name: .amdgpu.info
+; ELF: Type: SHT_PROGBITS
+; ELF: Flags [
+; ELF: SHF_EXCLUDE
+
+; Relocations.
; ELF-DAG: R_AMDGPU_ABS32_LO lds_large
; ELF-DAG: R_AMDGPU_ABS32_LO lds_small
; ELF-DAG: R_AMDGPU_ABS32_LO lds_dynamic
+; ELF-DAG: R_AMDGPU_ABS64 device_func
+; ELF-DAG: R_AMDGPU_ABS64 test_kernel
+; ELF-DAG: R_AMDGPU_ABS64 lds_large
+; ELF-DAG: R_AMDGPU_ABS64 lds_small
+; ELF-DAG: R_AMDGPU_ABS64 lds_dynamic
+; SHN_AMDGPU_LDS symbols.
; ELF-DAG: Name: lds_large
; ELF-DAG: Name: lds_small
; ELF-DAG: Name: lds_dynamic
diff --git a/llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s b/llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s
new file mode 100644
index 0000000000000..dafd0bd2782ac
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/amdgpu-info-roundtrip.s
@@ -0,0 +1,121 @@
+// RUN: llvm-mc -triple=amdgcn-amd-amdhsa -mcpu=gfx900 -filetype=asm %s | FileCheck --check-prefix=ASM %s
+// RUN: llvm-mc -triple=amdgcn-amd-amdhsa -mcpu=gfx900 -filetype=obj %s | llvm-readobj -r --sections --section-data - | FileCheck --check-prefix=OBJ %s
+
+// Test that .amdgpu_info directives round-trip through the assembler (asm and
+// object emission) and produce the correct TLV-encoded .amdgpu.info section.
+
+ .text
+ .globl my_kernel
+ .p2align 8
+ .type my_kernel,@function
+my_kernel:
+ s_endpgm
+.Lfunc_end0:
+ .size my_kernel, .Lfunc_end0-my_kernel
+
+ .globl helper
+ .p2align 2
+ .type helper,@function
+helper:
+ s_setpc_b64 s[30:31]
+.Lfunc_end1:
+ .size helper, .Lfunc_end1-helper
+
+ .globl addr_taken_func
+ .p2align 2
+ .type addr_taken_func,@function
+addr_taken_func:
+ s_setpc_b64 s[30:31]
+.Lfunc_end2:
+ .size addr_taken_func, .Lfunc_end2-addr_taken_func
+
+ .globl extern_func
+
+// Kernel: flags=7 (KERNEL|VCC|FLAT_SCRATCH), resources, call edge, use edge,
+// indirect call, and signature.
+ .amdgpu_info my_kernel
+ .amdgpu_flags 7
+ .amdgpu_num_vgpr 32
+ .amdgpu_num_agpr 0
+ .amdgpu_num_sgpr 33
+ .amdgpu_num_named_barrier 0
+ .amdgpu_private_segment_size 0
+ .amdgpu_occupancy_lds_limit 65536
+ .amdgpu_call helper
+ .amdgpu_use lds_var
+ .amdgpu_indirect_call "vi"
+ .end_amdgpu_info
+
+// Device function: flags=2 (VCC), call edge to external.
+ .amdgpu_info helper
+ .amdgpu_flags 2
+ .amdgpu_num_vgpr 10
+ .amdgpu_num_agpr 0
+ .amdgpu_num_sgpr 8
+ .amdgpu_num_named_barrier 0
+ .amdgpu_private_segment_size 16
+ .amdgpu_occupancy_lds_limit 0
+ .amdgpu_call extern_func
+ .end_amdgpu_info
+
+// Address-taken function with signature.
+ .amdgpu_info addr_taken_func
+ .amdgpu_flags 0
+ .amdgpu_num_vgpr 4
+ .amdgpu_num_agpr 0
+ .amdgpu_num_sgpr 2
+ .amdgpu_num_named_barrier 0
+ .amdgpu_private_segment_size 0
+ .amdgpu_occupancy_lds_limit 0
+ .amdgpu_signature "vi"
+ .end_amdgpu_info
+
+// ASM: .amdgpu_info my_kernel
+// ASM: .amdgpu_flags 7
+// ASM: .amdgpu_num_vgpr 32
+// ASM: .amdgpu_num_agpr 0
+// ASM: .amdgpu_num_sgpr 33
+// ASM: .amdgpu_num_named_barrier 0
+// ASM: .amdgpu_private_segment_size 0
+// ASM: .amdgpu_occupancy_lds_limit 65536
+// ASM: .amdgpu_use lds_var
+// ASM: .amdgpu_call helper
+// ASM: .amdgpu_indirect_call "vi"
+// ASM: .end_amdgpu_info
+
+// ASM: .amdgpu_info helper
+// ASM: .amdgpu_flags 2
+// ASM: .amdgpu_num_vgpr 10
+// ASM: .amdgpu_num_agpr 0
+// ASM: .amdgpu_num_sgpr 8
+// ASM: .amdgpu_num_named_barrier 0
+// ASM: .amdgpu_private_segment_size 16
+// ASM: .amdgpu_occupancy_lds_limit 0
+// ASM: .amdgpu_call extern_func
+// ASM: .end_amdgpu_info
+
+// ASM: .amdgpu_info addr_taken_func
+// ASM: .amdgpu_flags 0
+// ASM: .amdgpu_num_vgpr 4
+// ASM: .amdgpu_num_agpr 0
+// ASM: .amdgpu_num_sgpr 2
+// ASM: .amdgpu_num_named_barrier 0
+// ASM: .amdgpu_private_segment_size 0
+// ASM: .amdgpu_occupancy_lds_limit 0
+// ASM: .amdgpu_signature "vi"
+// ASM: .end_amdgpu_info
+
+// OBJ: Section {
+// OBJ: Name: .amdgpu.info
+// OBJ: Type: SHT_PROGBITS
+// OBJ: Flags [
+// OBJ: SHF_EXCLUDE
+// OBJ: ]
+// OBJ: }
+
+// Relocations in .amdgpu.info should reference defined and external symbols.
+// OBJ-DAG: R_AMDGPU_ABS64 my_kernel
+// OBJ-DAG: R_AMDGPU_ABS64 helper
+// OBJ-DAG: R_AMDGPU_ABS64 addr_taken_func
+// OBJ-DAG: R_AMDGPU_ABS64 extern_func
+// OBJ-DAG: R_AMDGPU_ABS64 lds_var
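
For readers who want to inspect the section data that the round-trip test dumps, the `[kind: u8] [len: u8] [payload: <len> bytes]` layout from the PR description can be walked with a few lines of Python. This is only a sketch under stated assumptions: `iter_entries` and `KNOWN_KINDS` are hypothetical names, the kind numbers in the sample bytes are made up (the real values are defined by the patch), and payloads are returned as raw bytes without interpreting symbol references or relocations.

```python
from typing import Iterator, Tuple


def iter_entries(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (kind, payload) for each tagged, length-prefixed entry."""
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated entry header")
        kind, length = data[pos], data[pos + 1]
        end = pos + 2 + length
        if end > len(data):
            raise ValueError("truncated payload")
        yield kind, data[pos + 2:end]
        pos = end


# Hypothetical kind numbers, for illustration only.
KNOWN_KINDS = {1, 3}

sample = bytes([1, 2, 0xAA, 0xBB,  # known kind, 2-byte payload
                200, 0,            # unknown kind, empty payload
                3, 1, 0x07])       # known kind, 1-byte payload

for kind, payload in iter_entries(sample):
    if kind not in KNOWN_KINDS:
        continue  # an unknown kind is skipped using its length byte
    print(kind, payload.hex())
```

Because every entry carries its own length, the loop can step over the unknown kind without understanding its payload, which is the forward-compatibility property the description relies on.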