[llvm] [AMDGPU][MC] Function scope resource usage struct (PR #188031)
Janek van Oirschot via llvm-commits
llvm-commits at lists.llvm.org
Fri Mar 27 08:50:59 PDT 2026
https://github.com/JanekvO updated https://github.com/llvm/llvm-project/pull/188031
>From 5dbc1d50743d7b81f317de4a2d749679cac585a6 Mon Sep 17 00:00:00 2001
From: Janek van Oirschot <janek.vanoirschot at amd.com>
Date: Wed, 18 Mar 2026 14:26:51 +0000
Subject: [PATCH 1/3] [AMDGPU][MC] Function scope resource usage struct and
callgraph info
---
llvm/docs/AMDGPUUsage.rst | 74 +++++++++++-
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 29 +++++
.../AMDGPU/AsmParser/AMDGPUAsmParser.cpp | 106 ++++++++++++++++
.../MCTargetDesc/AMDGPUTargetStreamer.cpp | 64 ++++++++++
.../MCTargetDesc/AMDGPUTargetStreamer.h | 25 ++++
.../AMDGPU/branch-relaxation-gfx1250.ll | 13 ++
llvm/test/CodeGen/AMDGPU/branch-relaxation.ll | 13 ++
llvm/test/CodeGen/AMDGPU/lds-relocs.ll | 3 +
.../CodeGen/AMDGPU/resource-info-section.ll | 98 +++++++++++++++
.../MC/AMDGPU/amdgpu-resource-usage-err.s | 64 ++++++++++
llvm/test/MC/AMDGPU/amdgpu-resource-usage.s | 114 ++++++++++++++++++
11 files changed, 601 insertions(+), 2 deletions(-)
create mode 100644 llvm/test/CodeGen/AMDGPU/resource-info-section.ll
create mode 100644 llvm/test/MC/AMDGPU/amdgpu-resource-usage-err.s
create mode 100644 llvm/test/MC/AMDGPU/amdgpu-resource-usage.s
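
Note for readers of this patch: the AMDGPUUsage.rst changes below describe each section entry as six 32-bit fields (24 bytes) with the booleans packed into a flags word. A minimal consumer-side sketch of that layout follows. The names ResourceUsageEntry, readLE32 and decodeEntry are illustrative only and are not part of the patch, and the little-endian byte order is an assumption about the object file being read.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical consumer-side view of one entry; field order follows the
// order in which the ELF streamer in this patch emits the values.
struct ResourceUsageEntry {
  uint32_t NumVGPR;
  uint32_t NumAGPR;
  uint32_t NumSGPR;
  uint32_t NumNamedBarrier;
  uint32_t PrivateSegmentSize;
  uint32_t Flags;
};

// Six 4-byte fields, matching the 24-byte entry size stated in the docs.
constexpr std::size_t EntrySize = 24;
static_assert(sizeof(ResourceUsageEntry) == EntrySize, "unexpected padding");

// Flag bit positions as listed in the flags table of this patch.
enum : uint32_t {
  UsesVCC = 1u << 0,
  UsesFlatScratch = 1u << 1,
  HasDynSizedStack = 1u << 2,
  HasRecursion = 1u << 3,
  HasIndirectCall = 1u << 4,
};

// Read a 32-bit value stored least-significant byte first (assumption:
// the section comes from a little-endian ELF object).
inline uint32_t readLE32(const uint8_t *P) {
  return uint32_t(P[0]) | (uint32_t(P[1]) << 8) | (uint32_t(P[2]) << 16) |
         (uint32_t(P[3]) << 24);
}

// Decode the entry that starts at P; the caller must guarantee that at
// least EntrySize bytes are readable.
inline ResourceUsageEntry decodeEntry(const uint8_t *P) {
  return {readLE32(P),      readLE32(P + 4),  readLE32(P + 8),
          readLE32(P + 12), readLE32(P + 16), readLE32(P + 20)};
}
```

A flag is then tested with a plain mask, for example (Entry.Flags & UsesVCC) != 0. Decoding field by field avoids depending on the host's byte order or on the alignment of the section data.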
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 1ede5ca2d4cf6..23dfb1786f4ba 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -2354,8 +2354,8 @@ As part of the AMDGPU MC layer, AMDGPU provides the following target-specific
=================== ================= ========================================================
-Function Resource Usage
------------------------
+Function Resource Usage Symbols
+-------------------------------
A function's resource usage depends on each of its callees' resource usage. The
expressions used to denote resource usage reflect this by propagating each
@@ -2403,6 +2403,76 @@ unit's worst case (i.e, maxima) ``num_vgpr``, ``num_agpr``, and
symbolic expressions. These three symbols are ``amdgcn.max_num_vgpr``,
``amdgcn.max_num_agpr``, and ``amdgcn.max_num_sgpr``.
+Function Resource Usage Asm Directives
+--------------------------------------
+
+A function's resource usage depends on each of its callees' resource usage.
+Accommodating this are the AMDGPU resource usage assembler directives and ELF
+section. The assembler directives emit a pre- and post-marked sequence of
+assembler directives after every function that state a function's resource
+usage and callees. The resource usage this emit is **only for this function's
+usage** and does not yet consider the callees' resource usage. For the
+propagated resource usage, any user of the section or resource info will have
+to walk the callgraph and compute the total use.
+
+ .. table:: Function Resource Usage Asm Directives:
+ :name: function-usage-directive-table
+
+ ====================================== ========= ======================== ===================================================================================================
+ Directive Required? Occurrences Per Function Description
+ ====================================== ========= ======================== ===================================================================================================
+ .amdgpu_resource_usage <function name> yes 1 Denotes the start of resource usage directives for <function name>
+ .end_amdgpu_resource_usage yes 1 Denotes end of resource usage directives
+ .num_vgpr <i32> yes 1 Number of VGPRs used by the function
+ .num_agpr <i32> yes 1 Number of AGPRs used by the function
+ .num_sgpr <i32> yes 1 Number of SGPRs used by the function
+ .named_barrier <i32> yes 1 Number of named barriers used by the function
+ .private_seg_size <i32> yes 1 Total stack size required for the function
+ .uses_vcc <i1> yes 1 Boolean denoting whether vcc is used in the function
+ .uses_flat_scratch <i1> yes 1 Boolean denoting whether flat scratch is used in the function
+ .has_dyn_sized_stack <i1> yes 1 Boolean denoting whether stack in the function is dynamically sized
+ .has_recursion <i1> yes 1 Boolean denoting whether recursion is used in the function
+ .has_indirect_call <i1> yes 1 Boolean denoting whether the function has an indirect call
+ .callee <function name> no 0 or more Callee functions called by the function, each unique callee getting its own .callee directive
+ ====================================== ========= ======================== ===================================================================================================
+
+Function Resource Usage ELF Section
+-----------------------------------
+
+The resource usage section contains binary structs representing each resource
+usage entry for a function. The resource usage section is named
+.AMDGPU.resource_usage and has an additional relocation section (e.g.,
+.rela.AMDGPU.resource_usage) which holds information on which offset of the
+.AMDGPU.resource_usage section denotes which function in addition to tracking
+callers and callees.
+
+Resource usage is a binary struct of its required resource information. The
+booleans are packed into a flag of type i32. The total size of each resource
+usage struct is, therefore, 24-bytes (i.e., sizeof(num_vgpr) + sizeof(num_agpr)
++ sizeof(num_sgpr) + sizeof(named_barrier) + sizeof(private_seg_size) +
+sizeof(flags)). The flags are packed as follows:
+
+ .. table:: Function Resource Usage Flags:
+ :name: function-usage-flags-table
+
+ =========================== =======
+ Function usage property Bit
+ =========================== =======
+ uses_vcc 0
+ uses_flat_scratch 1
+ has_dyn_sized_stack 2
+ has_recursion 3
+ has_indirect_call 4
+ =========================== =======
+
+The resource usage relocation section contains the offset into the resource
+usage section for each function in a translation unit. In addition to this
+function to resource usage entry mapping, it embeds the callees of each caller
+by having the first relocation to an offset denote the function the entry is
+mapped to and any subsequent relocations for that same offset denote a callee
+of the function mapped to that entry (similar to how CGProfile specifies the
+caller and callee relation).
+
.. _amdgpu-elf-code-object:
ELF Code Object
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index 1f83df8099803..b07f0ad49b4fe 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -31,6 +31,7 @@
#include "Utils/AMDGPUBaseInfo.h"
#include "Utils/AMDKernelCodeTUtils.h"
#include "Utils/SIDefinesUtils.h"
+#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/BinaryFormat/ELF.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
@@ -758,6 +759,34 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
OutContext, IsLocal));
}
+ // Emit per-function local resource usage and callee info into
+ // .AMDGPU.resource_info section.
+ {
+ uint32_t Flags = 0;
+ Flags |= (ResourceUsage->UsesVCC ? 1u : 0u) << 0;
+ Flags |= (ResourceUsage->UsesFlatScratch ? 1u : 0u) << 1;
+ Flags |= (ResourceUsage->HasDynamicallySizedStack ? 1u : 0u) << 2;
+ Flags |= (ResourceUsage->HasRecursion ? 1u : 0u) << 3;
+ Flags |= (ResourceUsage->HasIndirectCall ? 1u : 0u) << 4;
+
+ // Collect unique callee symbols.
+ SmallVector<MCSymbol *, 8> CalleeSyms;
+ SmallPtrSet<const Function *, 8> SeenCallees;
+ for (const Function *Callee : ResourceUsage->Callees) {
+ if (SeenCallees.insert(Callee).second)
+ CalleeSyms.push_back(MF.getTarget().getSymbol(Callee));
+ }
+
+ // Emit function local resource usage info. Does not contain any callee
+ // propagated resource info, users of this section info should be able to
+ // gather all resource info and walk the callgraph to combine for any
+ // callee resource info.
+ getTargetStreamer()->emitResourceUsageEntry(
+ CurrentFnSym, ResourceUsage->NumVGPR, ResourceUsage->NumAGPR,
+ ResourceUsage->NumExplicitSGPR, ResourceUsage->NumNamedBarrier,
+ ResourceUsage->PrivateSegmentSize, Flags, CalleeSyms);
+ }
+
// Emit _dvgpr$ symbol when appropriate.
emitDVgprSymbol(MF);
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index fddb36133afb8..1a350d5fee1c4 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -1422,6 +1422,7 @@ class AMDGPUAsmParser : public MCTargetAsmParser {
bool ParseDirectivePALMetadataBegin();
bool ParseDirectivePALMetadata();
bool ParseDirectiveAMDGPULDS();
+ bool ParseDirectiveAMDGPUResourceUsage();
/// Common code to parse out a block of text (typically YAML) between start and
/// end directives.
@@ -6741,6 +6742,108 @@ bool AMDGPUAsmParser::ParseDirectiveAMDGPULDS() {
return false;
}
+/// ParseDirectiveAMDGPUResourceUsage
+/// ::= .amdgpu_resource_usage <symbol>
+/// .num_vgpr <int>
+/// .num_agpr <int>
+/// .num_sgpr <int>
+/// .named_barrier <int>
+/// .private_seg_size <int>
+/// .uses_vcc <int>
+/// .uses_flat_scratch <int>
+/// .has_dyn_sized_stack <int>
+/// .has_recursion <int>
+/// .has_indirect_call <int>
+/// .callee <symbol> (zero or more)
+/// .end_amdgpu_resource_usage
+bool AMDGPUAsmParser::ParseDirectiveAMDGPUResourceUsage() {
+ StringRef SymName;
+ if (getParser().parseIdentifier(SymName))
+ return TokError("expected symbol name after .amdgpu_resource_usage");
+
+ StringSet<> Seen;
+ MCSymbol *FnSym = getContext().getOrCreateSymbol(SymName);
+
+ uint32_t NumVGPR = 0, NumAGPR = 0, NumSGPR = 0;
+ uint32_t NumNamedBarrier = 0, PrivateSegmentSize = 0;
+ uint32_t UsesVCC = 0, UsesFlatScratch = 0, HasDynSizedStack = 0;
+ uint32_t HasRecursion = 0, HasIndirectCall = 0;
+ SmallVector<MCSymbol *, 4> Callees;
+
+ while (true) {
+ while (trySkipToken(AsmToken::EndOfStatement))
+ ;
+
+ StringRef ID;
+ if (!parseId(ID, "expected field directive or .end_amdgpu_resource_usage"))
+ return true;
+
+ if (ID == ".end_amdgpu_resource_usage")
+ break;
+
+ if (ID == ".callee") {
+ StringRef CalleeName;
+ if (getParser().parseIdentifier(CalleeName))
+ return TokError("expected symbol name after .callee");
+ Callees.push_back(getContext().getOrCreateSymbol(CalleeName));
+ continue;
+ }
+
+ if (!Seen.insert(ID).second)
+ return TokError("resource usage directives already declared");
+
+ int64_t Val;
+ if (getParser().parseAbsoluteExpression(Val))
+ return true;
+ if (Val < 0)
+ return TokError("value must be non-negative");
+
+ if (ID == ".num_vgpr")
+ NumVGPR = Val;
+ else if (ID == ".num_agpr")
+ NumAGPR = Val;
+ else if (ID == ".num_sgpr")
+ NumSGPR = Val;
+ else if (ID == ".named_barrier")
+ NumNamedBarrier = Val;
+ else if (ID == ".private_seg_size")
+ PrivateSegmentSize = Val;
+ else if (ID == ".uses_vcc")
+ UsesVCC = Val;
+ else if (ID == ".uses_flat_scratch")
+ UsesFlatScratch = Val;
+ else if (ID == ".has_dyn_sized_stack")
+ HasDynSizedStack = Val;
+ else if (ID == ".has_recursion")
+ HasRecursion = Val;
+ else if (ID == ".has_indirect_call")
+ HasIndirectCall = Val;
+ else
+ return TokError("unknown field '" + ID + "' in .amdgpu_resource_usage");
+ }
+
+ for (StringRef StrRef :
+ {".num_vgpr", ".num_agpr", ".num_sgpr", ".named_barrier",
+ ".private_seg_size", ".uses_vcc", ".uses_flat_scratch",
+ ".has_dyn_sized_stack", ".has_recursion", ".has_indirect_call"}) {
+ if (!Seen.contains(StrRef))
+ return TokError("requires " + StrRef +
+ " directive in .amdgpu_resource_usage");
+ }
+
+ uint32_t Flags = 0;
+ Flags |= (UsesVCC ? 1u : 0u) << 0;
+ Flags |= (UsesFlatScratch ? 1u : 0u) << 1;
+ Flags |= (HasDynSizedStack ? 1u : 0u) << 2;
+ Flags |= (HasRecursion ? 1u : 0u) << 3;
+ Flags |= (HasIndirectCall ? 1u : 0u) << 4;
+
+ getTargetStreamer().emitResourceUsageEntry(
+ FnSym, NumVGPR, NumAGPR, NumSGPR, NumNamedBarrier, PrivateSegmentSize,
+ Flags, Callees);
+ return false;
+}
+
bool AMDGPUAsmParser::ParseDirective(AsmToken DirectiveID) {
StringRef IDVal = DirectiveID.getString();
@@ -6784,6 +6887,9 @@ bool AMDGPUAsmParser::ParseDirective(AsmToken DirectiveID) {
if (IDVal == PALMD::AssemblerDirective)
return ParseDirectivePALMetadata();
+ if (IDVal == ".amdgpu_resource_usage")
+ return ParseDirectiveAMDGPUResourceUsage();
+
return true;
}
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index fae61302ebd90..3631efb41d360 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -660,6 +660,26 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
OS << "\t.end_amdhsa_kernel\n";
}
+void AMDGPUTargetAsmStreamer::emitResourceUsageEntry(
+ MCSymbol *FnSym, uint32_t NumVGPR, uint32_t NumAGPR, uint32_t NumSGPR,
+ uint32_t NumNamedBarrier, uint32_t PrivateSegmentSize, uint32_t Flags,
+ ArrayRef<MCSymbol *> Callees) {
+ OS << "\t.amdgpu_resource_usage " << FnSym->getName() << '\n';
+ OS << "\t\t.num_vgpr " << NumVGPR << '\n';
+ OS << "\t\t.num_agpr " << NumAGPR << '\n';
+ OS << "\t\t.num_sgpr " << NumSGPR << '\n';
+ OS << "\t\t.named_barrier " << NumNamedBarrier << '\n';
+ OS << "\t\t.private_seg_size " << PrivateSegmentSize << '\n';
+ OS << "\t\t.uses_vcc " << ((Flags >> 0) & 1) << '\n';
+ OS << "\t\t.uses_flat_scratch " << ((Flags >> 1) & 1) << '\n';
+ OS << "\t\t.has_dyn_sized_stack " << ((Flags >> 2) & 1) << '\n';
+ OS << "\t\t.has_recursion " << ((Flags >> 3) & 1) << '\n';
+ OS << "\t\t.has_indirect_call " << ((Flags >> 4) & 1) << '\n';
+ for (MCSymbol *Callee : Callees)
+ OS << "\t\t.callee " << Callee->getName() << '\n';
+ OS << "\t.end_amdgpu_resource_usage\n";
+}
+
//===----------------------------------------------------------------------===//
// AMDGPUTargetELFStreamer
//===----------------------------------------------------------------------===//
@@ -1061,3 +1081,47 @@ void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
for (uint32_t i = 0; i < sizeof(amdhsa::kernel_descriptor_t::reserved3); ++i)
Streamer.emitInt8(0u);
}
+
+void AMDGPUTargetELFStreamer::emitResourceUsageEntry(
+ MCSymbol *FnSym, uint32_t NumVGPR, uint32_t NumAGPR, uint32_t NumSGPR,
+ uint32_t NumNamedBarrier, uint32_t PrivateSegmentSize, uint32_t Flags,
+ ArrayRef<MCSymbol *> Callees) {
+ auto &S = getStreamer();
+ auto &Context = S.getContext();
+ const unsigned ResourceInfoEntrySize = 24;
+
+ // TODO: Custom elf section type for support in linker.
+ MCSection *Sec =
+ Context.getELFSection(".AMDGPU.resource_info", ELF::SHT_PROGBITS,
+ ELF::SHF_EXCLUDE, ResourceInfoEntrySize);
+ S.pushSection();
+ S.switchSection(Sec);
+
+ // Emit R_AMDGPU_NONE relocation pointing to the function symbol.
+ // Use the current section offset (= ResourceInfoEntryCount * 24).
+ auto *SectionSym = MCSymbolRefExpr::create(Sec->getBeginSymbol(), Context);
+ auto *Offset = MCBinaryExpr::createAdd(
+ SectionSym,
+ MCConstantExpr::create(ResourceInfoEntryCount * ResourceInfoEntrySize,
+ Context),
+ Context);
+ S.emitRelocDirective(*Offset, "R_AMDGPU_NONE",
+ MCSymbolRefExpr::create(FnSym, Context));
+
+ // Emit callee relocations at the same offset as the function identity.
+ for (MCSymbol *Callee : Callees)
+ S.emitRelocDirective(*Offset, "R_AMDGPU_NONE",
+ MCSymbolRefExpr::create(Callee, Context));
+
+ ++ResourceInfoEntryCount;
+
+ // Emit the 24-byte struct.
+ S.emitIntValue(NumVGPR, 4);
+ S.emitIntValue(NumAGPR, 4);
+ S.emitIntValue(NumSGPR, 4);
+ S.emitIntValue(NumNamedBarrier, 4);
+ S.emitIntValue(PrivateSegmentSize, 4);
+ S.emitIntValue(Flags, 4);
+
+ S.popSection();
+}
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
index 3a0d8dcd2d27c..ab15c83d51d5c 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
@@ -11,6 +11,7 @@
#include "Utils/AMDGPUBaseInfo.h"
#include "Utils/AMDGPUPALMetadata.h"
+#include "llvm/ADT/ArrayRef.h"
#include "llvm/MC/MCStreamer.h"
namespace llvm {
@@ -104,6 +105,15 @@ class AMDGPUTargetStreamer : public MCTargetStreamer {
const MCExpr *ReserveVCC,
const MCExpr *ReserveFlatScr) {}
+ /// Emit a per-function resource usage entry into the
+ /// .AMDGPU.resource_info section, along with callee relocations.
+ virtual void emitResourceUsageEntry(MCSymbol *FnSym, uint32_t NumVGPR,
+ uint32_t NumAGPR, uint32_t NumSGPR,
+ uint32_t NumNamedBarrier,
+ uint32_t PrivateSegmentSize,
+ uint32_t Flags,
+ ArrayRef<MCSymbol *> Callees = {}) {}
+
static StringRef getArchNameFromElfMach(unsigned ElfMach);
static unsigned getElfMach(StringRef GPU);
@@ -168,12 +178,21 @@ class AMDGPUTargetAsmStreamer final : public AMDGPUTargetStreamer {
const MCExpr *NextVGPR, const MCExpr *NextSGPR,
const MCExpr *ReserveVCC,
const MCExpr *ReserveFlatScr) override;
+
+ void emitResourceUsageEntry(MCSymbol *FnSym, uint32_t NumVGPR,
+ uint32_t NumAGPR, uint32_t NumSGPR,
+ uint32_t NumNamedBarrier,
+ uint32_t PrivateSegmentSize, uint32_t Flags,
+ ArrayRef<MCSymbol *> Callees = {}) override;
};
class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
const MCSubtargetInfo &STI;
MCStreamer &Streamer;
+ // Counter for computing relocation offsets.
+ unsigned ResourceInfoEntryCount = 0;
+
void EmitNote(StringRef Name, const MCExpr *DescSize, unsigned NoteType,
function_ref<void(MCELFStreamer &)> EmitDesc);
@@ -221,6 +240,12 @@ class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
const MCExpr *NextVGPR, const MCExpr *NextSGPR,
const MCExpr *ReserveVCC,
const MCExpr *ReserveFlatScr) override;
+
+ void emitResourceUsageEntry(MCSymbol *FnSym, uint32_t NumVGPR,
+ uint32_t NumAGPR, uint32_t NumSGPR,
+ uint32_t NumNamedBarrier,
+ uint32_t PrivateSegmentSize, uint32_t Flags,
+ ArrayRef<MCSymbol *> Callees = {}) override;
};
}
#endif
diff --git a/llvm/test/CodeGen/AMDGPU/branch-relaxation-gfx1250.ll b/llvm/test/CodeGen/AMDGPU/branch-relaxation-gfx1250.ll
index 779118bd33027..3b09af7368f50 100644
--- a/llvm/test/CodeGen/AMDGPU/branch-relaxation-gfx1250.ll
+++ b/llvm/test/CodeGen/AMDGPU/branch-relaxation-gfx1250.ll
@@ -9,6 +9,19 @@
; RUN: llvm-readobj -r %t.o | FileCheck --check-prefix=OBJ %s
; OBJ: Relocations [
+; OBJ-NEXT: Section (5) .rel.AMDGPU.resource_info {
+; OBJ-NEXT: 0x0 R_AMDGPU_NONE uniform_conditional_max_short_forward_branch
+; OBJ-NEXT: 0x18 R_AMDGPU_NONE uniform_conditional_min_long_forward_branch
+; OBJ-NEXT: 0x30 R_AMDGPU_NONE uniform_conditional_min_long_forward_vcnd_branch
+; OBJ-NEXT: 0x48 R_AMDGPU_NONE min_long_forward_vbranch
+; OBJ-NEXT: 0x60 R_AMDGPU_NONE long_backward_sbranch
+; OBJ-NEXT: 0x78 R_AMDGPU_NONE uniform_unconditional_min_long_forward_branch
+; OBJ-NEXT: 0x90 R_AMDGPU_NONE uniform_unconditional_min_long_backward_branch
+; OBJ-NEXT: 0xA8 R_AMDGPU_NONE expand_requires_expand
+; OBJ-NEXT: 0xC0 R_AMDGPU_NONE uniform_inside_divergent
+; OBJ-NEXT: 0xD8 R_AMDGPU_NONE analyze_mask_branch
+; OBJ-NEXT: 0xF0 R_AMDGPU_NONE long_branch_hang
+; OBJ-NEXT: }
; OBJ-NEXT: ]
; Restrict maximum branch to between +7 and -8 dwords
diff --git a/llvm/test/CodeGen/AMDGPU/branch-relaxation.ll b/llvm/test/CodeGen/AMDGPU/branch-relaxation.ll
index 155e54baf15a1..daa9994461af1 100644
--- a/llvm/test/CodeGen/AMDGPU/branch-relaxation.ll
+++ b/llvm/test/CodeGen/AMDGPU/branch-relaxation.ll
@@ -10,6 +10,19 @@
; RUN: llvm-readobj -r %t.o | FileCheck --check-prefix=OBJ %s
; OBJ: Relocations [
+; OBJ-NEXT: Section (5) .rel.AMDGPU.resource_info {
+; OBJ-NEXT: 0x0 R_AMDGPU_NONE uniform_conditional_max_short_forward_branch
+; OBJ-NEXT: 0x18 R_AMDGPU_NONE uniform_conditional_min_long_forward_branch
+; OBJ-NEXT: 0x30 R_AMDGPU_NONE uniform_conditional_min_long_forward_vcnd_branch
+; OBJ-NEXT: 0x48 R_AMDGPU_NONE min_long_forward_vbranch
+; OBJ-NEXT: 0x60 R_AMDGPU_NONE long_backward_sbranch
+; OBJ-NEXT: 0x78 R_AMDGPU_NONE uniform_unconditional_min_long_forward_branch
+; OBJ-NEXT: 0x90 R_AMDGPU_NONE uniform_unconditional_min_long_backward_branch
+; OBJ-NEXT: 0xA8 R_AMDGPU_NONE expand_requires_expand
+; OBJ-NEXT: 0xC0 R_AMDGPU_NONE uniform_inside_divergent
+; OBJ-NEXT: 0xD8 R_AMDGPU_NONE analyze_mask_branch
+; OBJ-NEXT: 0xF0 R_AMDGPU_NONE long_branch_hang
+; OBJ-NEXT: }
; OBJ-NEXT: ]
; Restrict maximum branch to between +7 and -8 dwords
diff --git a/llvm/test/CodeGen/AMDGPU/lds-relocs.ll b/llvm/test/CodeGen/AMDGPU/lds-relocs.ll
index 447cb62643384..2d45b28bab888 100644
--- a/llvm/test/CodeGen/AMDGPU/lds-relocs.ll
+++ b/llvm/test/CodeGen/AMDGPU/lds-relocs.ll
@@ -9,6 +9,9 @@
; ELF-NEXT: 0x{{[0-9A-F]*}} R_AMDGPU_ABS32_LO lds.external
; ELF-NEXT: 0x{{[0-9A-F]*}} R_AMDGPU_ABS32_LO lds.defined
; ELF-NEXT: }
+; ELF-NEXT: Section (6) .rel.AMDGPU.resource_info {
+; ELF-NEXT: 0x0 R_AMDGPU_NONE test_basic
+; ELF-NEXT: }
; ELF-NEXT: ]
; ELF: Symbol {
diff --git a/llvm/test/CodeGen/AMDGPU/resource-info-section.ll b/llvm/test/CodeGen/AMDGPU/resource-info-section.ll
new file mode 100644
index 0000000000000..9aa8f401a9f82
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/resource-info-section.ll
@@ -0,0 +1,98 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 < %s | FileCheck -check-prefix=ASM %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -enable-ipra=0 -filetype=obj < %s -o %t
+; RUN: llvm-readelf -r %t | FileCheck -check-prefix=RELOC %s
+
+; ASM-LABEL: {{^}}leaf:
+; ASM: .amdgpu_resource_usage leaf
+; ASM-NEXT: .num_vgpr 0
+; ASM-NEXT: .num_agpr 0
+; ASM-NEXT: .num_sgpr 32
+; ASM-NEXT: .named_barrier 0
+; ASM-NEXT: .private_seg_size 0
+; ASM-NEXT: .uses_vcc 0
+; ASM-NEXT: .uses_flat_scratch 0
+; ASM-NEXT: .has_dyn_sized_stack 0
+; ASM-NEXT: .has_recursion 0
+; ASM-NEXT: .has_indirect_call 0
+; ASM-NEXT: .end_amdgpu_resource_usage
+define void @leaf() {
+ ret void
+}
+
+; ASM-LABEL: {{^}}use_vcc:
+; ASM: .amdgpu_resource_usage use_vcc
+; ASM-NEXT: .num_vgpr 0
+; ASM-NEXT: .num_agpr 0
+; ASM-NEXT: .num_sgpr 32
+; ASM-NEXT: .named_barrier 0
+; ASM-NEXT: .private_seg_size 0
+; ASM-NEXT: .uses_vcc 1
+; ASM-NEXT: .uses_flat_scratch 0
+; ASM-NEXT: .has_dyn_sized_stack 0
+; ASM-NEXT: .has_recursion 0
+; ASM-NEXT: .has_indirect_call 0
+; ASM-NEXT: .end_amdgpu_resource_usage
+define void @use_vcc() {
+ call void asm sideeffect "", "~{vcc}" ()
+ ret void
+}
+
+; ASM-LABEL: {{^}}caller:
+; ASM: .amdgpu_resource_usage caller
+; ASM: .callee use_vcc
+; ASM-NEXT: .end_amdgpu_resource_usage
+define void @caller() {
+ call void @use_vcc()
+ ret void
+}
+
+; ASM-LABEL: {{^}}kernel:
+; ASM: .amdgpu_resource_usage kernel
+; ASM: .callee caller
+; ASM-NEXT: .end_amdgpu_resource_usage
+define amdgpu_kernel void @kernel() {
+ call void @caller()
+ ret void
+}
+
+
+; ASM-LABEL: {{^}}rcaller2:
+; ASM: .amdgpu_resource_usage rcaller2
+; ASM: .callee rcaller1
+; ASM-NEXT: .end_amdgpu_resource_usage
+; ASM-LABEL: {{^}}rcaller1:
+; ASM: .amdgpu_resource_usage rcaller1
+; ASM: .callee rcaller2
+; ASM-NEXT: .end_amdgpu_resource_usage
+define void @rcaller1() {
+ call void @rcaller2()
+ ret void
+}
+define void @rcaller2() {
+ call void @rcaller1()
+ ret void
+}
+
+; ASM-LABEL: {{^}}kernel_recurse:
+; ASM: .amdgpu_resource_usage kernel_recurse
+; ASM: .callee rcaller1
+; ASM-NEXT: .end_amdgpu_resource_usage
+define amdgpu_kernel void @kernel_recurse() {
+ call void @rcaller1()
+ ret void
+}
+
+; RELOC: Relocation section '.rela.AMDGPU.resource_info'
+; RELOC: 0000000000000000 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} leaf + 0
+; RELOC-NEXT: 0000000000000018 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} use_vcc + 0
+; RELOC-NEXT: 0000000000000030 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} caller + 0
+; RELOC-NEXT: 0000000000000030 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} use_vcc + 0
+; RELOC-NEXT: 0000000000000048 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} kernel + 0
+; RELOC-NEXT: 0000000000000048 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} caller + 0
+; RELOC-NEXT: 0000000000000060 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} rcaller2 + 0
+; RELOC-NEXT: 0000000000000060 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} rcaller1 + 0
+; RELOC-NEXT: 0000000000000078 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} rcaller1 + 0
+; RELOC-NEXT: 0000000000000078 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} rcaller2 + 0
+; RELOC-NEXT: 0000000000000090 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} kernel_recurse + 0
+; RELOC-NEXT: 0000000000000090 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} rcaller1 + 0
+
diff --git a/llvm/test/MC/AMDGPU/amdgpu-resource-usage-err.s b/llvm/test/MC/AMDGPU/amdgpu-resource-usage-err.s
new file mode 100644
index 0000000000000..e7db9189adc5b
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/amdgpu-resource-usage-err.s
@@ -0,0 +1,64 @@
+// RUN: not llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx900 %s -o - -filetype=null 2>&1 | FileCheck %s
+
+// Missing symbol name after .amdgpu_resource_usage.
+ .amdgpu_resource_usage
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected symbol name after .amdgpu_resource_usage
+
+// Missing symbol name after .callee.
+ .amdgpu_resource_usage fn1
+ .num_vgpr 0
+ .num_agpr 0
+ .num_sgpr 0
+ .named_barrier 0
+ .private_seg_size 0
+ .uses_vcc 0
+ .uses_flat_scratch 0
+ .has_dyn_sized_stack 0
+ .has_recursion 0
+ .has_indirect_call 0
+ .callee
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected symbol name after .callee
+ .end_amdgpu_resource_usage
+
+// Duplicate field directive.
+ .amdgpu_resource_usage fn2
+ .num_vgpr 0
+ .num_vgpr 1
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: resource usage directives already declared
+ .end_amdgpu_resource_usage
+
+// Negative value.
+ .amdgpu_resource_usage fn3
+ .num_vgpr -1
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: value must be non-negative
+ .end_amdgpu_resource_usage
+
+// Unknown field.
+ .amdgpu_resource_usage fn4
+ .num_vgpr 0
+ .num_agpr 0
+ .num_sgpr 0
+ .named_barrier 0
+ .private_seg_size 0
+ .uses_vcc 0
+ .uses_flat_scratch 0
+ .has_dyn_sized_stack 0
+ .has_recursion 0
+ .has_indirect_call 0
+ .bogus_field 42
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: unknown field '.bogus_field' in .amdgpu_resource_usage
+ .end_amdgpu_resource_usage
+
+// Missing required field (.num_sgpr omitted).
+ .amdgpu_resource_usage fn5
+ .num_vgpr 0
+ .num_agpr 0
+ .named_barrier 0
+ .private_seg_size 0
+ .uses_vcc 0
+ .uses_flat_scratch 0
+ .has_dyn_sized_stack 0
+ .has_recursion 0
+ .has_indirect_call 0
+ .end_amdgpu_resource_usage
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: requires .num_sgpr directive in .amdgpu_resource_usage
diff --git a/llvm/test/MC/AMDGPU/amdgpu-resource-usage.s b/llvm/test/MC/AMDGPU/amdgpu-resource-usage.s
new file mode 100644
index 0000000000000..7d10684361526
--- /dev/null
+++ b/llvm/test/MC/AMDGPU/amdgpu-resource-usage.s
@@ -0,0 +1,114 @@
+// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx900 < %s | FileCheck --check-prefix=ASM %s
+// RUN: llvm-mc -triple amdgcn-amd-amdhsa -mcpu=gfx900 -filetype=obj < %s > %t
+// RUN: llvm-readelf -S %t | FileCheck --check-prefix=ELF-SEC %s
+// RUN: llvm-readelf -r %t | FileCheck --check-prefix=ELF-RELOC %s
+
+.text
+
+.globl bar
+.type bar,@function
+bar:
+ s_endpgm
+
+.type baz,@function
+baz:
+ s_endpgm
+
+.globl foo
+.type foo,@function
+foo:
+ s_endpgm
+
+.extern external_fn
+
+// ASM: .amdgpu_resource_usage bar
+// ASM-NEXT: .num_vgpr 65
+// ASM-NEXT: .num_agpr 0
+// ASM-NEXT: .num_sgpr 25
+// ASM-NEXT: .named_barrier 0
+// ASM-NEXT: .private_seg_size 16
+// ASM-NEXT: .uses_vcc 1
+// ASM-NEXT: .uses_flat_scratch 0
+// ASM-NEXT: .has_dyn_sized_stack 0
+// ASM-NEXT: .has_recursion 0
+// ASM-NEXT: .has_indirect_call 0
+// ASM-NEXT: .callee baz
+// ASM-NEXT: .end_amdgpu_resource_usage
+ .amdgpu_resource_usage bar
+ .num_vgpr 65
+ .num_agpr 0
+ .num_sgpr 25
+ .named_barrier 0
+ .private_seg_size 16
+ .uses_vcc 1
+ .uses_flat_scratch 0
+ .has_dyn_sized_stack 0
+ .has_recursion 0
+ .has_indirect_call 0
+ .callee baz
+ .end_amdgpu_resource_usage
+
+// ASM: .amdgpu_resource_usage foo
+// ASM-NEXT: .num_vgpr 10
+// ASM-NEXT: .num_agpr 4
+// ASM-NEXT: .num_sgpr 8
+// ASM-NEXT: .named_barrier 2
+// ASM-NEXT: .private_seg_size 0
+// ASM-NEXT: .uses_vcc 0
+// ASM-NEXT: .uses_flat_scratch 1
+// ASM-NEXT: .has_dyn_sized_stack 1
+// ASM-NEXT: .has_recursion 1
+// ASM-NEXT: .has_indirect_call 1
+// ASM-NEXT: .callee bar
+// ASM-NEXT: .callee external_fn
+// ASM-NEXT: .end_amdgpu_resource_usage
+ .amdgpu_resource_usage foo
+ .num_vgpr 10
+ .num_agpr 4
+ .num_sgpr 8
+ .named_barrier 2
+ .private_seg_size 0
+ .uses_vcc 0
+ .uses_flat_scratch 1
+ .has_dyn_sized_stack 1
+ .has_recursion 1
+ .has_indirect_call 1
+ .callee bar
+ .callee external_fn
+ .end_amdgpu_resource_usage
+
+// ASM: .amdgpu_resource_usage baz
+// ASM-NEXT: .num_vgpr 2
+// ASM-NEXT: .num_agpr 0
+// ASM-NEXT: .num_sgpr 4
+// ASM-NEXT: .named_barrier 0
+// ASM-NEXT: .private_seg_size 0
+// ASM-NEXT: .uses_vcc 0
+// ASM-NEXT: .uses_flat_scratch 0
+// ASM-NEXT: .has_dyn_sized_stack 0
+// ASM-NEXT: .has_recursion 0
+// ASM-NEXT: .has_indirect_call 0
+// ASM-NEXT: .end_amdgpu_resource_usage
+ .amdgpu_resource_usage baz
+ .num_vgpr 2
+ .num_agpr 0
+ .num_sgpr 4
+ .named_barrier 0
+ .private_seg_size 0
+ .uses_vcc 0
+ .uses_flat_scratch 0
+ .has_dyn_sized_stack 0
+ .has_recursion 0
+ .has_indirect_call 0
+ .end_amdgpu_resource_usage
+
+// ELF-SEC: .AMDGPU.resource_info PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000048 18 E 0 0 1
+
+// ELF-RELOC: Relocation section '.rela.AMDGPU.resource_info'
+// ELF-RELOC: 0000000000000000 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} bar + 0
+// ELF-RELOC-NEXT: 0000000000000000 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} baz + 0
+// ELF-RELOC-NEXT: 0000000000000018 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} foo + 0
+// ELF-RELOC-NEXT: 0000000000000018 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} bar + 0
+// ELF-RELOC-NEXT: 0000000000000018 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} external_fn + 0
+// ELF-RELOC-NEXT: 0000000000000030 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} baz + 0
+
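
Note on the relocation checks above: in this first patch the relocation section encodes the callgraph by offset, with the first relocation at an offset naming the function that owns the entry and any further relocations at the same offset naming its callees. A minimal sketch of how a consumer might regroup such records is given below. Reloc, FunctionEntry and groupByOffset are hypothetical names, the records are assumed to be already parsed out of the ELF file, and the sketch relies on relocations for one offset appearing in emission order.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, already-parsed relocation record: the offset into the
// resource usage section and the name of the referenced symbol.
struct Reloc {
  uint64_t Offset;
  std::string Symbol;
};

// One reconstructed entry: the owning function and its direct callees.
struct FunctionEntry {
  std::string Function;
  std::vector<std::string> Callees;
};

// Group relocations that share an offset. The first record seen for an
// offset names the function; every later record for it names a callee.
std::map<uint64_t, FunctionEntry>
groupByOffset(const std::vector<Reloc> &Relocs) {
  std::map<uint64_t, FunctionEntry> Entries;
  for (const Reloc &R : Relocs) {
    auto [It, Inserted] =
        Entries.try_emplace(R.Offset, FunctionEntry{R.Symbol, {}});
    if (!Inserted)
      It->second.Callees.push_back(R.Symbol);
  }
  return Entries;
}
```

With the records expected by amdgpu-resource-usage.s, offset 0x18 would map to foo with callees bar and external_fn. Dividing an offset by the 24-byte entry size gives the index of the matching struct in the section.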
>From ce026c013f0f1762356f4252ccf063a395a31f9c Mon Sep 17 00:00:00 2001
From: Janek van Oirschot <janek.vanoirschot at amd.com>
Date: Wed, 25 Mar 2026 16:25:16 +0000
Subject: [PATCH 2/3] Remove callees, remove flags that can be inferred by
callgraph, rephrase docs
---
llvm/docs/AMDGPUUsage.rst | 37 +++--------
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | 19 +-----
.../AMDGPU/AsmParser/AMDGPUAsmParser.cpp | 34 +++--------
.../MCTargetDesc/AMDGPUTargetStreamer.cpp | 17 +-----
.../MCTargetDesc/AMDGPUTargetStreamer.h | 13 ++--
.../AMDGPU/branch-relaxation-gfx1250.ll | 2 +-
llvm/test/CodeGen/AMDGPU/branch-relaxation.ll | 2 +-
llvm/test/CodeGen/AMDGPU/lds-relocs.ll | 2 +-
.../CodeGen/AMDGPU/resource-info-section.ll | 61 +------------------
.../MC/AMDGPU/amdgpu-resource-usage-err.s | 22 +------
llvm/test/MC/AMDGPU/amdgpu-resource-usage.s | 25 +-------
11 files changed, 36 insertions(+), 198 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index 23dfb1786f4ba..cb54603e42753 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -2408,12 +2408,10 @@ Function Resource Usage Asm Directives
A function's resource usage depends on each of its callees' resource usage.
Accommodating this are the AMDGPU resource usage assembler directives and ELF
-section. The assembler directives emit a pre- and post-marked sequence of
-assembler directives after every function that state a function's resource
-usage and callees. The resource usage this emit is **only for this function's
-usage** and does not yet consider the callees' resource usage. For the
-propagated resource usage, any user of the section or resource info will have
-to walk the callgraph and compute the total use.
+section. The assembler directives emit a pre- and post-marked sequence after
+every function that states the function's resource usage. The resource usage
+emitted is **only for this function's usage** and does not yet consider the
+callees' resource usage.
.. table:: Function Resource Usage Asm Directives:
:name: function-usage-directive-table
@@ -2431,25 +2429,20 @@ to walk the callgraph and compute the total use.
.uses_vcc <i1> yes 1 Boolean denoting whether vcc is used in the function
.uses_flat_scratch <i1> yes 1 Boolean denoting whether flat scratch is used in the function
.has_dyn_sized_stack <i1> yes 1 Boolean denoting whether stack in the function is dynamically sized
- .has_recursion <i1> yes 1 Boolean denoting whether recursion is used in the function
- .has_indirect_call <i1> yes 1 Boolean denoting whether the function has an indirect call
- .callee <function name> no 0 or more Callee functions called by the function, each unique callee getting its own .callee directive
====================================== ========= ======================== ===================================================================================================
Function Resource Usage ELF Section
-----------------------------------
The resource usage section contains binary structs representing each resource
-usage entry for a function. The resource usage section is named
-.AMDGPU.resource_usage and has an additional relocation section (e.g.,
-.rela.AMDGPU.resource_usage) which holds information on which offset of the
-.AMDGPU.resource_usage section denotes which function in addition to tracking
-callers and callees.
+usage entry for a function. This section is named .AMDGPU.resource_usage and
+has an additional relocation section (e.g., .rela.AMDGPU.resource_usage) that
+maps the offsets within the .AMDGPU.resource_usage section to each function.
Resource usage is a binary struct of its required resource information. The
-booleans are packed into a flag of type i32. The total size of each resource
-usage struct is, therefore, 24-bytes (i.e., sizeof(num_vgpr) + sizeof(num_agpr)
-+ sizeof(num_sgpr) + sizeof(named_barrier) + sizeof(private_seg_size) +
+boolean elements are packed into a flag of type i32. The total size of each
+resource usage struct is 24 bytes (i.e., sizeof(num_vgpr) + sizeof(num_agpr) +
+sizeof(num_sgpr) + sizeof(named_barrier) + sizeof(private_seg_size) +
sizeof(flags)). The flags are packed as follows:
.. table:: Function Resource Usage Flags:
@@ -2461,18 +2454,8 @@ sizeof(flags)). The flags are packed as follows:
uses_vcc 0
uses_flat_scratch 1
has_dyn_sized_stack 2
- has_recursion 3
- has_indirect_call 4
=========================== =======
-Resource usage relocation section contains the the offset into the resource
-usage section for each function in a translation unit. In addition to this
-function to resource usage entry mapping, it embeds the callees of each caller
-by having the first relocation to an offset denote the function the entry is
-mapped to and any subsequent relocations for that same offset denote a callee
-of the function mapped to that entry (similar to how CGProfile specifies the
-caller and callee relation.
-
.. _amdgpu-elf-code-object:
ELF Code Object
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
index b07f0ad49b4fe..865b3022e864a 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
@@ -759,32 +759,19 @@ bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
OutContext, IsLocal));
}
- // Emit per-function local resource usage and callee info into
- // .AMDGPU.resource_info section.
+ // Emit per-function local resource usage into .AMDGPU.resource_usage section.
{
uint32_t Flags = 0;
Flags |= (ResourceUsage->UsesVCC ? 1u : 0u) << 0;
Flags |= (ResourceUsage->UsesFlatScratch ? 1u : 0u) << 1;
Flags |= (ResourceUsage->HasDynamicallySizedStack ? 1u : 0u) << 2;
- Flags |= (ResourceUsage->HasRecursion ? 1u : 0u) << 3;
- Flags |= (ResourceUsage->HasIndirectCall ? 1u : 0u) << 4;
-
- // Collect unique callee symbols.
- SmallVector<MCSymbol *, 8> CalleeSyms;
- SmallPtrSet<const Function *, 8> SeenCallees;
- for (const Function *Callee : ResourceUsage->Callees) {
- if (SeenCallees.insert(Callee).second)
- CalleeSyms.push_back(MF.getTarget().getSymbol(Callee));
- }
// Emit function local resource usage info. Does not contain any callee
- // propagated resource info, users of this section info should be able to
- // gather all resource info and walk the callgraph to combine for any
- // callee resource info.
+ // propagated resource info.
getTargetStreamer()->emitResourceUsageEntry(
CurrentFnSym, ResourceUsage->NumVGPR, ResourceUsage->NumAGPR,
ResourceUsage->NumExplicitSGPR, ResourceUsage->NumNamedBarrier,
- ResourceUsage->PrivateSegmentSize, Flags, CalleeSyms);
+ ResourceUsage->PrivateSegmentSize, Flags);
}
// Emit _dvgpr$ symbol when appropriate.
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 1a350d5fee1c4..3f703bf5f2ba3 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -6752,9 +6752,6 @@ bool AMDGPUAsmParser::ParseDirectiveAMDGPULDS() {
/// .uses_vcc <int>
/// .uses_flat_scratch <int>
/// .has_dyn_sized_stack <int>
-/// .has_recursion <int>
-/// .has_indirect_call <int>
-/// .callee <symbol> (zero or more)
/// .end_amdgpu_resource_usage
bool AMDGPUAsmParser::ParseDirectiveAMDGPUResourceUsage() {
StringRef SymName;
@@ -6767,8 +6764,6 @@ bool AMDGPUAsmParser::ParseDirectiveAMDGPUResourceUsage() {
uint32_t NumVGPR = 0, NumAGPR = 0, NumSGPR = 0;
uint32_t NumNamedBarrier = 0, PrivateSegmentSize = 0;
uint32_t UsesVCC = 0, UsesFlatScratch = 0, HasDynSizedStack = 0;
- uint32_t HasRecursion = 0, HasIndirectCall = 0;
- SmallVector<MCSymbol *, 4> Callees;
while (true) {
while (trySkipToken(AsmToken::EndOfStatement))
@@ -6781,14 +6776,6 @@ bool AMDGPUAsmParser::ParseDirectiveAMDGPUResourceUsage() {
if (ID == ".end_amdgpu_resource_usage")
break;
- if (ID == ".callee") {
- StringRef CalleeName;
- if (getParser().parseIdentifier(CalleeName))
- return TokError("expected symbol name after .callee");
- Callees.push_back(getContext().getOrCreateSymbol(CalleeName));
- continue;
- }
-
if (!Seen.insert(ID).second)
return TokError("resource usage directives already declared");
@@ -6814,20 +6801,15 @@ bool AMDGPUAsmParser::ParseDirectiveAMDGPUResourceUsage() {
UsesFlatScratch = Val;
else if (ID == ".has_dyn_sized_stack")
HasDynSizedStack = Val;
- else if (ID == ".has_recursion")
- HasRecursion = Val;
- else if (ID == ".has_indirect_call")
- HasIndirectCall = Val;
else
return TokError("unknown field '" + ID + "' in .amdgpu_resource_usage");
}
- for (StringRef StrRef :
- {".num_vgpr", ".num_agpr", ".num_sgpr", ".named_barrier",
- ".private_seg_size", ".uses_vcc", ".uses_flat_scratch",
- ".has_dyn_sized_stack", ".has_recursion", ".has_indirect_call"}) {
+ for (StringRef StrRef : {".num_vgpr", ".num_agpr", ".num_sgpr",
+ ".named_barrier", ".private_seg_size", ".uses_vcc",
+ ".uses_flat_scratch", ".has_dyn_sized_stack"}) {
if (!Seen.contains(StrRef))
- return TokError("requires " + StrRef +
+ return TokError("missing required " + StrRef +
" directive in .amdgpu_resource_usage");
}
@@ -6835,12 +6817,10 @@ bool AMDGPUAsmParser::ParseDirectiveAMDGPUResourceUsage() {
Flags |= (UsesVCC ? 1u : 0u) << 0;
Flags |= (UsesFlatScratch ? 1u : 0u) << 1;
Flags |= (HasDynSizedStack ? 1u : 0u) << 2;
- Flags |= (HasRecursion ? 1u : 0u) << 3;
- Flags |= (HasIndirectCall ? 1u : 0u) << 4;
- getTargetStreamer().emitResourceUsageEntry(
- FnSym, NumVGPR, NumAGPR, NumSGPR, NumNamedBarrier, PrivateSegmentSize,
- Flags, Callees);
+ getTargetStreamer().emitResourceUsageEntry(FnSym, NumVGPR, NumAGPR, NumSGPR,
+ NumNamedBarrier,
+ PrivateSegmentSize, Flags);
return false;
}
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index 3631efb41d360..a9348edc12642 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -662,8 +662,7 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
void AMDGPUTargetAsmStreamer::emitResourceUsageEntry(
MCSymbol *FnSym, uint32_t NumVGPR, uint32_t NumAGPR, uint32_t NumSGPR,
- uint32_t NumNamedBarrier, uint32_t PrivateSegmentSize, uint32_t Flags,
- ArrayRef<MCSymbol *> Callees) {
+ uint32_t NumNamedBarrier, uint32_t PrivateSegmentSize, uint32_t Flags) {
OS << "\t.amdgpu_resource_usage " << FnSym->getName() << '\n';
OS << "\t\t.num_vgpr " << NumVGPR << '\n';
OS << "\t\t.num_agpr " << NumAGPR << '\n';
@@ -673,10 +672,6 @@ void AMDGPUTargetAsmStreamer::emitResourceUsageEntry(
OS << "\t\t.uses_vcc " << ((Flags >> 0) & 1) << '\n';
OS << "\t\t.uses_flat_scratch " << ((Flags >> 1) & 1) << '\n';
OS << "\t\t.has_dyn_sized_stack " << ((Flags >> 2) & 1) << '\n';
- OS << "\t\t.has_recursion " << ((Flags >> 3) & 1) << '\n';
- OS << "\t\t.has_indirect_call " << ((Flags >> 4) & 1) << '\n';
- for (MCSymbol *Callee : Callees)
- OS << "\t\t.callee " << Callee->getName() << '\n';
OS << "\t.end_amdgpu_resource_usage\n";
}
@@ -1084,15 +1079,14 @@ void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
void AMDGPUTargetELFStreamer::emitResourceUsageEntry(
MCSymbol *FnSym, uint32_t NumVGPR, uint32_t NumAGPR, uint32_t NumSGPR,
- uint32_t NumNamedBarrier, uint32_t PrivateSegmentSize, uint32_t Flags,
- ArrayRef<MCSymbol *> Callees) {
+ uint32_t NumNamedBarrier, uint32_t PrivateSegmentSize, uint32_t Flags) {
auto &S = getStreamer();
auto &Context = S.getContext();
const unsigned ResourceInfoEntrySize = 24;
// TODO: Custom elf section type for support in linker.
MCSection *Sec =
- Context.getELFSection(".AMDGPU.resource_info", ELF::SHT_PROGBITS,
+ Context.getELFSection(".AMDGPU.resource_usage", ELF::SHT_PROGBITS,
ELF::SHF_EXCLUDE, ResourceInfoEntrySize);
S.pushSection();
S.switchSection(Sec);
@@ -1108,11 +1102,6 @@ void AMDGPUTargetELFStreamer::emitResourceUsageEntry(
S.emitRelocDirective(*Offset, "R_AMDGPU_NONE",
MCSymbolRefExpr::create(FnSym, Context));
- // Emit callee relocations at the same offset as the function identity.
- for (MCSymbol *Callee : Callees)
- S.emitRelocDirective(*Offset, "R_AMDGPU_NONE",
- MCSymbolRefExpr::create(Callee, Context));
-
++ResourceInfoEntryCount;
// Emit the 24-byte struct.
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
index ab15c83d51d5c..d9cc6470fa50a 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
@@ -106,13 +106,12 @@ class AMDGPUTargetStreamer : public MCTargetStreamer {
const MCExpr *ReserveFlatScr) {}
/// Emit a per-function resource usage entry into the
- /// .AMDGPU.resource_info section, along with callee relocations.
+ /// .AMDGPU.resource_usage section.
virtual void emitResourceUsageEntry(MCSymbol *FnSym, uint32_t NumVGPR,
uint32_t NumAGPR, uint32_t NumSGPR,
uint32_t NumNamedBarrier,
uint32_t PrivateSegmentSize,
- uint32_t Flags,
- ArrayRef<MCSymbol *> Callees = {}) {}
+ uint32_t Flags) {}
static StringRef getArchNameFromElfMach(unsigned ElfMach);
static unsigned getElfMach(StringRef GPU);
@@ -182,8 +181,8 @@ class AMDGPUTargetAsmStreamer final : public AMDGPUTargetStreamer {
void emitResourceUsageEntry(MCSymbol *FnSym, uint32_t NumVGPR,
uint32_t NumAGPR, uint32_t NumSGPR,
uint32_t NumNamedBarrier,
- uint32_t PrivateSegmentSize, uint32_t Flags,
- ArrayRef<MCSymbol *> Callees = {}) override;
+ uint32_t PrivateSegmentSize,
+ uint32_t Flags) override;
};
class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
@@ -244,8 +243,8 @@ class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
void emitResourceUsageEntry(MCSymbol *FnSym, uint32_t NumVGPR,
uint32_t NumAGPR, uint32_t NumSGPR,
uint32_t NumNamedBarrier,
- uint32_t PrivateSegmentSize, uint32_t Flags,
- ArrayRef<MCSymbol *> Callees = {}) override;
+ uint32_t PrivateSegmentSize,
+ uint32_t Flags) override;
};
}
#endif
diff --git a/llvm/test/CodeGen/AMDGPU/branch-relaxation-gfx1250.ll b/llvm/test/CodeGen/AMDGPU/branch-relaxation-gfx1250.ll
index 3b09af7368f50..b61c27e80b018 100644
--- a/llvm/test/CodeGen/AMDGPU/branch-relaxation-gfx1250.ll
+++ b/llvm/test/CodeGen/AMDGPU/branch-relaxation-gfx1250.ll
@@ -9,7 +9,7 @@
; RUN: llvm-readobj -r %t.o | FileCheck --check-prefix=OBJ %s
; OBJ: Relocations [
-; OBJ-NEXT: Section (5) .rel.AMDGPU.resource_info {
+; OBJ-NEXT: Section (5) .rel.AMDGPU.resource_usage {
; OBJ-NEXT: 0x0 R_AMDGPU_NONE uniform_conditional_max_short_forward_branch
; OBJ-NEXT: 0x18 R_AMDGPU_NONE uniform_conditional_min_long_forward_branch
; OBJ-NEXT: 0x30 R_AMDGPU_NONE uniform_conditional_min_long_forward_vcnd_branch
diff --git a/llvm/test/CodeGen/AMDGPU/branch-relaxation.ll b/llvm/test/CodeGen/AMDGPU/branch-relaxation.ll
index daa9994461af1..ac72b632d1eca 100644
--- a/llvm/test/CodeGen/AMDGPU/branch-relaxation.ll
+++ b/llvm/test/CodeGen/AMDGPU/branch-relaxation.ll
@@ -10,7 +10,7 @@
; RUN: llvm-readobj -r %t.o | FileCheck --check-prefix=OBJ %s
; OBJ: Relocations [
-; OBJ-NEXT: Section (5) .rel.AMDGPU.resource_info {
+; OBJ-NEXT: Section (5) .rel.AMDGPU.resource_usage {
; OBJ-NEXT: 0x0 R_AMDGPU_NONE uniform_conditional_max_short_forward_branch
; OBJ-NEXT: 0x18 R_AMDGPU_NONE uniform_conditional_min_long_forward_branch
; OBJ-NEXT: 0x30 R_AMDGPU_NONE uniform_conditional_min_long_forward_vcnd_branch
diff --git a/llvm/test/CodeGen/AMDGPU/lds-relocs.ll b/llvm/test/CodeGen/AMDGPU/lds-relocs.ll
index 2d45b28bab888..6ef8fe71ee2e8 100644
--- a/llvm/test/CodeGen/AMDGPU/lds-relocs.ll
+++ b/llvm/test/CodeGen/AMDGPU/lds-relocs.ll
@@ -9,7 +9,7 @@
; ELF-NEXT: 0x{{[0-9A-F]*}} R_AMDGPU_ABS32_LO lds.external
; ELF-NEXT: 0x{{[0-9A-F]*}} R_AMDGPU_ABS32_LO lds.defined
; ELF-NEXT: }
-; ELF-NEXT: Section (6) .rel.AMDGPU.resource_info {
+; ELF-NEXT: Section (6) .rel.AMDGPU.resource_usage {
; ELF-NEXT: 0x0 R_AMDGPU_NONE test_basic
; ELF-NEXT: }
; ELF-NEXT: ]
diff --git a/llvm/test/CodeGen/AMDGPU/resource-info-section.ll b/llvm/test/CodeGen/AMDGPU/resource-info-section.ll
index 9aa8f401a9f82..e745dffaa4fab 100644
--- a/llvm/test/CodeGen/AMDGPU/resource-info-section.ll
+++ b/llvm/test/CodeGen/AMDGPU/resource-info-section.ll
@@ -12,8 +12,6 @@
; ASM-NEXT: .uses_vcc 0
; ASM-NEXT: .uses_flat_scratch 0
; ASM-NEXT: .has_dyn_sized_stack 0
-; ASM-NEXT: .has_recursion 0
-; ASM-NEXT: .has_indirect_call 0
; ASM-NEXT: .end_amdgpu_resource_usage
define void @leaf() {
ret void
@@ -29,70 +27,13 @@ define void @leaf() {
; ASM-NEXT: .uses_vcc 1
; ASM-NEXT: .uses_flat_scratch 0
; ASM-NEXT: .has_dyn_sized_stack 0
-; ASM-NEXT: .has_recursion 0
-; ASM-NEXT: .has_indirect_call 0
; ASM-NEXT: .end_amdgpu_resource_usage
define void @use_vcc() {
call void asm sideeffect "", "~{vcc}" ()
ret void
}
-; ASM-LABEL: {{^}}caller:
-; ASM: .amdgpu_resource_usage caller
-; ASM: .callee use_vcc
-; ASM-NEXT: .end_amdgpu_resource_usage
-define void @caller() {
- call void @use_vcc()
- ret void
-}
-
-; ASM-LABEL: {{^}}kernel:
-; ASM: .amdgpu_resource_usage kernel
-; ASM: .callee caller
-; ASM-NEXT: .end_amdgpu_resource_usage
-define amdgpu_kernel void @kernel() {
- call void @caller()
- ret void
-}
-
-
-; ASM-LABEL: {{^}}rcaller2:
-; ASM: .amdgpu_resource_usage rcaller2
-; ASM: .callee rcaller1
-; ASM-NEXT: .end_amdgpu_resource_usage
-; ASM-LABEL: {{^}}rcaller1:
-; ASM: .amdgpu_resource_usage rcaller1
-; ASM: .callee rcaller2
-; ASM-NEXT: .end_amdgpu_resource_usage
-define void @rcaller1() {
- call void @rcaller2()
- ret void
-}
-define void @rcaller2() {
- call void @rcaller1()
- ret void
-}
-
-; ASM-LABEL: {{^}}kernel_recurse:
-; ASM: .amdgpu_resource_usage kernel
-; ASM: .callee rcaller1
-; ASM-NEXT: .end_amdgpu_resource_usage
-define amdgpu_kernel void @kernel_recurse() {
- call void @rcaller1()
- ret void
-}
-
-; RELOC: Relocation section '.rela.AMDGPU.resource_info'
+; RELOC: Relocation section '.rela.AMDGPU.resource_usage'
; RELOC: 0000000000000000 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} leaf + 0
; RELOC-NEXT: 0000000000000018 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} use_vcc + 0
-; RELOC-NEXT: 0000000000000030 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} caller + 0
-; RELOC-NEXT: 0000000000000030 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} use_vcc + 0
-; RELOC-NEXT: 0000000000000048 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} kernel + 0
-; RELOC-NEXT: 0000000000000048 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} caller + 0
-; RELOC-NEXT: 0000000000000060 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} rcaller2 + 0
-; RELOC-NEXT: 0000000000000060 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} rcaller1 + 0
-; RELOC-NEXT: 0000000000000078 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} rcaller1 + 0
-; RELOC-NEXT: 0000000000000078 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} rcaller2 + 0
-; RELOC-NEXT: 0000000000000090 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} kernel_recurse + 0
-; RELOC-NEXT: 0000000000000090 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} rcaller1 + 0
diff --git a/llvm/test/MC/AMDGPU/amdgpu-resource-usage-err.s b/llvm/test/MC/AMDGPU/amdgpu-resource-usage-err.s
index e7db9189adc5b..77e1f30597ebd 100644
--- a/llvm/test/MC/AMDGPU/amdgpu-resource-usage-err.s
+++ b/llvm/test/MC/AMDGPU/amdgpu-resource-usage-err.s
@@ -4,22 +4,6 @@
.amdgpu_resource_usage
// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected symbol name after .amdgpu_resource_usage
-// Missing symbol name after .callee.
- .amdgpu_resource_usage fn1
- .num_vgpr 0
- .num_agpr 0
- .num_sgpr 0
- .named_barrier 0
- .private_seg_size 0
- .uses_vcc 0
- .uses_flat_scratch 0
- .has_dyn_sized_stack 0
- .has_recursion 0
- .has_indirect_call 0
- .callee
-// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: expected symbol name after .callee
- .end_amdgpu_resource_usage
-
// Duplicate field directive.
.amdgpu_resource_usage fn2
.num_vgpr 0
@@ -43,8 +27,6 @@
.uses_vcc 0
.uses_flat_scratch 0
.has_dyn_sized_stack 0
- .has_recursion 0
- .has_indirect_call 0
.bogus_field 42
// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: unknown field '.bogus_field' in .amdgpu_resource_usage
.end_amdgpu_resource_usage
@@ -58,7 +40,5 @@
.uses_vcc 0
.uses_flat_scratch 0
.has_dyn_sized_stack 0
- .has_recursion 0
- .has_indirect_call 0
.end_amdgpu_resource_usage
-// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: requires .num_sgpr directive in .amdgpu_resource_usage
+// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: missing required .num_sgpr directive in .amdgpu_resource_usage
diff --git a/llvm/test/MC/AMDGPU/amdgpu-resource-usage.s b/llvm/test/MC/AMDGPU/amdgpu-resource-usage.s
index 7d10684361526..c0483ec63a50a 100644
--- a/llvm/test/MC/AMDGPU/amdgpu-resource-usage.s
+++ b/llvm/test/MC/AMDGPU/amdgpu-resource-usage.s
@@ -30,9 +30,6 @@ foo:
// ASM-NEXT: .uses_vcc 1
// ASM-NEXT: .uses_flat_scratch 0
// ASM-NEXT: .has_dyn_sized_stack 0
-// ASM-NEXT: .has_recursion 0
-// ASM-NEXT: .has_indirect_call 0
-// ASM-NEXT: .callee baz
// ASM-NEXT: .end_amdgpu_resource_usage
.amdgpu_resource_usage bar
.num_vgpr 65
@@ -43,9 +40,6 @@ foo:
.uses_vcc 1
.uses_flat_scratch 0
.has_dyn_sized_stack 0
- .has_recursion 0
- .has_indirect_call 0
- .callee baz
.end_amdgpu_resource_usage
// ASM: .amdgpu_resource_usage foo
@@ -57,10 +51,6 @@ foo:
// ASM-NEXT: .uses_vcc 0
// ASM-NEXT: .uses_flat_scratch 1
// ASM-NEXT: .has_dyn_sized_stack 1
-// ASM-NEXT: .has_recursion 1
-// ASM-NEXT: .has_indirect_call 1
-// ASM-NEXT: .callee bar
-// ASM-NEXT: .callee external_fn
// ASM-NEXT: .end_amdgpu_resource_usage
.amdgpu_resource_usage foo
.num_vgpr 10
@@ -71,10 +61,6 @@ foo:
.uses_vcc 0
.uses_flat_scratch 1
.has_dyn_sized_stack 1
- .has_recursion 1
- .has_indirect_call 1
- .callee bar
- .callee external_fn
.end_amdgpu_resource_usage
// ASM: .amdgpu_resource_usage baz
@@ -86,8 +72,6 @@ foo:
// ASM-NEXT: .uses_vcc 0
// ASM-NEXT: .uses_flat_scratch 0
// ASM-NEXT: .has_dyn_sized_stack 0
-// ASM-NEXT: .has_recursion 0
-// ASM-NEXT: .has_indirect_call 0
// ASM-NEXT: .end_amdgpu_resource_usage
.amdgpu_resource_usage baz
.num_vgpr 2
@@ -98,17 +82,12 @@ foo:
.uses_vcc 0
.uses_flat_scratch 0
.has_dyn_sized_stack 0
- .has_recursion 0
- .has_indirect_call 0
.end_amdgpu_resource_usage
-// ELF-SEC: .AMDGPU.resource_info PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000048 18 E 0 0 1
+// ELF-SEC: .AMDGPU.resource_usage PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000048 18 E 0 0 1
-// ELF-RELOC: Relocation section '.rela.AMDGPU.resource_info'
+// ELF-RELOC: Relocation section '.rela.AMDGPU.resource_usage'
// ELF-RELOC: 0000000000000000 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} bar + 0
-// ELF-RELOC-NEXT: 0000000000000000 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} baz + 0
// ELF-RELOC-NEXT: 0000000000000018 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} foo + 0
-// ELF-RELOC-NEXT: 0000000000000018 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} bar + 0
-// ELF-RELOC-NEXT: 0000000000000018 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} external_fn + 0
// ELF-RELOC-NEXT: 0000000000000030 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} baz + 0
>From 108fccf884c1a45f3f298b7e194313e280ebeafa Mon Sep 17 00:00:00 2001
From: Janek van Oirschot <janek.vanoirschot at amd.com>
Date: Fri, 27 Mar 2026 15:50:36 +0000
Subject: [PATCH 3/3] Use conservative defaults
---
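Note (illustration only, not part of the patch): the conservative
private_seg_size and flag defaults introduced here work out to the values
checked below. The Gen enum and the function name are hypothetical stand-ins
for the isGFX12Plus/isGFX11 subtarget queries used in finish(); only the
arithmetic is taken from the patch.

```cpp
#include <cstdint>

// Hypothetical generation tag standing in for the subtarget checks.
enum class Gen { PreGFX11, GFX11, GFX12Plus };

// Same expressions as in AMDGPUTargetELFStreamer::finish() in this patch.
constexpr uint32_t defaultPrivateSegmentSize(Gen G) {
  switch (G) {
  case Gen::GFX12Plus:
    return (64 * 4) * ((1u << 18) - 1);
  case Gen::GFX11:
    return (64 * 4) * ((1u << 15) - 1);
  case Gen::PreGFX11:
    return (256 * 4) * ((1u << 13) - 1);
  }
  return 0;
}

// All three boolean flags set: uses_vcc | uses_flat_scratch |
// has_dyn_sized_stack.
constexpr uint32_t DefaultFlags = (1u << 0) | (1u << 1) | (1u << 2);

static_assert(DefaultFlags == 0x7, "all three flag bits set");
static_assert(defaultPrivateSegmentSize(Gen::GFX12Plus) == 67108608u,
              "256 * (2^18 - 1)");
static_assert(defaultPrivateSegmentSize(Gen::GFX11) == 8388352u,
              "256 * (2^15 - 1)");
static_assert(defaultPrivateSegmentSize(Gen::PreGFX11) == 8387584u,
              "1024 * (2^13 - 1)");
```

These defaults only apply to defined STT_FUNC symbols that never received an
explicit resource usage entry, so explicitly annotated functions are
unaffected.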
llvm/docs/AMDGPUUsage.rst | 20 +++++-----
.../AMDGPU/AsmParser/AMDGPUAsmParser.cpp | 8 ----
.../MCTargetDesc/AMDGPUTargetStreamer.cpp | 38 ++++++++++++++++++-
.../MCTargetDesc/AMDGPUTargetStreamer.h | 4 ++
.../MC/AMDGPU/amdgpu-resource-usage-err.s | 11 ------
llvm/test/MC/AMDGPU/amdgpu-resource-usage.s | 9 ++++-
6 files changed, 59 insertions(+), 31 deletions(-)
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index cb54603e42753..584d7c30e796d 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -2419,16 +2419,16 @@ callees' resource usage.
====================================== ========= ======================== ===================================================================================================
Directive Required? Occurrences Per Function Description
====================================== ========= ======================== ===================================================================================================
- .amdgpu_resource_usage <function name> yes 1 Denotes the start of resource usage directives for <function name>
- .end_amdgpu_resource_usage yes 1 Denotes end of resource usage directives
- .num_vgpr <i32> yes 1 Number of VGPRs used by the function
- .num_agpr <i32> yes 1 Number of AGPRs used by the function
- .num_sgpr <i32> yes 1 Number of SGPRs used by the function
- .named_barrier <i32> yes 1 Number of named barriers used by the function
- .private_seg_size <i32> yes 1 Total stack size required for the function
- .uses_vcc <i1> yes 1 Boolean denoting whether vcc is used in the function
- .uses_flat_scratch <i1> yes 1 Boolean denoting whether flat scratch is used in the function
- .has_dyn_sized_stack <i1> yes 1 Boolean denoting whether stack in the function is dynamically sized
+ .amdgpu_resource_usage <function name> no 0 or 1 Denotes the start of resource usage directives for <function name>
+ .end_amdgpu_resource_usage no 0 or 1 Denotes end of resource usage directives
+ .num_vgpr <i32> no 0 or 1 Number of VGPRs used by the function (default: max addressable VGPRs)
+ .num_agpr <i32> no 0 or 1 Number of AGPRs used by the function (default: max addressable AGPRs, or 0 if unsupported)
+ .num_sgpr <i32> no 0 or 1 Number of SGPRs used by the function (default: max addressable SGPRs)
+ .named_barrier <i32> no 0 or 1 Number of named barriers used by the function (default: 16)
+ .private_seg_size <i32> no 0 or 1 Total stack size required for the function (default: max wave scratch size)
+ .uses_vcc <i1> no 0 or 1 Boolean denoting whether vcc is used in the function (default: 1)
+ .uses_flat_scratch <i1> no 0 or 1 Boolean denoting whether flat scratch is used in the function (default: 1)
+ .has_dyn_sized_stack <i1> no 0 or 1 Boolean denoting whether stack in the function is dynamically sized (default: 1)
====================================== ========= ======================== ===================================================================================================
Function Resource Usage ELF Section
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 3f703bf5f2ba3..b305d3450534f 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -6805,14 +6805,6 @@ bool AMDGPUAsmParser::ParseDirectiveAMDGPUResourceUsage() {
return TokError("unknown field '" + ID + "' in .amdgpu_resource_usage");
}
- for (StringRef StrRef : {".num_vgpr", ".num_agpr", ".num_sgpr",
- ".named_barrier", ".private_seg_size", ".uses_vcc",
- ".uses_flat_scratch", ".has_dyn_sized_stack"}) {
- if (!Seen.contains(StrRef))
- return TokError("missing required " + StrRef +
- " directive in .amdgpu_resource_usage");
- }
-
uint32_t Flags = 0;
Flags |= (UsesVCC ? 1u : 0u) << 0;
Flags |= (UsesFlatScratch ? 1u : 0u) << 1;
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index a9348edc12642..b829a4c4f349f 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -15,6 +15,7 @@
#include "AMDGPUMCKernelDescriptor.h"
#include "AMDGPUMCTargetDesc.h"
#include "AMDGPUPTNote.h"
+#include "SIDefines.h"
#include "Utils/AMDGPUBaseInfo.h"
#include "Utils/AMDKernelCodeTUtils.h"
#include "llvm/BinaryFormat/AMDGPUMetadataVerifier.h"
@@ -25,6 +26,7 @@
#include "llvm/MC/MCELFObjectWriter.h"
#include "llvm/MC/MCELFStreamer.h"
#include "llvm/MC/MCSubtargetInfo.h"
+#include "llvm/MC/MCSymbolELF.h"
#include "llvm/Support/AMDGPUMetadata.h"
#include "llvm/Support/AMDHSAKernelDescriptor.h"
#include "llvm/Support/CommandLine.h"
@@ -689,8 +691,40 @@ MCELFStreamer &AMDGPUTargetELFStreamer::getStreamer() {
// A hook for emitting stuff at the end.
// We use it for emitting the accumulated PAL metadata as a .note record.
-// The PAL metadata is reset after it is emitted.
+// The PAL metadata is reset after it is emitted. Also emits default resource
+// usage values for functions with undetermined resource usage.
void AMDGPUTargetELFStreamer::finish() {
+ if (STI.getTargetTriple().getOS() == Triple::AMDHSA) {
+ // Use conservative defaults.
+ uint32_t NumVGPR = IsaInfo::getAddressableNumVGPRs(&STI, 0);
+ uint32_t NumAGPR =
+ hasMAIInsts(STI) ? IsaInfo::getAddressableNumArchVGPRs(&STI) : 0;
+ uint32_t NumSGPR = IsaInfo::getAddressableNumSGPRs(&STI);
+ uint32_t NumNamedBarrier = AMDGPU::Barrier::NAMED_BARRIER_LAST;
+ // Computed similarly to GCNSubtarget::getMaxWaveScratchSize.
+ uint32_t PrivateSegmentSize;
+ if (isGFX12Plus(STI))
+ PrivateSegmentSize = (64 * 4) * ((1 << 18) - 1);
+ else if (isGFX11(STI))
+ PrivateSegmentSize = (64 * 4) * ((1 << 15) - 1);
+ else
+ PrivateSegmentSize = (256 * 4) * ((1 << 13) - 1);
+ // Assumes all boolean flags set: uses_vcc | uses_flat_scratch |
+ // has_dyn_sized_stack.
+ uint32_t Flags = 0x7;
+
+ for (const MCSymbol &Sym : getStreamer().getAssembler().symbols()) {
+ // Only emit conservative defaults for functions with no resource usage
+ // info entries yet.
+ auto &SymELF = static_cast<const MCSymbolELF &>(Sym);
+ if (SymELF.getType() == ELF::STT_FUNC && SymELF.isDefined() &&
+ !FunctionsWithResourceUsage.contains(&Sym))
+ emitResourceUsageEntry(const_cast<MCSymbol *>(&Sym), NumVGPR, NumAGPR,
+ NumSGPR, NumNamedBarrier, PrivateSegmentSize,
+ Flags);
+ }
+ }
+
ELFObjectWriter &W = getStreamer().getWriter();
W.setELFHeaderEFlags(getEFlags());
W.setOverrideABIVersion(
@@ -1080,6 +1114,8 @@ void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
void AMDGPUTargetELFStreamer::emitResourceUsageEntry(
MCSymbol *FnSym, uint32_t NumVGPR, uint32_t NumAGPR, uint32_t NumSGPR,
uint32_t NumNamedBarrier, uint32_t PrivateSegmentSize, uint32_t Flags) {
+ FunctionsWithResourceUsage.insert(FnSym);
+
auto &S = getStreamer();
auto &Context = S.getContext();
const unsigned ResourceInfoEntrySize = 24;
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
index d9cc6470fa50a..680b35337bcd7 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
@@ -12,6 +12,7 @@
#include "Utils/AMDGPUBaseInfo.h"
#include "Utils/AMDGPUPALMetadata.h"
#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/MC/MCStreamer.h"
namespace llvm {
@@ -192,6 +193,9 @@ class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
// Counter for computing relocation offsets.
unsigned ResourceInfoEntryCount = 0;
+ // Track functions with explicit resource usage entries.
+ SmallPtrSet<const MCSymbol *, 8> FunctionsWithResourceUsage;
+
void EmitNote(StringRef Name, const MCExpr *DescSize, unsigned NoteType,
function_ref<void(MCELFStreamer &)> EmitDesc);
diff --git a/llvm/test/MC/AMDGPU/amdgpu-resource-usage-err.s b/llvm/test/MC/AMDGPU/amdgpu-resource-usage-err.s
index 77e1f30597ebd..7a27a73f857a8 100644
--- a/llvm/test/MC/AMDGPU/amdgpu-resource-usage-err.s
+++ b/llvm/test/MC/AMDGPU/amdgpu-resource-usage-err.s
@@ -31,14 +31,3 @@
// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: unknown field '.bogus_field' in .amdgpu_resource_usage
.end_amdgpu_resource_usage
-// Missing required field (.num_sgpr omitted).
- .amdgpu_resource_usage fn5
- .num_vgpr 0
- .num_agpr 0
- .named_barrier 0
- .private_seg_size 0
- .uses_vcc 0
- .uses_flat_scratch 0
- .has_dyn_sized_stack 0
- .end_amdgpu_resource_usage
-// CHECK: :[[@LINE-1]]:{{[0-9]+}}: error: missing required .num_sgpr directive in .amdgpu_resource_usage
diff --git a/llvm/test/MC/AMDGPU/amdgpu-resource-usage.s b/llvm/test/MC/AMDGPU/amdgpu-resource-usage.s
index c0483ec63a50a..68e41f0288f3d 100644
--- a/llvm/test/MC/AMDGPU/amdgpu-resource-usage.s
+++ b/llvm/test/MC/AMDGPU/amdgpu-resource-usage.s
@@ -19,6 +19,10 @@ baz:
foo:
s_endpgm
+.type quux,@function
+quux:
+ s_endpgm
+
.extern external_fn
// ASM: .amdgpu_resource_usage bar
@@ -84,10 +88,13 @@ foo:
.has_dyn_sized_stack 0
.end_amdgpu_resource_usage
-// ELF-SEC: .AMDGPU.resource_usage PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000048 18 E 0 0 1
+// ASM-NOT: .amdgpu_resource_usage quux
+
+// ELF-SEC: .AMDGPU.resource_usage PROGBITS {{[0-9a-f]+}} {{[0-9a-f]+}} 000060 18 E 0 0 1
// ELF-RELOC: Relocation section '.rela.AMDGPU.resource_usage'
// ELF-RELOC: 0000000000000000 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} bar + 0
// ELF-RELOC-NEXT: 0000000000000018 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} foo + 0
// ELF-RELOC-NEXT: 0000000000000030 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} baz + 0
+// ELF-RELOC-NEXT: 0000000000000048 {{[0-9a-f]+}} R_AMDGPU_NONE {{[0-9a-f]+}} quux + 0