[llvm-branch-commits] [llvm] AMDGPU: Replace amdgpu-no-agpr with amdgpu-num-agpr (PR #129893)
via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Wed Mar 5 07:46:50 PST 2025
llvmbot wrote:
@llvm/pr-subscribers-backend-amdgpu
Author: Matt Arsenault (arsenm)
<details>
<summary>Changes</summary>
This performs the minimal replacement of amdgpu-no-agpr with
amdgpu-num-agpr=0. Most of the test diffs are due to the new
attribute sorting later alphabetically.
We could do better by trying to perform range merging in the attributor,
and trying to pick non-0 values.
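For readers without the LLVM tree at hand, here is a minimal standalone sketch of the check this patch switches to: read the first integer of an attribute string of the form "min" or "min,max", and treat a minimum of 0 as "no AGPRs needed". It only imitates the call shape visible in the diff (`getIntegerPairAttribute(..., {~0u, ~0u}, /*OnlyFirstRequired=*/true)`). The names `AttrValue`, `parseUnsigned` and `parseIntegerPair` are hypothetical, and the fallback behavior on malformed input is an assumption, not the LLVM implementation.

```cpp
#include <cstddef>
#include <limits>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the attribute lookup: std::nullopt models a
// function that carries no "amdgpu-agpr-alloc" attribute at all.
using AttrValue = std::optional<std::string>;

// Parse a non-empty string of decimal digits; reject anything else.
static bool parseUnsigned(const std::string &S, unsigned &Out) {
  if (S.empty())
    return false;
  unsigned long long Value = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return false;
    Value = Value * 10 + static_cast<unsigned>(C - '0');
    if (Value > std::numeric_limits<unsigned>::max())
      return false;
  }
  Out = static_cast<unsigned>(Value);
  return true;
}

// Parse "min" or "min,max". A missing or malformed attribute yields Default
// (assumption). With a single entry, only the minimum is overridden.
std::pair<unsigned, unsigned>
parseIntegerPair(const AttrValue &Attr, std::pair<unsigned, unsigned> Default) {
  if (!Attr)
    return Default;
  const std::string &S = *Attr;
  const std::size_t Comma = S.find(',');
  unsigned Min = 0, Max = 0;
  if (Comma == std::string::npos) {
    if (!parseUnsigned(S, Min))
      return Default;
    return {Min, Default.second};
  }
  if (!parseUnsigned(S.substr(0, Comma), Min) ||
      !parseUnsigned(S.substr(Comma + 1), Max))
    return Default;
  return {Min, Max};
}

// Mirrors the new body of mayUseAGPRs in the diff: only an explicit
// minimum of 0 rules out AGPR use; no attribute means "may use AGPRs".
bool mayUseAGPRs(const AttrValue &Attr) {
  auto [MinNumAGPR, MaxNumAGPR] = parseIntegerPair(Attr, {~0u, ~0u});
  (void)MaxNumAGPR; // The upper bound is not consulted here.
  return MinNumAGPR != 0u;
}
```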
---
Patch is 168.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/129893.diff
45 Files Affected:
- (modified) llvm/docs/AMDGPUUsage.rst (+1-6)
- (modified) llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp (+7-2)
- (modified) llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp (+4-1)
- (modified) llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp (+1-7)
- (modified) llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers-assertion-after-ra-failure.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll (+3-3)
- (modified) llvm/test/CodeGen/AMDGPU/amdgpu-no-agprs-violations.ll (+7-6)
- (modified) llvm/test/CodeGen/AMDGPU/amdgpu-num-agpr.ll (+6-6)
- (modified) llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll (+21-21)
- (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa.ll (+13-13)
- (modified) llvm/test/CodeGen/AMDGPU/annotate-kernel-features.ll (+9-9)
- (modified) llvm/test/CodeGen/AMDGPU/attributor-flatscratchinit.ll (+6-6)
- (modified) llvm/test/CodeGen/AMDGPU/captured-frame-index.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/copy-vgpr-clobber-spill-vgpr.mir (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/direct-indirect-call.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/duplicate-attribute-indirect.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/implicitarg-offset-attributes.ll (+13-13)
- (modified) llvm/test/CodeGen/AMDGPU/indirect-call-set-from-other-function.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/invalid-hidden-kernarg-in-kernel-signature.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/issue120256-annotate-constexpr-addrspacecast.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/mfma-bf16-vgpr-cd-select.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/mfma-cd-select.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx942.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/preload-implicit-kernargs.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/preload-kernargs.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/propagate-flat-work-group-size.ll (+9-9)
- (modified) llvm/test/CodeGen/AMDGPU/propagate-waves-per-eu.ll (+21-21)
- (modified) llvm/test/CodeGen/AMDGPU/recursive_global_initializer.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/remove-no-kernel-id-attribute.ll (+5-5)
- (modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call-2.ll (+3-3)
- (modified) llvm/test/CodeGen/AMDGPU/simple-indirect-call.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/smfmac_no_agprs.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/spill-regpressure-less.mir (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-attribute-missing.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-multistep.ll (+3-3)
- (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-nested-function-calls.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-prevent-attribute-propagation.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-propagate-attribute.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-recursion-test.ll (+3-3)
- (modified) llvm/test/CodeGen/AMDGPU/uniform-work-group-test.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/vgpr-agpr-limit-gfx90a.ll (+6-6)
``````````diff
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index c317223f49d7c..def6addd595e8 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -1698,11 +1698,6 @@ The AMDGPU backend supports the following LLVM IR attributes.
``amdgpu_max_num_work_groups`` CLANG attribute [CLANG-ATTR]_. Clang only
emits this attribute when all the three numbers are >= 1.
- "amdgpu-no-agpr" Indicates the function will not require allocating AGPRs. This is only
- relevant on subtargets with AGPRs. The behavior is undefined if a
- function which requires AGPRs is reached through any function marked
- with this attribute.
-
"amdgpu-hidden-argument" This attribute is used internally by the backend to mark function arguments
as hidden. Hidden arguments are managed by the compiler and are not part of
the explicit arguments supplied by the user.
@@ -1721,7 +1716,7 @@ The AMDGPU backend supports the following LLVM IR attributes.
The behavior is undefined if a function which requires more AGPRs than the
lower bound is reached through any function marked with a higher value of this
attribute. A minimum value of 0 indicates the function does not require
- any AGPRs. A minimum of 0 is equivalent to "amdgpu-no-agpr".
+ any AGPRs.
This is only relevant on targets with AGPRs which support accum_offset (gfx90a+).
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 546db318c17d5..cfff66fa07f98 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1235,6 +1235,8 @@ static bool inlineAsmUsesAGPRs(const InlineAsm *IA) {
return false;
}
+// TODO: Migrate to range merge of amdgpu-agpr-alloc.
+// FIXME: Why is this using Attribute::NoUnwind?
struct AAAMDGPUNoAGPR
: public IRAttribute<Attribute::NoUnwind,
StateWrapper<BooleanState, AbstractAttribute>,
@@ -1250,7 +1252,10 @@ struct AAAMDGPUNoAGPR
void initialize(Attributor &A) override {
Function *F = getAssociatedFunction();
- if (F->hasFnAttribute("amdgpu-no-agpr"))
+ auto [MinNumAGPR, MaxNumAGPR] =
+ AMDGPU::getIntegerPairAttribute(*F, "amdgpu-agpr-alloc", {~0u, ~0u},
+ /*OnlyFirstRequired=*/true);
+ if (MinNumAGPR == 0)
indicateOptimisticFixpoint();
}
@@ -1297,7 +1302,7 @@ struct AAAMDGPUNoAGPR
return ChangeStatus::UNCHANGED;
LLVMContext &Ctx = getAssociatedFunction()->getContext();
return A.manifestAttrs(getIRPosition(),
- {Attribute::get(Ctx, "amdgpu-no-agpr")});
+ {Attribute::get(Ctx, "amdgpu-agpr-alloc", "0")});
}
const std::string getName() const override { return "AAAMDGPUNoAGPR"; }
diff --git a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
index a83fc2d188de2..abd19c988a7eb 100644
--- a/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
@@ -780,5 +780,8 @@ bool SIMachineFunctionInfo::initializeBaseYamlFields(
}
bool SIMachineFunctionInfo::mayUseAGPRs(const Function &F) const {
- return !F.hasFnAttribute("amdgpu-no-agpr");
+ auto [MinNumAGPR, MaxNumAGPR] =
+ AMDGPU::getIntegerPairAttribute(F, "amdgpu-agpr-alloc", {~0u, ~0u},
+ /*OnlyFirstRequired=*/true);
+ return MinNumAGPR != 0u;
}
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index 669495f1c3185..adadf8e4e4e65 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -571,7 +571,6 @@ MCRegister SIRegisterInfo::reservedPrivateSegmentBufferReg(
std::pair<unsigned, unsigned>
SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
- const SIMachineFunctionInfo *MFI = MF.getInfo<SIMachineFunctionInfo>();
const unsigned MaxVectorRegs = ST.getMaxNumVGPRs(MF);
unsigned MaxNumVGPRs = MaxVectorRegs;
@@ -592,7 +591,6 @@ SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
const std::pair<unsigned, unsigned> DefaultNumAGPR = {~0u, ~0u};
- // TODO: Replace amdgpu-no-agpr with amdgpu-agpr-alloc=0
// TODO: Move this logic into subtarget on IR function
//
// TODO: The lower bound should probably force the number of required
@@ -603,11 +601,7 @@ SIRegisterInfo::getMaxNumVectorRegs(const MachineFunction &MF) const {
if (MinNumAGPRs == DefaultNumAGPR.first) {
// Default to splitting half the registers if AGPRs are required.
-
- if (MFI->mayNeedAGPRs())
- MinNumAGPRs = MaxNumAGPRs = MaxVectorRegs / 2;
- else
- MinNumAGPRs = 0;
+ MinNumAGPRs = MaxNumAGPRs = MaxVectorRegs / 2;
} else {
// Align to accum_offset's allocation granularity.
MinNumAGPRs = alignTo(MinNumAGPRs, 4);
diff --git a/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll b/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
index d316e10037757..0f5028fd82296 100644
--- a/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/addrspacecast-constantexpr.ll
@@ -233,8 +233,8 @@ attributes #1 = { nounwind }
; AKF_HSA: attributes #[[ATTR1]] = { nounwind }
;.
; ATTRIBUTOR_HSA: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }
-; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
;.
; AKF_HSA: [[META0:![0-9]+]] = !{i32 1, !"amdhsa_code_object_version", i32 500}
;.
diff --git a/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers-assertion-after-ra-failure.ll b/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers-assertion-after-ra-failure.ll
index f3eb7a42cb823..cea1fe49f4d8b 100644
--- a/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers-assertion-after-ra-failure.ll
+++ b/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers-assertion-after-ra-failure.ll
@@ -17,6 +17,6 @@ define void @no_free_vgprs_at_agpr_to_agpr_copy(float %v0, float %v1) #0 {
declare <16 x float> @llvm.amdgcn.mfma.f32.16x16x1f32(float, float, <16 x float>, i32 immarg, i32 immarg, i32 immarg) #1
declare noundef i32 @llvm.amdgcn.workitem.id.x() #2
-attributes #0 = { "amdgpu-no-agpr" "amdgpu-waves-per-eu"="6,6" }
+attributes #0 = { "amdgpu-agpr-alloc"="0" "amdgpu-waves-per-eu"="6,6" }
attributes #1 = { convergent nocallback nofree nosync nounwind willreturn memory(none) }
attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
diff --git a/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll b/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
index d1b01eeee11a4..e70e34fa0ba5d 100644
--- a/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
+++ b/llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll
@@ -1144,6 +1144,6 @@ declare i32 @llvm.amdgcn.workitem.id.x() #2
attributes #0 = { "amdgpu-waves-per-eu"="6,6" }
attributes #1 = { convergent nounwind readnone willreturn }
attributes #2 = { nounwind readnone willreturn }
-attributes #3 = { "amdgpu-waves-per-eu"="7,7" "amdgpu-no-agpr" }
+attributes #3 = { "amdgpu-waves-per-eu"="7,7" "amdgpu-agpr-alloc"="0" }
attributes #4 = { "amdgpu-waves-per-eu"="6,6" "amdgpu-flat-work-group-size"="1024,1024" }
-attributes #5 = { "amdgpu-waves-per-eu"="6,6" "amdgpu-no-agpr" }
+attributes #5 = { "amdgpu-waves-per-eu"="6,6" "amdgpu-agpr-alloc"="0" }
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
index 33e7e7a7a019e..7e9cb7adf4fc2 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-attributor-no-agpr.ll
@@ -252,13 +252,13 @@ define amdgpu_kernel void @indirect_calls_none_agpr(i1 %cond) {
}
-attributes #0 = { "amdgpu-no-agpr" }
+attributes #0 = { "amdgpu-agpr-alloc"="0" }
;.
; CHECK: attributes #[[ATTR0]] = { "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
-; CHECK: attributes #[[ATTR1]] = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
+; CHECK: attributes #[[ATTR1]] = { "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR2]] = { "target-cpu"="gfx90a" "uniform-work-group-size"="false" }
; CHECK: attributes #[[ATTR3:[0-9]+]] = { convergent nocallback nofree nosync nounwind willreturn memory(none) "target-cpu"="gfx90a" }
; CHECK: attributes #[[ATTR4:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) "target-cpu"="gfx90a" }
; CHECK: attributes #[[ATTR5:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: readwrite) "target-cpu"="gfx90a" }
-; CHECK: attributes #[[ATTR6]] = { "amdgpu-no-agpr" }
+; CHECK: attributes #[[ATTR6]] = { "amdgpu-agpr-alloc"="0" }
;.
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-no-agprs-violations.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-no-agprs-violations.ll
index d0bf8d3920a98..7bf9a29e9ff44 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-no-agprs-violations.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-no-agprs-violations.ll
@@ -1,6 +1,6 @@
; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 < %s | FileCheck -check-prefixes=CHECK,GFX908 %s
; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx90a < %s 2> %t.err | FileCheck -check-prefixes=CHECK,GFX90A %s
-; RUN: FileCheck -check-prefix=ERR < %t.err %s
+; RUN: FileCheck --implicit-check-not=error -check-prefix=ERR < %t.err %s
; Test undefined behavior where a function ends up needing AGPRs that
; was marked with "amdgpu-agpr-alloc="="0". There should be no asserts.
@@ -9,7 +9,6 @@
; ERR: error: <unknown>:0:0: no registers from class available to allocate in function 'kernel_illegal_agpr_use_asm'
; ERR: error: <unknown>:0:0: no registers from class available to allocate in function 'func_illegal_agpr_use_asm'
-; ERR: error: <unknown>:0:0: no registers from class available to allocate in function 'kernel_calls_mfma.f32.32x32x1f32'
; CHECK: {{^}}kernel_illegal_agpr_use_asm:
; CHECK: ; use a0
@@ -32,14 +31,16 @@ define void @func_illegal_agpr_use_asm() #0 {
}
; CHECK-LABEL: {{^}}kernel_calls_mfma.f32.32x32x1f32:
-; CHECK: v_accvgpr_write_b32
+; GFX908: v_accvgpr_write_b32
+; GFX90A-NOT: v_accvgpr_write_b32
; GFX908: NumVgprs: 5
-; GFX90A: NumVgprs: 36
-; CHECK: NumAgprs: 32
+; GFX908: NumAgprs: 32
+; GFX90A: NumVgprs: 35
+; GFX90A: NumAgprs: 0
; GFX908: TotalNumVgprs: 32
-; GFX90A: TotalNumVgprs: 68
+; GFX90A: TotalNumVgprs: 35
define amdgpu_kernel void @kernel_calls_mfma.f32.32x32x1f32(ptr addrspace(1) %out, float %a, float %b, <32 x float> %c) #0 {
%result = call <32 x float> @llvm.amdgcn.mfma.f32.32x32x1f32(float %a, float %b, <32 x float> %c, i32 0, i32 0, i32 0)
store <32 x float> %result, ptr addrspace(1) %out
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-num-agpr.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-num-agpr.ll
index 15a442f85ebca..1f6ffe076822c 100644
--- a/llvm/test/CodeGen/AMDGPU/amdgpu-num-agpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-num-agpr.ll
@@ -15,7 +15,7 @@ define amdgpu_kernel void @min_num_agpr_0_0__amdgpu_no_agpr() #0 {
ret void
}
-attributes #0 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="0,0" "amdgpu-no-agpr" }
+attributes #0 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="0,0" }
; Check parse of single entry 0
@@ -26,16 +26,16 @@ define amdgpu_kernel void @min_num_agpr_0__amdgpu_no_agpr() #1 {
ret void
}
-attributes #1 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="0" "amdgpu-no-agpr" }
+attributes #1 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="0" }
; Undefined use
-define amdgpu_kernel void @min_num_agpr_1_1__amdgpu_no_agpr() #2 {
+define amdgpu_kernel void @min_num_agpr_1_1() #2 {
call void asm sideeffect "; clobber $0","~{a0}"(), !srcloc !{i32 3}
ret void
}
-attributes #2 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="1,1" "amdgpu-no-agpr" }
+attributes #2 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="1,1" }
; Check parse of single entry 4, interpreted as the minimum. Total budget is 64.
; WARN: warning: <unknown>:0:0: failed to meet occupancy target given by 'amdgpu-waves-per-eu' in 'min_num_agpr_4__amdgpu_no_agpr': desired occupancy was 8, final occupancy is 7
@@ -48,7 +48,7 @@ define amdgpu_kernel void @min_num_agpr_4__amdgpu_no_agpr() #3 {
ret void
}
-attributes #3 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="4" "amdgpu-no-agpr" }
+attributes #3 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="4" }
; Allocation granularity requires rounding this to use 4 AGPRs, so the
@@ -79,7 +79,7 @@ define amdgpu_kernel void @min_num_agpr_64_64__amdgpu_no_agpr() #5 {
ret void
}
-attributes #5 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="64,64" "amdgpu-no-agpr" }
+attributes #5 = { "amdgpu-waves-per-eu"="8,8" "amdgpu-flat-work-group-size"="64,64" "amdgpu-agpr-alloc"="64,64" }
; No free VGPRs
; WARN: warning: inline asm clobber list contains reserved registers: v0 at line 7
diff --git a/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
index 0114de738ce84..dd760c2a215ca 100644
--- a/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
+++ b/llvm/test/CodeGen/AMDGPU/amdhsa-kernarg-preload-num-sgprs.ll
@@ -70,4 +70,4 @@ define amdgpu_kernel void @amdhsa_kernarg_preload_1_implicit_2(i32 inreg) #0 { r
define amdgpu_kernel void @amdhsa_kernarg_preload_0_implicit_2(i32) #0 { ret void }
-attributes #0 = { "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
+attributes #0 = { "amdgpu-agpr-alloc"="0" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-x" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "uniform-work-group-size"="false" }
diff --git a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
index ea3f08ede2c5d..f7bf0c4448c0f 100644
--- a/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
+++ b/llvm/test/CodeGen/AMDGPU/annotate-kernel-features-hsa-call.ll
@@ -1025,31 +1025,31 @@ attributes #6 = { "enqueued-block" }
; AKF_HSA: attributes #[[ATTR8]] = { "amdgpu-calls" }
;.
; ATTRIBUTOR_HSA: attributes #[[ATTR0:[0-9]+]] = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
-; ATTRIBUTOR_HSA: attributes #[[ATTR1]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitarg-ptr" "amdgpu-no-lds-kernel-id" "amdgpu-no-multigrid-sync-arg" "amdgpu-no-queue-ptr" "amdgpu-no-workgroup-id-x" "amdgpu-no-workgroup-id-y" "amdgpu-no-workgroup-id-z" "amdgpu-no-workitem-id-y" "amdgpu-no-workitem-id-z" "target-cpu"="fiji" "uniform-work-group-size"="false" }
-; ATTRIBUTOR_HSA: attributes #[[ATTR2]] = { nounwind "amdgpu-no-agpr" "amdgpu-no-completion-action" "amdgpu-no-default-queue" "amdgpu-no-dispatch-id" "amdgpu-no-dispatch-ptr" "amdgpu-no-flat-scratch-init" "amdgpu-no-heap-ptr" "amdgpu-no-hostcall-ptr" "amdgpu-no-implicitar...
[truncated]
``````````
</details>
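The SIRegisterInfo.cpp hunk in the diff leaves two visible branches for the AGPR lower bound: with no attribute, half of the vector register budget is used, otherwise the requested minimum is rounded up to the accum_offset granularity of 4. A rough standalone sketch of just those two branches follows. The helper names `alignUp` and `resolveMinAGPRs` are hypothetical, `alignUp` is a local stand-in for `alignTo`, and the rest of `getMaxNumVectorRegs` (truncated in the diff) is not modeled.

```cpp
// Sentinel matching DefaultNumAGPR.first (~0u) in the hunk: no attribute given.
constexpr unsigned NoRequest = ~0u;

// Round Value up to the next multiple of Align (Align must be non-zero).
constexpr unsigned alignUp(unsigned Value, unsigned Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical helper covering only the two branches visible in the diff.
constexpr unsigned resolveMinAGPRs(unsigned Requested, unsigned MaxVectorRegs) {
  if (Requested == NoRequest)
    return MaxVectorRegs / 2; // Default: split the budget in half.
  return alignUp(Requested, 4); // Explicit request: align to granularity.
}
```

Under these assumptions, a budget of 512 with no attribute gives 256, an explicit 0 stays 0, and a request of 1 rounds up to 4.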
https://github.com/llvm/llvm-project/pull/129893