[llvm] [AMDGPU] Update compute program resource registers for GFX12 (PR #75911)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 19 01:04:10 PST 2023
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-mc
@llvm/pr-subscribers-backend-amdgpu
Author: Jay Foad (jayfoad)
<details>
<summary>Changes</summary>
---
Patch is 25.09 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/75911.diff
8 Files Affected:
- (modified) llvm/docs/AMDGPUUsage.rst (+50-12)
- (modified) llvm/include/llvm/Support/AMDHSAKernelDescriptor.h (+33-11)
- (modified) llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (+5-4)
- (modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp (+39-13)
- (modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (+3-1)
- (modified) llvm/test/MC/AMDGPU/hsa-diag-v4.s (+18-8)
- (modified) llvm/test/MC/AMDGPU/hsa-gfx12-v4.s (-2)
- (added) llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx12.s (+54)
``````````diff
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index d49d1cd3812512..db346cbfbd27fb 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -4405,7 +4405,15 @@ The fields used by CP for code objects before V3 also match those specified in
``COMPUTE_PGM_RSRC3``
configuration
register. See
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx12-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table`.
+ GFX12
+ Compute Shader (CS)
+ program settings used by
+ CP to set up
+ ``COMPUTE_PGM_RSRC3``
+ configuration
+ register. See
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx12-table`.
415:384 4 bytes COMPUTE_PGM_RSRC1 Compute Shader (CS)
program settings used by
CP to set up
@@ -4830,13 +4838,16 @@ The fields used by CP for code objects before V3 also match those specified in
Used by CP to set up
``COMPUTE_PGM_RSRC2.USER_SGPR``.
- 6 1 bit ENABLE_TRAP_HANDLER Must be 0.
+ 6 1 bit ENABLE_TRAP_HANDLER GFX6-GFX11
+ Must be 0.
- This bit represents
- ``COMPUTE_PGM_RSRC2.TRAP_PRESENT``,
- which is set by the CP if
- the runtime has installed a
- trap handler.
+ This bit represents
+ ``COMPUTE_PGM_RSRC2.TRAP_PRESENT``,
+ which is set by the CP if
+ the runtime has installed a
+ trap handler.
+ GFX12
+ Reserved, must be 0.
7 1 bit ENABLE_SGPR_WORKGROUP_ID_X Enable the setup of the
system SGPR register for
the work-group id in the X
@@ -4956,7 +4967,7 @@ The fields used by CP for code objects before V3 also match those specified in
30 1 bit ENABLE_EXCEPTION_INT_DIVIDE_BY Integer Division by Zero
_ZERO (rcp_iflag_f32 instruction
only)
- 31 1 bit Reserved, must be 0.
+ 31 1 bit RESERVED Reserved, must be 0.
32 **Total size 4 bytes.**
======= ===================================================================================================================
@@ -4991,10 +5002,11 @@ The fields used by CP for code objects before V3 also match those specified in
======= ======= =============================== ===========================================================================
Bits Size Field Name Description
======= ======= =============================== ===========================================================================
- 3:0 4 bits SHARED_VGPR_COUNT Number of shared VGPR blocks when executing in subvector mode. For
- wavefront size 64 the value is 0-15, representing 0-120 VGPRs (granularity
- of 8), such that (compute_pgm_rsrc1.vgprs +1)*4 + shared_vgpr_count*8 does
- not exceed 256. For wavefront size 32 shared_vgpr_count must be 0.
+ 3:0 4 bits SHARED_VGPR_COUNT GFX10-GFX11
+ Number of shared VGPR blocks when executing in subvector mode. For
+ wavefront size 64 the value is 0-15, representing 0-120 VGPRs (granularity
+ of 8), such that (compute_pgm_rsrc1.vgprs +1)*4 + shared_vgpr_count*8 does
+ not exceed 256. For wavefront size 32 shared_vgpr_count must be 0.
9:4 6 bits INST_PREF_SIZE GFX10
Reserved, must be 0.
GFX11
@@ -5035,6 +5047,32 @@ The fields used by CP for code objects before V3 also match those specified in
32 **Total size 4 bytes.**
======= ===================================================================================================================
+..
+
+ .. table:: compute_pgm_rsrc3 for GFX12
+ :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx12-table
+
+ ======= ======= =============================== ===========================================================================
+ Bits Size Field Name Description
+ ======= ======= =============================== ===========================================================================
+ 3:0 4 bits RESERVED Reserved, must be 0.
+ 11:4 8 bits INST_PREF_SIZE Number of instruction bytes to prefetch, starting at the kernel's entry
+ point instruction, before wavefront starts execution. The value is 0..255
+ with a granularity of 128 bytes.
+ 12 1 bit RESERVED Reserved, must be 0.
+ 13 1 bit GLG_EN If 1, group launch guarantee will be enabled for this dispatch
+ 30:14 17 bits RESERVED Reserved, must be 0.
+ 31 1 bit IMAGE_OP If 1, the kernel execution contains image instructions. If executed as
+ part of a graphics pipeline, image read instructions will stall waiting
+ for any necessary ``WAIT_SYNC`` fence to be performed in order to
+ indicate that earlier pipeline stages have completed writing to the
+ image.
+
+ Not used for compute kernels that are not part of a graphics pipeline and
+ must be 0.
+ 32 **Total size 4 bytes.**
+ ======= ===================================================================================================================
+
..
.. table:: Floating Point Rounding Mode Enumeration Values
diff --git a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
index 2de2cf4185d86e..84cac3ef700e05 100644
--- a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
+++ b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
@@ -127,12 +127,20 @@ enum : int32_t {
#undef COMPUTE_PGM_RSRC1
// Compute program resource register 2. Must match hardware definition.
+// GFX6+.
#define COMPUTE_PGM_RSRC2(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC2_ ## NAME, SHIFT, WIDTH)
+// [GFX6-GFX11].
+#define COMPUTE_PGM_RSRC2_GFX6_GFX11(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC2_GFX6_GFX11_##NAME, SHIFT, WIDTH)
+// GFX12+.
+#define COMPUTE_PGM_RSRC2_GFX12_PLUS(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC2_GFX12_PLUS_##NAME, SHIFT, WIDTH)
enum : int32_t {
COMPUTE_PGM_RSRC2(ENABLE_PRIVATE_SEGMENT, 0, 1),
COMPUTE_PGM_RSRC2(USER_SGPR_COUNT, 1, 5),
- COMPUTE_PGM_RSRC2(ENABLE_TRAP_HANDLER, 6, 1),
+ COMPUTE_PGM_RSRC2_GFX6_GFX11(ENABLE_TRAP_HANDLER, 6, 1),
+ COMPUTE_PGM_RSRC2_GFX12_PLUS(RESERVED1, 6, 1),
COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_X, 7, 1),
COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1),
COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1),
@@ -166,23 +174,37 @@ enum : int32_t {
// Compute program resource register 3 for GFX10+. Must match hardware
// definition.
-// [GFX10].
-#define COMPUTE_PGM_RSRC3_GFX10(NAME, SHIFT, WIDTH) \
- AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX10_ ## NAME, SHIFT, WIDTH)
// GFX10+.
#define COMPUTE_PGM_RSRC3_GFX10_PLUS(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX10_PLUS_ ## NAME, SHIFT, WIDTH)
+// [GFX10].
+#define COMPUTE_PGM_RSRC3_GFX10(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX10_##NAME, SHIFT, WIDTH)
+// [GFX10-GFX11].
+#define COMPUTE_PGM_RSRC3_GFX10_GFX11(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX10_GFX11_##NAME, SHIFT, WIDTH)
// GFX11+.
#define COMPUTE_PGM_RSRC3_GFX11_PLUS(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX11_PLUS_ ## NAME, SHIFT, WIDTH)
+// [GFX11].
+#define COMPUTE_PGM_RSRC3_GFX11(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX11_##NAME, SHIFT, WIDTH)
+// GFX12+.
+#define COMPUTE_PGM_RSRC3_GFX12_PLUS(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX12_PLUS_##NAME, SHIFT, WIDTH)
enum : int32_t {
- COMPUTE_PGM_RSRC3_GFX10_PLUS(SHARED_VGPR_COUNT, 0, 4),
- COMPUTE_PGM_RSRC3_GFX10(RESERVED0, 4, 8),
- COMPUTE_PGM_RSRC3_GFX11_PLUS(INST_PREF_SIZE, 4, 6),
- COMPUTE_PGM_RSRC3_GFX11_PLUS(TRAP_ON_START, 10, 1),
- COMPUTE_PGM_RSRC3_GFX11_PLUS(TRAP_ON_END, 11, 1),
- COMPUTE_PGM_RSRC3_GFX10_PLUS(RESERVED1, 12, 19),
- COMPUTE_PGM_RSRC3_GFX10(RESERVED2, 31, 1),
+ COMPUTE_PGM_RSRC3_GFX10_GFX11(SHARED_VGPR_COUNT, 0, 4),
+ COMPUTE_PGM_RSRC3_GFX12_PLUS(RESERVED0, 0, 4),
+ COMPUTE_PGM_RSRC3_GFX10(RESERVED1, 4, 8),
+ COMPUTE_PGM_RSRC3_GFX11(INST_PREF_SIZE, 4, 6),
+ COMPUTE_PGM_RSRC3_GFX11(TRAP_ON_START, 10, 1),
+ COMPUTE_PGM_RSRC3_GFX11(TRAP_ON_END, 11, 1),
+ COMPUTE_PGM_RSRC3_GFX12_PLUS(INST_PREF_SIZE, 4, 8),
+ COMPUTE_PGM_RSRC3_GFX10_PLUS(RESERVED2, 12, 1),
+ COMPUTE_PGM_RSRC3_GFX10_GFX11(RESERVED3, 13, 1),
+ COMPUTE_PGM_RSRC3_GFX12_PLUS(GLG_EN, 13, 1),
+ COMPUTE_PGM_RSRC3_GFX10_PLUS(RESERVED4, 14, 17),
+ COMPUTE_PGM_RSRC3_GFX10(RESERVED5, 31, 1),
COMPUTE_PGM_RSRC3_GFX11_PLUS(IMAGE_OP, 31, 1),
};
#undef COMPUTE_PGM_RSRC3_GFX10_PLUS
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 3b69a37728ea1c..abd7e911beef3f 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -5416,11 +5416,12 @@ bool AMDGPUAsmParser::ParseDirectiveAMDHSAKernel() {
PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1, COMPUTE_PGM_RSRC1_GFX10_PLUS_FWD_PROGRESS, Val,
ValRange);
} else if (ID == ".amdhsa_shared_vgpr_count") {
- if (IVersion.Major < 10)
- return Error(IDRange.Start, "directive requires gfx10+", IDRange);
+ if (IVersion.Major < 10 || IVersion.Major >= 12)
+ return Error(IDRange.Start, "directive requires gfx10 or gfx11",
+ IDRange);
SharedVGPRCount = Val;
PARSE_BITS_ENTRY(KD.compute_pgm_rsrc3,
- COMPUTE_PGM_RSRC3_GFX10_PLUS_SHARED_VGPR_COUNT, Val,
+ COMPUTE_PGM_RSRC3_GFX10_GFX11_SHARED_VGPR_COUNT, Val,
ValRange);
} else if (ID == ".amdhsa_exception_fp_ieee_invalid_op") {
PARSE_BITS_ENTRY(
@@ -5522,7 +5523,7 @@ bool AMDGPUAsmParser::ParseDirectiveAMDHSAKernel() {
(AccumOffset / 4 - 1));
}
- if (IVersion.Major >= 10) {
+ if (IVersion.Major >= 10 && IVersion.Major < 12) {
// SharedVGPRCount < 16 checked by PARSE_ENTRY_BITS
if (SharedVGPRCount && EnableWavefrontSize32 && *EnableWavefrontSize32) {
return TokError("shared_vgpr_count directive not valid on "
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index ed2e7e4f189e01..d3dec339683592 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -1995,34 +1995,60 @@ MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC3(
if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX90A_RESERVED1)
return MCDisassembler::Fail;
} else if (isGFX10Plus()) {
- if (!EnableWavefrontSize32 || !*EnableWavefrontSize32) {
- PRINT_DIRECTIVE(".amdhsa_shared_vgpr_count",
- COMPUTE_PGM_RSRC3_GFX10_PLUS_SHARED_VGPR_COUNT);
+ // Bits [0-3].
+ if (!isGFX12Plus()) {
+ if (!EnableWavefrontSize32 || !*EnableWavefrontSize32) {
+ PRINT_DIRECTIVE(".amdhsa_shared_vgpr_count",
+ COMPUTE_PGM_RSRC3_GFX10_GFX11_SHARED_VGPR_COUNT);
+ } else {
+ PRINT_PSEUDO_DIRECTIVE_COMMENT(
+ "SHARED_VGPR_COUNT",
+ COMPUTE_PGM_RSRC3_GFX10_GFX11_SHARED_VGPR_COUNT);
+ }
} else {
- PRINT_PSEUDO_DIRECTIVE_COMMENT(
- "SHARED_VGPR_COUNT", COMPUTE_PGM_RSRC3_GFX10_PLUS_SHARED_VGPR_COUNT);
+ if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX12_PLUS_RESERVED0)
+ return MCDisassembler::Fail;
}
- if (isGFX11Plus()) {
+ // Bits [4-11].
+ if (isGFX11()) {
PRINT_PSEUDO_DIRECTIVE_COMMENT("INST_PREF_SIZE",
- COMPUTE_PGM_RSRC3_GFX11_PLUS_INST_PREF_SIZE);
+ COMPUTE_PGM_RSRC3_GFX11_INST_PREF_SIZE);
PRINT_PSEUDO_DIRECTIVE_COMMENT("TRAP_ON_START",
- COMPUTE_PGM_RSRC3_GFX11_PLUS_TRAP_ON_START);
+ COMPUTE_PGM_RSRC3_GFX11_TRAP_ON_START);
PRINT_PSEUDO_DIRECTIVE_COMMENT("TRAP_ON_END",
- COMPUTE_PGM_RSRC3_GFX11_PLUS_TRAP_ON_END);
+ COMPUTE_PGM_RSRC3_GFX11_TRAP_ON_END);
+ } else if (isGFX12Plus()) {
+ PRINT_PSEUDO_DIRECTIVE_COMMENT(
+ "INST_PREF_SIZE", COMPUTE_PGM_RSRC3_GFX12_PLUS_INST_PREF_SIZE);
+ } else {
+ if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_RESERVED1)
+ return MCDisassembler::Fail;
+ }
+
+ // Bits [12].
+ if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_PLUS_RESERVED2)
+ return MCDisassembler::Fail;
+
+ // Bits [13].
+ if (isGFX12Plus()) {
+ PRINT_PSEUDO_DIRECTIVE_COMMENT("GLG_EN",
+ COMPUTE_PGM_RSRC3_GFX12_PLUS_GLG_EN);
} else {
- if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_RESERVED0)
+ if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_GFX11_RESERVED3)
return MCDisassembler::Fail;
}
- if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_PLUS_RESERVED1)
+ // Bits [14-30].
+ if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_PLUS_RESERVED4)
return MCDisassembler::Fail;
+ // Bits [31].
if (isGFX11Plus()) {
PRINT_PSEUDO_DIRECTIVE_COMMENT("IMAGE_OP",
- COMPUTE_PGM_RSRC3_GFX11_PLUS_TRAP_ON_START);
+ COMPUTE_PGM_RSRC3_GFX11_PLUS_IMAGE_OP);
} else {
- if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_RESERVED2)
+ if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_RESERVED5)
return MCDisassembler::Fail;
}
} else if (FourByteBuffer) {
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index a855cf585205bc..e135a4e25dd15a 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -475,8 +475,10 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
PRINT_FIELD(OS, ".amdhsa_forward_progress", KD,
compute_pgm_rsrc1,
amdhsa::COMPUTE_PGM_RSRC1_GFX10_PLUS_FWD_PROGRESS);
+ }
+ if (IVersion.Major >= 10 && IVersion.Major < 12) {
PRINT_FIELD(OS, ".amdhsa_shared_vgpr_count", KD, compute_pgm_rsrc3,
- amdhsa::COMPUTE_PGM_RSRC3_GFX10_PLUS_SHARED_VGPR_COUNT);
+ amdhsa::COMPUTE_PGM_RSRC3_GFX10_GFX11_SHARED_VGPR_COUNT);
}
if (IVersion.Major >= 12)
PRINT_FIELD(OS, ".amdhsa_round_robin_scheduling", KD, compute_pgm_rsrc1,
diff --git a/llvm/test/MC/AMDGPU/hsa-diag-v4.s b/llvm/test/MC/AMDGPU/hsa-diag-v4.s
index f7a554aedb746b..069b71b7229cdd 100644
--- a/llvm/test/MC/AMDGPU/hsa-diag-v4.s
+++ b/llvm/test/MC/AMDGPU/hsa-diag-v4.s
@@ -1,6 +1,7 @@
// RUN: not llvm-mc --amdhsa-code-object-version=4 -triple amdgcn-amd-amdhsa -mcpu=gfx810 -mattr=+xnack -show-encoding %s 2>&1 >/dev/null | FileCheck %s --check-prefixes=GCN,GFX8,PREGFX10,AMDHSA
// RUN: not llvm-mc --amdhsa-code-object-version=4 -triple amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+xnack -show-encoding %s 2>&1 >/dev/null | FileCheck %s --check-prefixes=GCN,GFX10PLUS,GFX10,AMDHSA
// RUN: not llvm-mc --amdhsa-code-object-version=4 -triple amdgcn-amd-amdhsa -mcpu=gfx1100 -show-encoding %s 2>&1 >/dev/null | FileCheck %s --check-prefixes=GCN,GFX10PLUS,GFX11,AMDHSA
+// RUN: not llvm-mc --amdhsa-code-object-version=4 -triple amdgcn-amd-amdhsa -mcpu=gfx1200 -show-encoding %s 2>&1 >/dev/null | FileCheck %s --check-prefixes=GCN,GFX10PLUS,GFX12,AMDHSA
// RUN: not llvm-mc --amdhsa-code-object-version=4 -triple amdgcn-amd- -mcpu=gfx810 -mattr=+xnack -show-encoding %s 2>&1 >/dev/null | FileCheck %s --check-prefixes=GCN,NONAMDHSA
// RUN: not llvm-mc --amdhsa-code-object-version=4 -triple amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack -show-encoding %s 2>&1 >/dev/null | FileCheck %s --check-prefixes=GFX90A,PREGFX10,AMDHSA,ALL
@@ -10,6 +11,7 @@
// GFX8-NOT: error:
// GFX10: error: .amdgcn_target directive's target id amdgcn-amd-amdhsa--gfx810:xnack+ does not match the specified target id amdgcn-amd-amdhsa--gfx1010:xnack+
// GFX11: error: .amdgcn_target directive's target id amdgcn-amd-amdhsa--gfx810:xnack+ does not match the specified target id amdgcn-amd-amdhsa--gfx1100
+// GFX12: error: .amdgcn_target directive's target id amdgcn-amd-amdhsa--gfx810:xnack+ does not match the specified target id amdgcn-amd-amdhsa--gfx1200
// NONAMDHSA: error: .amdgcn_target directive's target id amdgcn-amd-amdhsa--gfx810:xnack+ does not match the specified target id amdgcn-amd-unknown--gfx810
.warning "test_target"
.amdgcn_target "amdgcn-amd-amdhsa--gfx810:xnack+"
@@ -228,8 +230,10 @@
.end_amdhsa_kernel
// GCN-LABEL: warning: test_amdhsa_shared_vgpr_count_invalid1
-// PREGFX10: error: directive requires gfx10+
-// GFX10PLUS: error: .amdhsa_next_free_vgpr directive is required
+// PREGFX10: error: directive requires gfx10 or gfx11
+// GFX10: error: .amdhsa_next_free_vgpr directive is required
+// GFX11: error: .amdhsa_next_free_vgpr directive is required
+// GFX12: error: directive requires gfx10 or gfx11
// NONAMDHSA: error: unknown directive
.warning "test_amdhsa_shared_vgpr_count_invalid1"
.amdhsa_kernel test_amdhsa_shared_vgpr_count_invalid1
@@ -237,8 +241,10 @@
.end_amdhsa_kernel
// GCN-LABEL: warning: test_amdhsa_shared_vgpr_count_invalid2
-// PREGFX10: error: directive requires gfx10+
-// GFX...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/75911
More information about the llvm-commits
mailing list