[llvm] c01e844 - [AMDGPU] Update compute program resource registers for GFX12 (#75911)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 2 05:24:46 PST 2024
Author: Jay Foad
Date: 2024-01-02T13:24:42Z
New Revision: c01e844a7ea7cce4d9477b04d2c9ccaff3606f04
URL: https://github.com/llvm/llvm-project/commit/c01e844a7ea7cce4d9477b04d2c9ccaff3606f04
DIFF: https://github.com/llvm/llvm-project/commit/c01e844a7ea7cce4d9477b04d2c9ccaff3606f04.diff
LOG: [AMDGPU] Update compute program resource registers for GFX12 (#75911)
Co-authored-by: Konstantin Zhuravlyov <kzhuravl at amd.com>
Added:
llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx12.s
Modified:
llvm/docs/AMDGPUUsage.rst
llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
llvm/test/MC/AMDGPU/hsa-diag-v4.s
llvm/test/MC/AMDGPU/hsa-gfx12-v4.s
Removed:
################################################################################
diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst
index f0c81bf878f7ac..371e583c22a835 100644
--- a/llvm/docs/AMDGPUUsage.rst
+++ b/llvm/docs/AMDGPUUsage.rst
@@ -4406,7 +4406,15 @@ The fields used by CP for code objects before V3 also match those specified in
``COMPUTE_PGM_RSRC3``
configuration
register. See
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx12-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table`.
+ GFX12
+ Compute Shader (CS)
+ program settings used by
+ CP to set up
+ ``COMPUTE_PGM_RSRC3``
+ configuration
+ register. See
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx12-table`.
415:384 4 bytes COMPUTE_PGM_RSRC1 Compute Shader (CS)
program settings used by
CP to set up
@@ -4831,13 +4839,16 @@ The fields used by CP for code objects before V3 also match those specified in
Used by CP to set up
``COMPUTE_PGM_RSRC2.USER_SGPR``.
- 6 1 bit ENABLE_TRAP_HANDLER Must be 0.
+ 6 1 bit ENABLE_TRAP_HANDLER GFX6-GFX11
+ Must be 0.
- This bit represents
- ``COMPUTE_PGM_RSRC2.TRAP_PRESENT``,
- which is set by the CP if
- the runtime has installed a
- trap handler.
+ This bit represents
+ ``COMPUTE_PGM_RSRC2.TRAP_PRESENT``,
+ which is set by the CP if
+ the runtime has installed a
+ trap handler.
+ GFX12
+ Reserved, must be 0.
7 1 bit ENABLE_SGPR_WORKGROUP_ID_X Enable the setup of the
system SGPR register for
the work-group id in the X
@@ -4957,7 +4968,7 @@ The fields used by CP for code objects before V3 also match those specified in
30 1 bit ENABLE_EXCEPTION_INT_DIVIDE_BY Integer Division by Zero
_ZERO (rcp_iflag_f32 instruction
only)
- 31 1 bit Reserved, must be 0.
+ 31 1 bit RESERVED Reserved, must be 0.
32 **Total size 4 bytes.**
======= ===================================================================================================================
@@ -4986,8 +4997,8 @@ The fields used by CP for code objects before V3 also match those specified in
..
- .. table:: compute_pgm_rsrc3 for GFX10-GFX12
- :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx12-table
+ .. table:: compute_pgm_rsrc3 for GFX10-GFX11
+ :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table
======= ======= =============================== ===========================================================================
Bits Size Field Name Description
@@ -5036,6 +5047,32 @@ The fields used by CP for code objects before V3 also match those specified in
32 **Total size 4 bytes.**
======= ===================================================================================================================
+..
+
+ .. table:: compute_pgm_rsrc3 for GFX12
+ :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx12-table
+
+ ======= ======= =============================== ===========================================================================
+ Bits Size Field Name Description
+ ======= ======= =============================== ===========================================================================
+ 3:0 4 bits RESERVED Reserved, must be 0.
+ 11:4 8 bits INST_PREF_SIZE Number of instruction bytes to prefetch, starting at the kernel's entry
+ point instruction, before wavefront starts execution. The value is 0..255
+ with a granularity of 128 bytes.
+ 12 1 bit RESERVED Reserved, must be 0.
+ 13 1 bit GLG_EN If 1, group launch guarantee will be enabled for this dispatch
+ 30:14 17 bits RESERVED Reserved, must be 0.
+ 31 1 bit IMAGE_OP If 1, the kernel execution contains image instructions. If executed as
+ part of a graphics pipeline, image read instructions will stall waiting
+ for any necessary ``WAIT_SYNC`` fence to be performed in order to
+ indicate that earlier pipeline stages have completed writing to the
+ image.
+
+ Not used for compute kernels that are not part of a graphics pipeline and
+ must be 0.
+ 32 **Total size 4 bytes.**
+ ======= ===================================================================================================================
+
..
.. table:: Floating Point Rounding Mode Enumeration Values
@@ -15508,7 +15545,7 @@ terminated by an ``.end_amdhsa_kernel`` directive.
``.amdhsa_forward_progress`` 0 GFX10-GFX12 Controls FWD_PROGRESS in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`.
``.amdhsa_shared_vgpr_count`` 0 GFX10-GFX11 Controls SHARED_VGPR_COUNT in
- :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx12-table`.
+ :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx10-gfx11-table`.
``.amdhsa_exception_fp_ieee_invalid_op`` 0 GFX6-GFX12 Controls ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION in
:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`.
``.amdhsa_exception_fp_denorm_src`` 0 GFX6-GFX12 Controls ENABLE_EXCEPTION_FP_DENORMAL_SOURCE in
diff --git a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
index 2de2cf4185d86e..84cac3ef700e05 100644
--- a/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
+++ b/llvm/include/llvm/Support/AMDHSAKernelDescriptor.h
@@ -127,12 +127,20 @@ enum : int32_t {
#undef COMPUTE_PGM_RSRC1
// Compute program resource register 2. Must match hardware definition.
+// GFX6+.
#define COMPUTE_PGM_RSRC2(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC2_ ## NAME, SHIFT, WIDTH)
+// [GFX6-GFX11].
+#define COMPUTE_PGM_RSRC2_GFX6_GFX11(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC2_GFX6_GFX11_##NAME, SHIFT, WIDTH)
+// GFX12+.
+#define COMPUTE_PGM_RSRC2_GFX12_PLUS(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC2_GFX12_PLUS_##NAME, SHIFT, WIDTH)
enum : int32_t {
COMPUTE_PGM_RSRC2(ENABLE_PRIVATE_SEGMENT, 0, 1),
COMPUTE_PGM_RSRC2(USER_SGPR_COUNT, 1, 5),
- COMPUTE_PGM_RSRC2(ENABLE_TRAP_HANDLER, 6, 1),
+ COMPUTE_PGM_RSRC2_GFX6_GFX11(ENABLE_TRAP_HANDLER, 6, 1),
+ COMPUTE_PGM_RSRC2_GFX12_PLUS(RESERVED1, 6, 1),
COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_X, 7, 1),
COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1),
COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1),
@@ -166,23 +174,37 @@ enum : int32_t {
// Compute program resource register 3 for GFX10+. Must match hardware
// definition.
-// [GFX10].
-#define COMPUTE_PGM_RSRC3_GFX10(NAME, SHIFT, WIDTH) \
- AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX10_ ## NAME, SHIFT, WIDTH)
// GFX10+.
#define COMPUTE_PGM_RSRC3_GFX10_PLUS(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX10_PLUS_ ## NAME, SHIFT, WIDTH)
+// [GFX10].
+#define COMPUTE_PGM_RSRC3_GFX10(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX10_##NAME, SHIFT, WIDTH)
+// [GFX10-GFX11].
+#define COMPUTE_PGM_RSRC3_GFX10_GFX11(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX10_GFX11_##NAME, SHIFT, WIDTH)
// GFX11+.
#define COMPUTE_PGM_RSRC3_GFX11_PLUS(NAME, SHIFT, WIDTH) \
AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX11_PLUS_ ## NAME, SHIFT, WIDTH)
+// [GFX11].
+#define COMPUTE_PGM_RSRC3_GFX11(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX11_##NAME, SHIFT, WIDTH)
+// GFX12+.
+#define COMPUTE_PGM_RSRC3_GFX12_PLUS(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC3_GFX12_PLUS_##NAME, SHIFT, WIDTH)
enum : int32_t {
- COMPUTE_PGM_RSRC3_GFX10_PLUS(SHARED_VGPR_COUNT, 0, 4),
- COMPUTE_PGM_RSRC3_GFX10(RESERVED0, 4, 8),
- COMPUTE_PGM_RSRC3_GFX11_PLUS(INST_PREF_SIZE, 4, 6),
- COMPUTE_PGM_RSRC3_GFX11_PLUS(TRAP_ON_START, 10, 1),
- COMPUTE_PGM_RSRC3_GFX11_PLUS(TRAP_ON_END, 11, 1),
- COMPUTE_PGM_RSRC3_GFX10_PLUS(RESERVED1, 12, 19),
- COMPUTE_PGM_RSRC3_GFX10(RESERVED2, 31, 1),
+ COMPUTE_PGM_RSRC3_GFX10_GFX11(SHARED_VGPR_COUNT, 0, 4),
+ COMPUTE_PGM_RSRC3_GFX12_PLUS(RESERVED0, 0, 4),
+ COMPUTE_PGM_RSRC3_GFX10(RESERVED1, 4, 8),
+ COMPUTE_PGM_RSRC3_GFX11(INST_PREF_SIZE, 4, 6),
+ COMPUTE_PGM_RSRC3_GFX11(TRAP_ON_START, 10, 1),
+ COMPUTE_PGM_RSRC3_GFX11(TRAP_ON_END, 11, 1),
+ COMPUTE_PGM_RSRC3_GFX12_PLUS(INST_PREF_SIZE, 4, 8),
+ COMPUTE_PGM_RSRC3_GFX10_PLUS(RESERVED2, 12, 1),
+ COMPUTE_PGM_RSRC3_GFX10_GFX11(RESERVED3, 13, 1),
+ COMPUTE_PGM_RSRC3_GFX12_PLUS(GLG_EN, 13, 1),
+ COMPUTE_PGM_RSRC3_GFX10_PLUS(RESERVED4, 14, 17),
+ COMPUTE_PGM_RSRC3_GFX10(RESERVED5, 31, 1),
COMPUTE_PGM_RSRC3_GFX11_PLUS(IMAGE_OP, 31, 1),
};
#undef COMPUTE_PGM_RSRC3_GFX10_PLUS
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 3b69a37728ea1c..abd7e911beef3f 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -5416,11 +5416,12 @@ bool AMDGPUAsmParser::ParseDirectiveAMDHSAKernel() {
PARSE_BITS_ENTRY(KD.compute_pgm_rsrc1, COMPUTE_PGM_RSRC1_GFX10_PLUS_FWD_PROGRESS, Val,
ValRange);
} else if (ID == ".amdhsa_shared_vgpr_count") {
- if (IVersion.Major < 10)
- return Error(IDRange.Start, "directive requires gfx10+", IDRange);
+ if (IVersion.Major < 10 || IVersion.Major >= 12)
+ return Error(IDRange.Start, "directive requires gfx10 or gfx11",
+ IDRange);
SharedVGPRCount = Val;
PARSE_BITS_ENTRY(KD.compute_pgm_rsrc3,
- COMPUTE_PGM_RSRC3_GFX10_PLUS_SHARED_VGPR_COUNT, Val,
+ COMPUTE_PGM_RSRC3_GFX10_GFX11_SHARED_VGPR_COUNT, Val,
ValRange);
} else if (ID == ".amdhsa_exception_fp_ieee_invalid_op") {
PARSE_BITS_ENTRY(
@@ -5522,7 +5523,7 @@ bool AMDGPUAsmParser::ParseDirectiveAMDHSAKernel() {
(AccumOffset / 4 - 1));
}
- if (IVersion.Major >= 10) {
+ if (IVersion.Major >= 10 && IVersion.Major < 12) {
// SharedVGPRCount < 16 checked by PARSE_ENTRY_BITS
if (SharedVGPRCount && EnableWavefrontSize32 && *EnableWavefrontSize32) {
return TokError("shared_vgpr_count directive not valid on "
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 6017634a73d1a7..67be7b0fd64217 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -1999,34 +1999,60 @@ MCDisassembler::DecodeStatus AMDGPUDisassembler::decodeCOMPUTE_PGM_RSRC3(
if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX90A_RESERVED1)
return MCDisassembler::Fail;
} else if (isGFX10Plus()) {
- if (!EnableWavefrontSize32 || !*EnableWavefrontSize32) {
- PRINT_DIRECTIVE(".amdhsa_shared_vgpr_count",
- COMPUTE_PGM_RSRC3_GFX10_PLUS_SHARED_VGPR_COUNT);
+ // Bits [0-3].
+ if (!isGFX12Plus()) {
+ if (!EnableWavefrontSize32 || !*EnableWavefrontSize32) {
+ PRINT_DIRECTIVE(".amdhsa_shared_vgpr_count",
+ COMPUTE_PGM_RSRC3_GFX10_GFX11_SHARED_VGPR_COUNT);
+ } else {
+ PRINT_PSEUDO_DIRECTIVE_COMMENT(
+ "SHARED_VGPR_COUNT",
+ COMPUTE_PGM_RSRC3_GFX10_GFX11_SHARED_VGPR_COUNT);
+ }
} else {
- PRINT_PSEUDO_DIRECTIVE_COMMENT(
- "SHARED_VGPR_COUNT", COMPUTE_PGM_RSRC3_GFX10_PLUS_SHARED_VGPR_COUNT);
+ if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX12_PLUS_RESERVED0)
+ return MCDisassembler::Fail;
}
- if (isGFX11Plus()) {
+ // Bits [4-11].
+ if (isGFX11()) {
PRINT_PSEUDO_DIRECTIVE_COMMENT("INST_PREF_SIZE",
- COMPUTE_PGM_RSRC3_GFX11_PLUS_INST_PREF_SIZE);
+ COMPUTE_PGM_RSRC3_GFX11_INST_PREF_SIZE);
PRINT_PSEUDO_DIRECTIVE_COMMENT("TRAP_ON_START",
- COMPUTE_PGM_RSRC3_GFX11_PLUS_TRAP_ON_START);
+ COMPUTE_PGM_RSRC3_GFX11_TRAP_ON_START);
PRINT_PSEUDO_DIRECTIVE_COMMENT("TRAP_ON_END",
- COMPUTE_PGM_RSRC3_GFX11_PLUS_TRAP_ON_END);
+ COMPUTE_PGM_RSRC3_GFX11_TRAP_ON_END);
+ } else if (isGFX12Plus()) {
+ PRINT_PSEUDO_DIRECTIVE_COMMENT(
+ "INST_PREF_SIZE", COMPUTE_PGM_RSRC3_GFX12_PLUS_INST_PREF_SIZE);
+ } else {
+ if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_RESERVED1)
+ return MCDisassembler::Fail;
+ }
+
+ // Bits [12].
+ if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_PLUS_RESERVED2)
+ return MCDisassembler::Fail;
+
+ // Bits [13].
+ if (isGFX12Plus()) {
+ PRINT_PSEUDO_DIRECTIVE_COMMENT("GLG_EN",
+ COMPUTE_PGM_RSRC3_GFX12_PLUS_GLG_EN);
} else {
- if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_RESERVED0)
+ if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_GFX11_RESERVED3)
return MCDisassembler::Fail;
}
- if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_PLUS_RESERVED1)
+ // Bits [14-30].
+ if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_PLUS_RESERVED4)
return MCDisassembler::Fail;
+ // Bits [31].
if (isGFX11Plus()) {
PRINT_PSEUDO_DIRECTIVE_COMMENT("IMAGE_OP",
- COMPUTE_PGM_RSRC3_GFX11_PLUS_TRAP_ON_START);
+ COMPUTE_PGM_RSRC3_GFX11_PLUS_IMAGE_OP);
} else {
- if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_RESERVED2)
+ if (FourByteBuffer & COMPUTE_PGM_RSRC3_GFX10_RESERVED5)
return MCDisassembler::Fail;
}
} else if (FourByteBuffer) {
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
index a855cf585205bc..e135a4e25dd15a 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
@@ -475,8 +475,10 @@ void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
PRINT_FIELD(OS, ".amdhsa_forward_progress", KD,
compute_pgm_rsrc1,
amdhsa::COMPUTE_PGM_RSRC1_GFX10_PLUS_FWD_PROGRESS);
+ }
+ if (IVersion.Major >= 10 && IVersion.Major < 12) {
PRINT_FIELD(OS, ".amdhsa_shared_vgpr_count", KD, compute_pgm_rsrc3,
- amdhsa::COMPUTE_PGM_RSRC3_GFX10_PLUS_SHARED_VGPR_COUNT);
+ amdhsa::COMPUTE_PGM_RSRC3_GFX10_GFX11_SHARED_VGPR_COUNT);
}
if (IVersion.Major >= 12)
PRINT_FIELD(OS, ".amdhsa_round_robin_scheduling", KD, compute_pgm_rsrc1,
diff --git a/llvm/test/MC/AMDGPU/hsa-diag-v4.s b/llvm/test/MC/AMDGPU/hsa-diag-v4.s
index f7a554aedb746b..069b71b7229cdd 100644
--- a/llvm/test/MC/AMDGPU/hsa-diag-v4.s
+++ b/llvm/test/MC/AMDGPU/hsa-diag-v4.s
@@ -1,6 +1,7 @@
// RUN: not llvm-mc --amdhsa-code-object-version=4 -triple amdgcn-amd-amdhsa -mcpu=gfx810 -mattr=+xnack -show-encoding %s 2>&1 >/dev/null | FileCheck %s --check-prefixes=GCN,GFX8,PREGFX10,AMDHSA
// RUN: not llvm-mc --amdhsa-code-object-version=4 -triple amdgcn-amd-amdhsa -mcpu=gfx1010 -mattr=+xnack -show-encoding %s 2>&1 >/dev/null | FileCheck %s --check-prefixes=GCN,GFX10PLUS,GFX10,AMDHSA
// RUN: not llvm-mc --amdhsa-code-object-version=4 -triple amdgcn-amd-amdhsa -mcpu=gfx1100 -show-encoding %s 2>&1 >/dev/null | FileCheck %s --check-prefixes=GCN,GFX10PLUS,GFX11,AMDHSA
+// RUN: not llvm-mc --amdhsa-code-object-version=4 -triple amdgcn-amd-amdhsa -mcpu=gfx1200 -show-encoding %s 2>&1 >/dev/null | FileCheck %s --check-prefixes=GCN,GFX10PLUS,GFX12,AMDHSA
// RUN: not llvm-mc --amdhsa-code-object-version=4 -triple amdgcn-amd- -mcpu=gfx810 -mattr=+xnack -show-encoding %s 2>&1 >/dev/null | FileCheck %s --check-prefixes=GCN,NONAMDHSA
// RUN: not llvm-mc --amdhsa-code-object-version=4 -triple amdgcn-amd-amdhsa -mcpu=gfx90a -mattr=+xnack -show-encoding %s 2>&1 >/dev/null | FileCheck %s --check-prefixes=GFX90A,PREGFX10,AMDHSA,ALL
@@ -10,6 +11,7 @@
// GFX8-NOT: error:
// GFX10: error: .amdgcn_target directive's target id amdgcn-amd-amdhsa--gfx810:xnack+ does not match the specified target id amdgcn-amd-amdhsa--gfx1010:xnack+
// GFX11: error: .amdgcn_target directive's target id amdgcn-amd-amdhsa--gfx810:xnack+ does not match the specified target id amdgcn-amd-amdhsa--gfx1100
+// GFX12: error: .amdgcn_target directive's target id amdgcn-amd-amdhsa--gfx810:xnack+ does not match the specified target id amdgcn-amd-amdhsa--gfx1200
// NONAMDHSA: error: .amdgcn_target directive's target id amdgcn-amd-amdhsa--gfx810:xnack+ does not match the specified target id amdgcn-amd-unknown--gfx810
.warning "test_target"
.amdgcn_target "amdgcn-amd-amdhsa--gfx810:xnack+"
@@ -228,8 +230,10 @@
.end_amdhsa_kernel
// GCN-LABEL: warning: test_amdhsa_shared_vgpr_count_invalid1
-// PREGFX10: error: directive requires gfx10+
-// GFX10PLUS: error: .amdhsa_next_free_vgpr directive is required
+// PREGFX10: error: directive requires gfx10 or gfx11
+// GFX10: error: .amdhsa_next_free_vgpr directive is required
+// GFX11: error: .amdhsa_next_free_vgpr directive is required
+// GFX12: error: directive requires gfx10 or gfx11
// NONAMDHSA: error: unknown directive
.warning "test_amdhsa_shared_vgpr_count_invalid1"
.amdhsa_kernel test_amdhsa_shared_vgpr_count_invalid1
@@ -237,8 +241,10 @@
.end_amdhsa_kernel
// GCN-LABEL: warning: test_amdhsa_shared_vgpr_count_invalid2
-// PREGFX10: error: directive requires gfx10+
-// GFX10PLUS: error: shared_vgpr_count directive not valid on wavefront size 32
+// PREGFX10: error: directive requires gfx10 or gfx11
+// GFX10: error: shared_vgpr_count directive not valid on wavefront size 32
+// GFX11: error: shared_vgpr_count directive not valid on wavefront size 32
+// GFX12: error: directive requires gfx10 or gfx11
// NONAMDHSA: error: unknown directive
.warning "test_amdhsa_shared_vgpr_count_invalid2"
.amdhsa_kernel test_amdhsa_shared_vgpr_count_invalid2
@@ -249,8 +255,10 @@
.end_amdhsa_kernel
// GCN-LABEL: warning: test_amdhsa_shared_vgpr_count_invalid3
-// PREGFX10: error: directive requires gfx10+
-// GFX10PLUS: error: value out of range
+// PREGFX10: error: directive requires gfx10 or gfx11
+// GFX10: error: value out of range
+// GFX11: error: value out of range
+// GFX12: error: directive requires gfx10 or gfx11
// NONAMDHSA: error: unknown directive
.warning "test_amdhsa_shared_vgpr_count_invalid3"
.amdhsa_kernel test_amdhsa_shared_vgpr_count_invalid3
@@ -260,8 +268,10 @@
.end_amdhsa_kernel
// GCN-LABEL: warning: test_amdhsa_shared_vgpr_count_invalid4
-// PREGFX10: error: directive requires gfx10+
-// GFX10PLUS: error: shared_vgpr_count*2 + compute_pgm_rsrc1.GRANULATED_WORKITEM_VGPR_COUNT cannot exceed 63
+// PREGFX10: error: directive requires gfx10 or gfx11
+// GFX10: error: shared_vgpr_count*2 + compute_pgm_rsrc1.GRANULATED_WORKITEM_VGPR_COUNT cannot exceed 63
+// GFX11: error: shared_vgpr_count*2 + compute_pgm_rsrc1.GRANULATED_WORKITEM_VGPR_COUNT cannot exceed 63
+// GFX12: error: directive requires gfx10 or gfx11
// NONAMDHSA: error: unknown directive
.warning "test_amdhsa_shared_vgpr_count_invalid4"
.amdhsa_kernel test_amdhsa_shared_vgpr_count_invalid4
diff --git a/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s b/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s
index efbcec21f586b9..186d98f78b986c 100644
--- a/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s
+++ b/llvm/test/MC/AMDGPU/hsa-gfx12-v4.s
@@ -118,7 +118,6 @@ disabled_user_sgpr:
.amdhsa_workgroup_processor_mode 1
.amdhsa_memory_ordered 1
.amdhsa_forward_progress 1
- .amdhsa_shared_vgpr_count 0
.amdhsa_round_robin_scheduling 1
.amdhsa_exception_fp_ieee_invalid_op 1
.amdhsa_exception_fp_denorm_src 1
@@ -157,7 +156,6 @@ disabled_user_sgpr:
// ASM-NEXT: .amdhsa_workgroup_processor_mode 1
// ASM-NEXT: .amdhsa_memory_ordered 1
// ASM-NEXT: .amdhsa_forward_progress 1
-// ASM-NEXT: .amdhsa_shared_vgpr_count 0
// ASM-NEXT: .amdhsa_round_robin_scheduling 1
// ASM-NEXT: .amdhsa_exception_fp_ieee_invalid_op 1
// ASM-NEXT: .amdhsa_exception_fp_denorm_src 1
diff --git a/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx12.s b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx12.s
new file mode 100644
index 00000000000000..e1d312d6035cb7
--- /dev/null
+++ b/llvm/test/tools/llvm-objdump/ELF/AMDGPU/kd-gfx12.s
@@ -0,0 +1,105 @@
+;; Test disassembly for gfx12 kernel descriptor.
+
+; RUN: rm -rf %t && split-file %s %t && cd %t
+
+;--- 1.s
+; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=+wavefrontsize32,-wavefrontsize64 -filetype=obj -mcpu=gfx1200 < 1.s > 1.o
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 1.o | tail -n +7 | tee 1-disasm.s | FileCheck 1.s
+; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=+wavefrontsize32,-wavefrontsize64 -filetype=obj -mcpu=gfx1200 < 1-disasm.s > 1-disasm.o
+; RUN: cmp 1.o 1-disasm.o
+; CHECK: .amdhsa_kernel kernel
+; CHECK-NEXT: .amdhsa_group_segment_fixed_size 0
+; CHECK-NEXT: .amdhsa_private_segment_fixed_size 0
+; CHECK-NEXT: .amdhsa_kernarg_size 0
+; CHECK-NEXT: ; INST_PREF_SIZE 0
+; CHECK-NEXT: ; GLG_EN 0
+; CHECK-NEXT: ; IMAGE_OP 0
+; CHECK-NEXT: .amdhsa_next_free_vgpr 32
+; CHECK-NEXT: .amdhsa_reserve_vcc 0
+; CHECK-NEXT: .amdhsa_reserve_xnack_mask 0
+; CHECK-NEXT: .amdhsa_next_free_sgpr 8
+; CHECK-NEXT: .amdhsa_float_round_mode_32 0
+; CHECK-NEXT: .amdhsa_float_round_mode_16_64 0
+; CHECK-NEXT: .amdhsa_float_denorm_mode_32 0
+; CHECK-NEXT: .amdhsa_float_denorm_mode_16_64 3
+; CHECK-NEXT: .amdhsa_fp16_overflow 0
+; CHECK-NEXT: .amdhsa_workgroup_processor_mode 1
+; CHECK-NEXT: .amdhsa_memory_ordered 1
+; CHECK-NEXT: .amdhsa_forward_progress 0
+; CHECK-NEXT: .amdhsa_round_robin_scheduling 0
+; CHECK-NEXT: .amdhsa_enable_private_segment 0
+; CHECK-NEXT: .amdhsa_system_sgpr_workgroup_id_x 1
+; CHECK-NEXT: .amdhsa_system_sgpr_workgroup_id_y 0
+; CHECK-NEXT: .amdhsa_system_sgpr_workgroup_id_z 0
+; CHECK-NEXT: .amdhsa_system_sgpr_workgroup_info 0
+; CHECK-NEXT: .amdhsa_system_vgpr_workitem_id 0
+; CHECK-NEXT: .amdhsa_exception_fp_ieee_invalid_op 0
+; CHECK-NEXT: .amdhsa_exception_fp_denorm_src 0
+; CHECK-NEXT: .amdhsa_exception_fp_ieee_div_zero 0
+; CHECK-NEXT: .amdhsa_exception_fp_ieee_overflow 0
+; CHECK-NEXT: .amdhsa_exception_fp_ieee_underflow 0
+; CHECK-NEXT: .amdhsa_exception_fp_ieee_inexact 0
+; CHECK-NEXT: .amdhsa_exception_int_div_zero 0
+; CHECK-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0
+; CHECK-NEXT: .amdhsa_user_sgpr_queue_ptr 0
+; CHECK-NEXT: .amdhsa_user_sgpr_kernarg_segment_ptr 0
+; CHECK-NEXT: .amdhsa_user_sgpr_dispatch_id 0
+; CHECK-NEXT: .amdhsa_user_sgpr_private_segment_size 0
+; CHECK-NEXT: .amdhsa_wavefront_size32 1
+; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_kernel kernel
+ .amdhsa_next_free_vgpr 32
+ .amdhsa_next_free_sgpr 32
+ .amdhsa_wavefront_size32 1
+.end_amdhsa_kernel
+
+;--- 2.s
+; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-wavefrontsize32,+wavefrontsize64 -filetype=obj -mcpu=gfx1200 < 2.s > 2.o
+; RUN: llvm-objdump --disassemble-symbols=kernel.kd 2.o | tail -n +7 | tee 2-disasm.s | FileCheck 2.s
+; RUN: llvm-mc --triple=amdgcn-amd-amdhsa -mattr=-wavefrontsize32,+wavefrontsize64 -filetype=obj -mcpu=gfx1200 < 2-disasm.s > 2-disasm.o
+; RUN: cmp 2.o 2-disasm.o
+; CHECK: .amdhsa_kernel kernel
+; CHECK-NEXT: .amdhsa_group_segment_fixed_size 0
+; CHECK-NEXT: .amdhsa_private_segment_fixed_size 0
+; CHECK-NEXT: .amdhsa_kernarg_size 0
+; CHECK-NEXT: ; INST_PREF_SIZE 0
+; CHECK-NEXT: ; GLG_EN 0
+; CHECK-NEXT: ; IMAGE_OP 0
+; CHECK-NEXT: .amdhsa_next_free_vgpr 32
+; CHECK-NEXT: .amdhsa_reserve_vcc 0
+; CHECK-NEXT: .amdhsa_reserve_xnack_mask 0
+; CHECK-NEXT: .amdhsa_next_free_sgpr 8
+; CHECK-NEXT: .amdhsa_float_round_mode_32 0
+; CHECK-NEXT: .amdhsa_float_round_mode_16_64 0
+; CHECK-NEXT: .amdhsa_float_denorm_mode_32 0
+; CHECK-NEXT: .amdhsa_float_denorm_mode_16_64 3
+; CHECK-NEXT: .amdhsa_fp16_overflow 0
+; CHECK-NEXT: .amdhsa_workgroup_processor_mode 1
+; CHECK-NEXT: .amdhsa_memory_ordered 1
+; CHECK-NEXT: .amdhsa_forward_progress 0
+; CHECK-NEXT: .amdhsa_round_robin_scheduling 0
+; CHECK-NEXT: .amdhsa_enable_private_segment 0
+; CHECK-NEXT: .amdhsa_system_sgpr_workgroup_id_x 1
+; CHECK-NEXT: .amdhsa_system_sgpr_workgroup_id_y 0
+; CHECK-NEXT: .amdhsa_system_sgpr_workgroup_id_z 0
+; CHECK-NEXT: .amdhsa_system_sgpr_workgroup_info 0
+; CHECK-NEXT: .amdhsa_system_vgpr_workitem_id 0
+; CHECK-NEXT: .amdhsa_exception_fp_ieee_invalid_op 0
+; CHECK-NEXT: .amdhsa_exception_fp_denorm_src 0
+; CHECK-NEXT: .amdhsa_exception_fp_ieee_div_zero 0
+; CHECK-NEXT: .amdhsa_exception_fp_ieee_overflow 0
+; CHECK-NEXT: .amdhsa_exception_fp_ieee_underflow 0
+; CHECK-NEXT: .amdhsa_exception_fp_ieee_inexact 0
+; CHECK-NEXT: .amdhsa_exception_int_div_zero 0
+; CHECK-NEXT: .amdhsa_user_sgpr_dispatch_ptr 0
+; CHECK-NEXT: .amdhsa_user_sgpr_queue_ptr 0
+; CHECK-NEXT: .amdhsa_user_sgpr_kernarg_segment_ptr 0
+; CHECK-NEXT: .amdhsa_user_sgpr_dispatch_id 0
+; CHECK-NEXT: .amdhsa_user_sgpr_private_segment_size 0
+; CHECK-NEXT: .amdhsa_wavefront_size32 0
+; CHECK-NEXT: .end_amdhsa_kernel
+.amdhsa_kernel kernel
+ .amdhsa_next_free_vgpr 32
+ .amdhsa_next_free_sgpr 32
+ .amdhsa_wavefront_size32 0
+.end_amdhsa_kernel
More information about the llvm-commits
mailing list