[llvm] r334519 - AMDHSA: Code object v3 updates
Konstantin Zhuravlyov via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 12 11:02:46 PDT 2018
Author: kzhuravl
Date: Tue Jun 12 11:02:46 2018
New Revision: 334519
URL: http://llvm.org/viewvc/llvm-project?rev=334519&view=rev
Log:
AMDHSA: Code object v3 updates
- Do not emit the following assembler directives:
  - .hsa_code_object_version
  - .hsa_code_object_isa
  - .amd_amdgpu_isa
  - .amd_amdgpu_hsa_metadata
  - .amd_amdgpu_pal_metadata
- Do not emit .note entries
- Clean up and bring in sync the kernel descriptor header file
- Emit kernel descriptor into .rodata with appropriate relocations and
  alignments
Added:
llvm/trunk/include/llvm/Support/AMDHSAKernelDescriptor.h
llvm/trunk/test/CodeGen/AMDGPU/code-object-v3.ll
Removed:
llvm/trunk/include/llvm/Support/AMDGPUKernelDescriptor.h
Modified:
llvm/trunk/docs/AMDGPUUsage.rst
llvm/trunk/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
llvm/trunk/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
llvm/trunk/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
llvm/trunk/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
llvm/trunk/test/CodeGen/AMDGPU/elf-notes.ll
Modified: llvm/trunk/docs/AMDGPUUsage.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/AMDGPUUsage.rst?rev=334519&r1=334518&r2=334519&view=diff
==============================================================================
--- llvm/trunk/docs/AMDGPUUsage.rst (original)
+++ llvm/trunk/docs/AMDGPUUsage.rst Tue Jun 12 11:02:46 2018
@@ -686,7 +686,7 @@ Symbols include the following:
*link-name* ``STT_OBJECT`` - ``.data`` Global variable
- ``.rodata``
- ``.bss``
- *link-name*\ ``@kd`` ``STT_OBJECT`` - ``.rodata`` Kernel descriptor
+ *link-name*\ ``.kd`` ``STT_OBJECT`` - ``.rodata`` Kernel descriptor
*link-name* ``STT_FUNC`` - ``.text`` Kernel entry point
===================== ============== ============= ==================
@@ -1578,7 +1578,7 @@ CP microcode requires the Kernel descrit
======= ======= =============================== ============================
Bits Size Field Name Description
======= ======= =============================== ============================
- 31:0 4 bytes GroupSegmentFixedSize The amount of fixed local
+ 31:0 4 bytes GROUP_SEGMENT_FIXED_SIZE The amount of fixed local
address space memory
required for a work-group
in bytes. This does not
@@ -1587,7 +1587,7 @@ CP microcode requires the Kernel descrit
space memory that may be
added when the kernel is
dispatched.
- 63:32 4 bytes PrivateSegmentFixedSize The amount of fixed
+ 63:32 4 bytes PRIVATE_SEGMENT_FIXED_SIZE The amount of fixed
private address space
memory required for a
work-item in bytes. If
@@ -1596,7 +1596,7 @@ CP microcode requires the Kernel descrit
be added to this value for
the call stack.
127:64 8 bytes Reserved, must be 0.
- 191:128 8 bytes KernelCodeEntryByteOffset Byte offset (possibly
+ 191:128 8 bytes KERNEL_CODE_ENTRY_BYTE_OFFSET Byte offset (possibly
negative) from base
address of kernel
descriptor to kernel's
@@ -1605,22 +1605,22 @@ CP microcode requires the Kernel descrit
aligned.
383:192 24 Reserved, must be 0.
bytes
- 415:384 4 bytes ComputePgmRsrc1 Compute Shader (CS)
+ 415:384 4 bytes COMPUTE_PGM_RSRC1 Compute Shader (CS)
program settings used by
CP to set up
``COMPUTE_PGM_RSRC1``
configuration
register. See
:ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx9-table`.
- 447:416 4 bytes ComputePgmRsrc2 Compute Shader (CS)
+ 447:416 4 bytes COMPUTE_PGM_RSRC2 Compute Shader (CS)
program settings used by
CP to set up
``COMPUTE_PGM_RSRC2``
configuration
register. See
:ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx9-table`.
- 448 1 bit EnableSGPRPrivateSegmentBuffer Enable the setup of the
- SGPR user data registers
+ 448 1 bit ENABLE_SGPR_PRIVATE_SEGMENT Enable the setup of the
+ _BUFFER SGPR user data registers
(see
:ref:`amdgpu-amdhsa-initial-kernel-execution-state`).
@@ -1631,18 +1631,19 @@ CP microcode requires the Kernel descrit
``compute_pgm_rsrc2.user_sgpr.user_sgpr_count``.
Any requests beyond 16
will be ignored.
- 449 1 bit EnableSGPRDispatchPtr *see above*
- 450 1 bit EnableSGPRQueuePtr *see above*
- 451 1 bit EnableSGPRKernargSegmentPtr *see above*
- 452 1 bit EnableSGPRDispatchID *see above*
- 453 1 bit EnableSGPRFlatScratchInit *see above*
- 454 1 bit EnableSGPRPrivateSegmentSize *see above*
- 455 1 bit EnableSGPRGridWorkgroupCountX Not implemented in CP and
- should always be 0.
- 456 1 bit EnableSGPRGridWorkgroupCountY Not implemented in CP and
- should always be 0.
- 457 1 bit EnableSGPRGridWorkgroupCountZ Not implemented in CP and
- should always be 0.
+ 449 1 bit ENABLE_SGPR_DISPATCH_PTR *see above*
+ 450 1 bit ENABLE_SGPR_QUEUE_PTR *see above*
+ 451 1 bit ENABLE_SGPR_KERNARG_SEGMENT_PTR *see above*
+ 452 1 bit ENABLE_SGPR_DISPATCH_ID *see above*
+ 453 1 bit ENABLE_SGPR_FLAT_SCRATCH_INIT *see above*
+ 454 1 bit ENABLE_SGPR_PRIVATE_SEGMENT *see above*
+ _SIZE
+ 455 1 bit ENABLE_SGPR_GRID_WORKGROUP Not implemented in CP and
+ _COUNT_X should always be 0.
+ 456 1 bit ENABLE_SGPR_GRID_WORKGROUP Not implemented in CP and
+ _COUNT_Y should always be 0.
+ 457 1 bit ENABLE_SGPR_GRID_WORKGROUP Not implemented in CP and
+ _COUNT_Z should always be 0.
463:458 6 bits Reserved, must be 0.
511:464 6 Reserved, must be 0.
bytes
@@ -1996,10 +1997,10 @@ CP microcode requires the Kernel descrit
====================================== ===== ==============================
Enumeration Name Value Description
====================================== ===== ==============================
- AMDGPU_FLOAT_ROUND_MODE_NEAR_EVEN 0 Round Ties To Even
- AMDGPU_FLOAT_ROUND_MODE_PLUS_INFINITY 1 Round Toward +infinity
- AMDGPU_FLOAT_ROUND_MODE_MINUS_INFINITY 2 Round Toward -infinity
- AMDGPU_FLOAT_ROUND_MODE_ZERO 3 Round Toward 0
+ FLOAT_ROUND_MODE_NEAR_EVEN 0 Round Ties To Even
+ FLOAT_ROUND_MODE_PLUS_INFINITY 1 Round Toward +infinity
+ FLOAT_ROUND_MODE_MINUS_INFINITY 2 Round Toward -infinity
+ FLOAT_ROUND_MODE_ZERO 3 Round Toward 0
====================================== ===== ==============================
..
@@ -2010,11 +2011,11 @@ CP microcode requires the Kernel descrit
====================================== ===== ==============================
Enumeration Name Value Description
====================================== ===== ==============================
- AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC_DST 0 Flush Source and Destination
+ FLOAT_DENORM_MODE_FLUSH_SRC_DST 0 Flush Source and Destination
Denorms
- AMDGPU_FLOAT_DENORM_MODE_FLUSH_DST 1 Flush Output Denorms
- AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC 2 Flush Source Denorms
- AMDGPU_FLOAT_DENORM_MODE_FLUSH_NONE 3 No Flush
+ FLOAT_DENORM_MODE_FLUSH_DST 1 Flush Output Denorms
+ FLOAT_DENORM_MODE_FLUSH_SRC 2 Flush Source Denorms
+ FLOAT_DENORM_MODE_FLUSH_NONE 3 No Flush
====================================== ===== ==============================
..
@@ -2025,13 +2026,13 @@ CP microcode requires the Kernel descrit
======================================== ===== ============================
Enumeration Name Value Description
======================================== ===== ============================
- AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X 0 Set work-item X dimension
+ SYSTEM_VGPR_WORKITEM_ID_X 0 Set work-item X dimension
ID.
- AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y 1 Set work-item X and Y
+ SYSTEM_VGPR_WORKITEM_ID_X_Y 1 Set work-item X and Y
dimensions ID.
- AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z 2 Set work-item X, Y and Z
+ SYSTEM_VGPR_WORKITEM_ID_X_Y_Z 2 Set work-item X, Y and Z
dimensions ID.
- AMDGPU_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED 3 Undefined.
+ SYSTEM_VGPR_WORKITEM_ID_UNDEFINED 3 Undefined.
======================================== ===== ============================
.. _amdgpu-amdhsa-initial-kernel-execution-state:
Removed: llvm/trunk/include/llvm/Support/AMDGPUKernelDescriptor.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Support/AMDGPUKernelDescriptor.h?rev=334518&view=auto
==============================================================================
--- llvm/trunk/include/llvm/Support/AMDGPUKernelDescriptor.h (original)
+++ llvm/trunk/include/llvm/Support/AMDGPUKernelDescriptor.h (removed)
@@ -1,139 +0,0 @@
-//===--- AMDGPUKernelDescriptor.h -------------------------------*- C++ -*-===//
-//
-// The LLVM Compiler Infrastructure
-//
-// This file is distributed under the University of Illinois Open Source
-// License. See LICENSE.TXT for details.
-//
-//===----------------------------------------------------------------------===//
-//
-/// \file
-/// AMDGPU kernel descriptor definitions. For more information, visit
-/// https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor-for-gfx6-gfx9
-//
-//===----------------------------------------------------------------------===//
-
-#ifndef LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H
-#define LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H
-
-#include <cstdint>
-
-// Creates enumeration entries used for packing bits into integers. Enumeration
-// entries include bit shift amount, bit width, and bit mask.
-#define AMDGPU_BITS_ENUM_ENTRY(name, shift, width) \
- name ## _SHIFT = (shift), \
- name ## _WIDTH = (width), \
- name = (((1 << (width)) - 1) << (shift)) \
-
-// Gets bits for specified bit mask from specified source.
-#define AMDGPU_BITS_GET(src, mask) \
- ((src & mask) >> mask ## _SHIFT) \
-
-// Sets bits for specified bit mask in specified destination.
-#define AMDGPU_BITS_SET(dst, mask, val) \
- dst &= (~(1 << mask ## _SHIFT) & ~mask); \
- dst |= (((val) << mask ## _SHIFT) & mask) \
-
-namespace llvm {
-namespace AMDGPU {
-namespace HSAKD {
-
-/// Floating point rounding modes.
-enum : uint8_t {
- AMDGPU_FLOAT_ROUND_MODE_NEAR_EVEN = 0,
- AMDGPU_FLOAT_ROUND_MODE_PLUS_INFINITY = 1,
- AMDGPU_FLOAT_ROUND_MODE_MINUS_INFINITY = 2,
- AMDGPU_FLOAT_ROUND_MODE_ZERO = 3,
-};
-
-/// Floating point denorm modes.
-enum : uint8_t {
- AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC_DST = 0,
- AMDGPU_FLOAT_DENORM_MODE_FLUSH_DST = 1,
- AMDGPU_FLOAT_DENORM_MODE_FLUSH_SRC = 2,
- AMDGPU_FLOAT_DENORM_MODE_FLUSH_NONE = 3,
-};
-
-/// System VGPR workitem IDs.
-enum : uint8_t {
- AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X = 0,
- AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y = 1,
- AMDGPU_SYSTEM_VGPR_WORKITEM_ID_X_Y_Z = 2,
- AMDGPU_SYSTEM_VGPR_WORKITEM_ID_UNDEFINED = 3,
-};
-
-/// Compute program resource register one layout.
-enum ComputePgmRsrc1 {
- AMDGPU_BITS_ENUM_ENTRY(GRANULATED_WORKITEM_VGPR_COUNT, 0, 6),
- AMDGPU_BITS_ENUM_ENTRY(GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4),
- AMDGPU_BITS_ENUM_ENTRY(PRIORITY, 10, 2),
- AMDGPU_BITS_ENUM_ENTRY(FLOAT_ROUND_MODE_32, 12, 2),
- AMDGPU_BITS_ENUM_ENTRY(FLOAT_ROUND_MODE_16_64, 14, 2),
- AMDGPU_BITS_ENUM_ENTRY(FLOAT_DENORM_MODE_32, 16, 2),
- AMDGPU_BITS_ENUM_ENTRY(FLOAT_DENORM_MODE_16_64, 18, 2),
- AMDGPU_BITS_ENUM_ENTRY(PRIV, 20, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_DX10_CLAMP, 21, 1),
- AMDGPU_BITS_ENUM_ENTRY(DEBUG_MODE, 22, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_IEEE_MODE, 23, 1),
- AMDGPU_BITS_ENUM_ENTRY(BULKY, 24, 1),
- AMDGPU_BITS_ENUM_ENTRY(CDBG_USER, 25, 1),
- AMDGPU_BITS_ENUM_ENTRY(FP16_OVFL, 26, 1),
- AMDGPU_BITS_ENUM_ENTRY(RESERVED0, 27, 5),
-};
-
-/// Compute program resource register two layout.
-enum ComputePgmRsrc2 {
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_PRIVATE_SEGMENT_WAVE_OFFSET, 0, 1),
- AMDGPU_BITS_ENUM_ENTRY(USER_SGPR_COUNT, 1, 5),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_TRAP_HANDLER, 6, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_X, 7, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_SGPR_WORKGROUP_INFO, 10, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_VGPR_WORKITEM_ID, 11, 2),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_ADDRESS_WATCH, 13, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_MEMORY, 14, 1),
- AMDGPU_BITS_ENUM_ENTRY(GRANULATED_LDS_SIZE, 15, 9),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION, 24, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_FP_DENORMAL_SOURCE, 25, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO, 26, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW, 27, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW, 28, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_IEEE_754_FP_INEXACT, 29, 1),
- AMDGPU_BITS_ENUM_ENTRY(ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO, 30, 1),
- AMDGPU_BITS_ENUM_ENTRY(RESERVED1, 31, 1),
-};
-
-/// Kernel descriptor layout. This layout should be kept backwards
-/// compatible as it is consumed by the command processor.
-struct KernelDescriptor final {
- uint32_t GroupSegmentFixedSize;
- uint32_t PrivateSegmentFixedSize;
- uint32_t MaxFlatWorkGroupSize;
- uint64_t IsDynamicCallStack : 1;
- uint64_t IsXNACKEnabled : 1;
- uint64_t Reserved0 : 30;
- int64_t KernelCodeEntryByteOffset;
- uint64_t Reserved1[3];
- uint32_t ComputePgmRsrc1;
- uint32_t ComputePgmRsrc2;
- uint64_t EnableSGPRPrivateSegmentBuffer : 1;
- uint64_t EnableSGPRDispatchPtr : 1;
- uint64_t EnableSGPRQueuePtr : 1;
- uint64_t EnableSGPRKernargSegmentPtr : 1;
- uint64_t EnableSGPRDispatchID : 1;
- uint64_t EnableSGPRFlatScratchInit : 1;
- uint64_t EnableSGPRPrivateSegmentSize : 1;
- uint64_t EnableSGPRGridWorkgroupCountX : 1;
- uint64_t EnableSGPRGridWorkgroupCountY : 1;
- uint64_t EnableSGPRGridWorkgroupCountZ : 1;
- uint64_t Reserved2 : 54;
-
- KernelDescriptor() = default;
-};
-
-} // end namespace HSAKD
-} // end namespace AMDGPU
-} // end namespace llvm
-
-#endif // LLVM_SUPPORT_AMDGPUKERNELDESCRIPTOR_H
Added: llvm/trunk/include/llvm/Support/AMDHSAKernelDescriptor.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Support/AMDHSAKernelDescriptor.h?rev=334519&view=auto
==============================================================================
--- llvm/trunk/include/llvm/Support/AMDHSAKernelDescriptor.h (added)
+++ llvm/trunk/include/llvm/Support/AMDHSAKernelDescriptor.h Tue Jun 12 11:02:46 2018
@@ -0,0 +1,187 @@
+//===--- AMDHSAKernelDescriptor.h -----------------------------*- C++ -*---===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// AMDHSA kernel descriptor definitions. For more information, visit
+/// https://llvm.org/docs/AMDGPUUsage.html#kernel-descriptor
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H
+#define LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H
+
+#include <cstdint>
+
+// Gets offset of specified member in specified type.
+#ifndef offsetof
+#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE*)0)->MEMBER)
+#endif // offsetof
+
+// Creates enumeration entries used for packing bits into integers. Enumeration
+// entries include bit shift amount, bit width, and bit mask.
+#ifndef AMDHSA_BITS_ENUM_ENTRY
+#define AMDHSA_BITS_ENUM_ENTRY(NAME, SHIFT, WIDTH) \
+ NAME ## _SHIFT = (SHIFT), \
+ NAME ## _WIDTH = (WIDTH), \
+ NAME = (((1 << (WIDTH)) - 1) << (SHIFT))
+#endif // AMDHSA_BITS_ENUM_ENTRY
+
+// Gets bits for specified bit mask from specified source.
+#ifndef AMDHSA_BITS_GET
+#define AMDHSA_BITS_GET(SRC, MSK) ((SRC & MSK) >> MSK ## _SHIFT)
+#endif // AMDHSA_BITS_GET
+
+// Sets bits for specified bit mask in specified destination.
+#ifndef AMDHSA_BITS_SET
+#define AMDHSA_BITS_SET(DST, MSK, VAL) \
+ DST &= ~MSK; \
+ DST |= ((VAL << MSK ## _SHIFT) & MSK)
+#endif // AMDHSA_BITS_SET
+
+namespace llvm {
+namespace amdhsa {
+
+// Floating point rounding modes. Must be kept backwards compatible.
+enum : uint8_t {
+ FLOAT_ROUND_MODE_NEAR_EVEN = 0,
+ FLOAT_ROUND_MODE_PLUS_INFINITY = 1,
+ FLOAT_ROUND_MODE_MINUS_INFINITY = 2,
+ FLOAT_ROUND_MODE_ZERO = 3,
+};
+
+// Floating point denorm modes. Must be kept backwards compatible.
+enum : uint8_t {
+ FLOAT_DENORM_MODE_FLUSH_SRC_DST = 0,
+ FLOAT_DENORM_MODE_FLUSH_DST = 1,
+ FLOAT_DENORM_MODE_FLUSH_SRC = 2,
+ FLOAT_DENORM_MODE_FLUSH_NONE = 3,
+};
+
+// System VGPR workitem IDs. Must be kept backwards compatible.
+enum : uint8_t {
+ SYSTEM_VGPR_WORKITEM_ID_X = 0,
+ SYSTEM_VGPR_WORKITEM_ID_X_Y = 1,
+ SYSTEM_VGPR_WORKITEM_ID_X_Y_Z = 2,
+ SYSTEM_VGPR_WORKITEM_ID_UNDEFINED = 3,
+};
+
+// Compute program resource register 1. Must be kept backwards compatible.
+#define COMPUTE_PGM_RSRC1(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC1_ ## NAME, SHIFT, WIDTH)
+enum : int32_t {
+ COMPUTE_PGM_RSRC1(GRANULATED_WORKITEM_VGPR_COUNT, 0, 6),
+ COMPUTE_PGM_RSRC1(GRANULATED_WAVEFRONT_SGPR_COUNT, 6, 4),
+ COMPUTE_PGM_RSRC1(PRIORITY, 10, 2),
+ COMPUTE_PGM_RSRC1(FLOAT_ROUND_MODE_32, 12, 2),
+ COMPUTE_PGM_RSRC1(FLOAT_ROUND_MODE_16_64, 14, 2),
+ COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_32, 16, 2),
+ COMPUTE_PGM_RSRC1(FLOAT_DENORM_MODE_16_64, 18, 2),
+ COMPUTE_PGM_RSRC1(PRIV, 20, 1),
+ COMPUTE_PGM_RSRC1(ENABLE_DX10_CLAMP, 21, 1),
+ COMPUTE_PGM_RSRC1(DEBUG_MODE, 22, 1),
+ COMPUTE_PGM_RSRC1(ENABLE_IEEE_MODE, 23, 1),
+ COMPUTE_PGM_RSRC1(BULKY, 24, 1),
+ COMPUTE_PGM_RSRC1(CDBG_USER, 25, 1),
+ COMPUTE_PGM_RSRC1(FP16_OVFL, 26, 1),
+ COMPUTE_PGM_RSRC1(RESERVED, 27, 5),
+};
+#undef COMPUTE_PGM_RSRC1
+
+// Compute program resource register 2. Must be kept backwards compatible.
+#define COMPUTE_PGM_RSRC2(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(COMPUTE_PGM_RSRC2_ ## NAME, SHIFT, WIDTH)
+enum : int32_t {
+ COMPUTE_PGM_RSRC2(ENABLE_SGPR_PRIVATE_SEGMENT_WAVEFRONT_OFFSET, 0, 1),
+ COMPUTE_PGM_RSRC2(USER_SGPR_COUNT, 1, 5),
+ COMPUTE_PGM_RSRC2(ENABLE_TRAP_HANDLER, 6, 1),
+ COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_X, 7, 1),
+ COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Y, 8, 1),
+ COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_ID_Z, 9, 1),
+ COMPUTE_PGM_RSRC2(ENABLE_SGPR_WORKGROUP_INFO, 10, 1),
+ COMPUTE_PGM_RSRC2(ENABLE_VGPR_WORKITEM_ID, 11, 2),
+ COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_ADDRESS_WATCH, 13, 1),
+ COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_MEMORY, 14, 1),
+ COMPUTE_PGM_RSRC2(GRANULATED_LDS_SIZE, 15, 9),
+ COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_INVALID_OPERATION, 24, 1),
+ COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_FP_DENORMAL_SOURCE, 25, 1),
+ COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_DIVISION_BY_ZERO, 26, 1),
+ COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_OVERFLOW, 27, 1),
+ COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_UNDERFLOW, 28, 1),
+ COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_IEEE_754_FP_INEXACT, 29, 1),
+ COMPUTE_PGM_RSRC2(ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO, 30, 1),
+ COMPUTE_PGM_RSRC2(RESERVED, 31, 1),
+};
+#undef COMPUTE_PGM_RSRC2
+
+// Kernel code properties. Must be kept backwards compatible.
+#define KERNEL_CODE_PROPERTY(NAME, SHIFT, WIDTH) \
+ AMDHSA_BITS_ENUM_ENTRY(KERNEL_CODE_PROPERTY_ ## NAME, SHIFT, WIDTH)
+enum : int32_t {
+ KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER, 0, 1),
+ KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_PTR, 1, 1),
+ KERNEL_CODE_PROPERTY(ENABLE_SGPR_QUEUE_PTR, 2, 1),
+ KERNEL_CODE_PROPERTY(ENABLE_SGPR_KERNARG_SEGMENT_PTR, 3, 1),
+ KERNEL_CODE_PROPERTY(ENABLE_SGPR_DISPATCH_ID, 4, 1),
+ KERNEL_CODE_PROPERTY(ENABLE_SGPR_FLAT_SCRATCH_INIT, 5, 1),
+ KERNEL_CODE_PROPERTY(ENABLE_SGPR_PRIVATE_SEGMENT_SIZE, 6, 1),
+ KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_X, 7, 1),
+ KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y, 8, 1),
+ KERNEL_CODE_PROPERTY(ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z, 9, 1),
+ KERNEL_CODE_PROPERTY(RESERVED, 10, 6),
+};
+#undef KERNEL_CODE_PROPERTY
+
+// Kernel descriptor. Must be kept backwards compatible.
+struct kernel_descriptor_t {
+ uint32_t group_segment_fixed_size;
+ uint32_t private_segment_fixed_size;
+ uint8_t reserved0[8];
+ int64_t kernel_code_entry_byte_offset;
+ uint8_t reserved1[24];
+ uint32_t compute_pgm_rsrc1;
+ uint32_t compute_pgm_rsrc2;
+ uint16_t kernel_code_properties;
+ uint8_t reserved2[6];
+};
+
+static_assert(
+ sizeof(kernel_descriptor_t) == 64,
+ "invalid size for kernel_descriptor_t");
+static_assert(
+ offsetof(kernel_descriptor_t, group_segment_fixed_size) == 0,
+ "invalid offset for group_segment_fixed_size");
+static_assert(
+ offsetof(kernel_descriptor_t, private_segment_fixed_size) == 4,
+ "invalid offset for private_segment_fixed_size");
+static_assert(
+ offsetof(kernel_descriptor_t, reserved0) == 8,
+ "invalid offset for reserved0");
+static_assert(
+ offsetof(kernel_descriptor_t, kernel_code_entry_byte_offset) == 16,
+ "invalid offset for kernel_code_entry_byte_offset");
+static_assert(
+ offsetof(kernel_descriptor_t, reserved1) == 24,
+ "invalid offset for reserved1");
+static_assert(
+ offsetof(kernel_descriptor_t, compute_pgm_rsrc1) == 48,
+ "invalid offset for compute_pgm_rsrc1");
+static_assert(
+ offsetof(kernel_descriptor_t, compute_pgm_rsrc2) == 52,
+ "invalid offset for compute_pgm_rsrc2");
+static_assert(
+ offsetof(kernel_descriptor_t, kernel_code_properties) == 56,
+ "invalid offset for kernel_code_properties");
+static_assert(
+ offsetof(kernel_descriptor_t, reserved2) == 58,
+ "invalid offset for reserved2");
+
+} // end namespace amdhsa
+} // end namespace llvm
+
+#endif // LLVM_SUPPORT_AMDHSAKERNELDESCRIPTOR_H
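Reviewer's note, not part of the commit: the sketch below shows how the shift/width/mask helper macros in the new header can be used to pack and read a bit field. The macros are re-declared under hypothetical DEMO_ names so the block compiles on its own, and only two COMPUTE_PGM_RSRC1 fields are reproduced. The helper functions demoPackRsrc1 and demoRoundMode are assumptions made for illustration and do not exist in LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Local copies of the header's helper macros under hypothetical DEMO_ names.
#define DEMO_BITS_ENUM_ENTRY(NAME, SHIFT, WIDTH) \
  NAME ## _SHIFT = (SHIFT),                      \
  NAME ## _WIDTH = (WIDTH),                      \
  NAME = (((1 << (WIDTH)) - 1) << (SHIFT))

#define DEMO_BITS_GET(SRC, MSK) ((SRC & MSK) >> MSK ## _SHIFT)

#define DEMO_BITS_SET(DST, MSK, VAL) \
  DST &= ~MSK;                       \
  DST |= ((VAL << MSK ## _SHIFT) & MSK)

// Two fields copied from the COMPUTE_PGM_RSRC1 layout (bits 13:12 and 17:16).
enum : int32_t {
  DEMO_BITS_ENUM_ENTRY(DEMO_FLOAT_ROUND_MODE_32, 12, 2),
  DEMO_BITS_ENUM_ENTRY(DEMO_FLOAT_DENORM_MODE_32, 16, 2),
};

// Packs a rounding mode and a denorm mode into a 32-bit register image.
inline uint32_t demoPackRsrc1(uint32_t RoundMode, uint32_t DenormMode) {
  assert(RoundMode < 4 && DenormMode < 4 && "both fields are 2 bits wide");
  uint32_t Rsrc1 = 0;
  // The SET macro expands to two statements, so it is only used here as a
  // full statement inside a braced block, never as an unbraced if body.
  DEMO_BITS_SET(Rsrc1, DEMO_FLOAT_ROUND_MODE_32, RoundMode);
  DEMO_BITS_SET(Rsrc1, DEMO_FLOAT_DENORM_MODE_32, DenormMode);
  return Rsrc1;
}

// Reads the 32-bit rounding mode back out of a register image.
inline uint32_t demoRoundMode(uint32_t Rsrc1) {
  return DEMO_BITS_GET(Rsrc1, DEMO_FLOAT_ROUND_MODE_32);
}
```

For example, demoPackRsrc1(3, 1) places FLOAT_ROUND_MODE_ZERO in bits 13:12 and FLOAT_DENORM_MODE_FLUSH_DST in bits 17:16, and demoRoundMode recovers the first value from the packed word.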
Modified: llvm/trunk/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp?rev=334519&r1=334518&r2=334519&view=diff
==============================================================================
--- llvm/trunk/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp (original)
+++ llvm/trunk/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp Tue Jun 12 11:02:46 2018
@@ -116,6 +116,10 @@ AMDGPUTargetStreamer* AMDGPUAsmPrinter::
}
void AMDGPUAsmPrinter::EmitStartOfAsmFile(Module &M) {
+ if (IsaInfo::hasCodeObjectV3(getSTI()) &&
+ TM.getTargetTriple().getOS() == Triple::AMDHSA)
+ return;
+
if (TM.getTargetTriple().getOS() != Triple::AMDHSA &&
TM.getTargetTriple().getOS() != Triple::AMDPAL)
return;
@@ -126,10 +130,6 @@ void AMDGPUAsmPrinter::EmitStartOfAsmFil
if (TM.getTargetTriple().getOS() == Triple::AMDPAL)
readPALMetadata(M);
- // Deprecated notes are not emitted for code object v3.
- if (IsaInfo::hasCodeObjectV3(getSTI()->getFeatureBits()))
- return;
-
// HSA emits NT_AMDGPU_HSA_CODE_OBJECT_VERSION for code objects v2.
if (TM.getTargetTriple().getOS() == Triple::AMDHSA)
getTargetStreamer()->EmitDirectiveHSACodeObjectVersion(2, 1);
@@ -141,6 +141,10 @@ void AMDGPUAsmPrinter::EmitStartOfAsmFil
}
void AMDGPUAsmPrinter::EmitEndOfAsmFile(Module &M) {
+ // TODO: Add metadata to code object v3.
+ if (IsaInfo::hasCodeObjectV3(getSTI()) &&
+ TM.getTargetTriple().getOS() == Triple::AMDHSA)
+ return;
// Following code requires TargetStreamer to be present.
if (!getTargetStreamer())
@@ -186,8 +190,11 @@ bool AMDGPUAsmPrinter::isBlockOnlyReacha
}
void AMDGPUAsmPrinter::EmitFunctionBodyStart() {
- const AMDGPUMachineFunction *MFI = MF->getInfo<AMDGPUMachineFunction>();
- if (!MFI->isEntryFunction())
+ const SIMachineFunctionInfo &MFI = *MF->getInfo<SIMachineFunctionInfo>();
+ if (!MFI.isEntryFunction())
+ return;
+ if (IsaInfo::hasCodeObjectV3(getSTI()) &&
+ TM.getTargetTriple().getOS() == Triple::AMDHSA)
return;
const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();
@@ -205,7 +212,27 @@ void AMDGPUAsmPrinter::EmitFunctionBodyS
getHSADebugProps(*MF, CurrentProgramInfo));
}
+void AMDGPUAsmPrinter::EmitFunctionBodyEnd() {
+ const SIMachineFunctionInfo &MFI = *MF->getInfo<SIMachineFunctionInfo>();
+ if (!MFI.isEntryFunction())
+ return;
+ if (!IsaInfo::hasCodeObjectV3(getSTI()) ||
+ TM.getTargetTriple().getOS() != Triple::AMDHSA)
+ return;
+
+ SmallString<128> KernelName;
+ getNameWithPrefix(KernelName, &MF->getFunction());
+ getTargetStreamer()->EmitAmdhsaKernelDescriptor(
+ KernelName, getAmdhsaKernelDescriptor(*MF, CurrentProgramInfo));
+}
+
void AMDGPUAsmPrinter::EmitFunctionEntryLabel() {
+ if (IsaInfo::hasCodeObjectV3(getSTI()) &&
+ TM.getTargetTriple().getOS() == Triple::AMDHSA) {
+ AsmPrinter::EmitFunctionEntryLabel();
+ return;
+ }
+
const SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
const AMDGPUSubtarget &STM = MF->getSubtarget<AMDGPUSubtarget>();
if (MFI->isEntryFunction() && STM.isAmdCodeObjectV2(MF->getFunction())) {
@@ -288,6 +315,70 @@ void AMDGPUAsmPrinter::emitCommonFunctio
false);
}
+uint16_t AMDGPUAsmPrinter::getAmdhsaKernelCodeProperties(
+ const MachineFunction &MF) const {
+ const SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
+ uint16_t KernelCodeProperties = 0;
+
+ if (MFI.hasPrivateSegmentBuffer()) {
+ KernelCodeProperties |=
+ amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER;
+ }
+ if (MFI.hasDispatchPtr()) {
+ KernelCodeProperties |=
+ amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR;
+ }
+ if (MFI.hasQueuePtr()) {
+ KernelCodeProperties |=
+ amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR;
+ }
+ if (MFI.hasKernargSegmentPtr()) {
+ KernelCodeProperties |=
+ amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR;
+ }
+ if (MFI.hasDispatchID()) {
+ KernelCodeProperties |=
+ amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID;
+ }
+ if (MFI.hasFlatScratchInit()) {
+ KernelCodeProperties |=
+ amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT;
+ }
+ if (MFI.hasGridWorkgroupCountX()) {
+ KernelCodeProperties |=
+ amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_X;
+ }
+ if (MFI.hasGridWorkgroupCountY()) {
+ KernelCodeProperties |=
+ amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Y;
+ }
+ if (MFI.hasGridWorkgroupCountZ()) {
+ KernelCodeProperties |=
+ amdhsa::KERNEL_CODE_PROPERTY_ENABLE_SGPR_GRID_WORKGROUP_COUNT_Z;
+ }
+
+ return KernelCodeProperties;
+}
+
+amdhsa::kernel_descriptor_t AMDGPUAsmPrinter::getAmdhsaKernelDescriptor(
+ const MachineFunction &MF,
+ const SIProgramInfo &PI) const {
+ amdhsa::kernel_descriptor_t KernelDescriptor;
+ memset(&KernelDescriptor, 0x0, sizeof(KernelDescriptor));
+
+ assert(isUInt<32>(PI.ScratchSize));
+ assert(isUInt<32>(PI.ComputePGMRSrc1));
+ assert(isUInt<32>(PI.ComputePGMRSrc2));
+
+ KernelDescriptor.group_segment_fixed_size = PI.LDSSize;
+ KernelDescriptor.private_segment_fixed_size = PI.ScratchSize;
+ KernelDescriptor.compute_pgm_rsrc1 = PI.ComputePGMRSrc1;
+ KernelDescriptor.compute_pgm_rsrc2 = PI.ComputePGMRSrc2;
+ KernelDescriptor.kernel_code_properties = getAmdhsaKernelCodeProperties(MF);
+
+ return KernelDescriptor;
+}
+
bool AMDGPUAsmPrinter::runOnMachineFunction(MachineFunction &MF) {
CurrentProgramInfo = SIProgramInfo();
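Reviewer's note, not part of the commit: getAmdhsaKernelCodeProperties above ORs one mask per enabled user SGPR into a 16-bit word that lands at descriptor bits 463:448. A minimal standalone sketch of that packing follows. The bit positions are taken from the KERNEL_CODE_PROPERTY entries in this commit; every Demo name, including the DemoUserSGPRInfo struct that stands in for SIMachineFunctionInfo, is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bit masks matching the KERNEL_CODE_PROPERTY shifts 0..5 in this commit.
// The Demo names are hypothetical and exist only for this sketch.
enum DemoKernelCodeProperty : uint16_t {
  DemoPrivateSegmentBuffer = 1u << 0,
  DemoDispatchPtr = 1u << 1,
  DemoQueuePtr = 1u << 2,
  DemoKernargSegmentPtr = 1u << 3,
  DemoDispatchID = 1u << 4,
  DemoFlatScratchInit = 1u << 5,
};

// Hypothetical stand-in for the queries made on SIMachineFunctionInfo.
struct DemoUserSGPRInfo {
  bool HasPrivateSegmentBuffer;
  bool HasDispatchPtr;
  bool HasQueuePtr;
  bool HasKernargSegmentPtr;
  bool HasDispatchID;
  bool HasFlatScratchInit;
};

// Builds the 16-bit kernel_code_properties word from the enabled inputs.
inline uint16_t demoKernelCodeProperties(const DemoUserSGPRInfo &Info) {
  uint16_t Properties = 0;
  if (Info.HasPrivateSegmentBuffer) Properties |= DemoPrivateSegmentBuffer;
  if (Info.HasDispatchPtr) Properties |= DemoDispatchPtr;
  if (Info.HasQueuePtr) Properties |= DemoQueuePtr;
  if (Info.HasKernargSegmentPtr) Properties |= DemoKernargSegmentPtr;
  if (Info.HasDispatchID) Properties |= DemoDispatchID;
  if (Info.HasFlatScratchInit) Properties |= DemoFlatScratchInit;
  return Properties;
}

// Tests a bit using the numbering of the documentation table, where
// kernel_code_properties starts at byte 56, i.e. descriptor bit 448.
inline bool demoDescriptorBitIsSet(uint16_t Properties, unsigned DescriptorBit) {
  assert(DescriptorBit >= 448 && DescriptorBit < 464);
  return ((Properties >> (DescriptorBit - 448)) & 1u) != 0;
}
```

With the private segment buffer and the kernarg segment pointer enabled, the word is 0x9, which corresponds to descriptor bits 448 and 451 in the AMDGPUUsage table.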
Modified: llvm/trunk/lib/Target/AMDGPU/AMDGPUAsmPrinter.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AMDGPU/AMDGPUAsmPrinter.h?rev=334519&r1=334518&r2=334519&view=diff
==============================================================================
--- llvm/trunk/lib/Target/AMDGPU/AMDGPUAsmPrinter.h (original)
+++ llvm/trunk/lib/Target/AMDGPU/AMDGPUAsmPrinter.h Tue Jun 12 11:02:46 2018
@@ -20,6 +20,7 @@
#include "MCTargetDesc/AMDGPUHSAMetadataStreamer.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/CodeGen/AsmPrinter.h"
+#include "llvm/Support/AMDHSAKernelDescriptor.h"
#include <cstddef>
#include <cstdint>
#include <limits>
@@ -148,6 +149,13 @@ private:
uint64_t CodeSize,
const AMDGPUMachineFunction* MFI);
+ uint16_t getAmdhsaKernelCodeProperties(
+ const MachineFunction &MF) const;
+
+ amdhsa::kernel_descriptor_t getAmdhsaKernelDescriptor(
+ const MachineFunction &MF,
+ const SIProgramInfo &PI) const;
+
public:
explicit AMDGPUAsmPrinter(TargetMachine &TM,
std::unique_ptr<MCStreamer> Streamer);
@@ -180,6 +188,8 @@ public:
void EmitFunctionBodyStart() override;
+ void EmitFunctionBodyEnd() override;
+
void EmitFunctionEntryLabel() override;
void EmitBasicBlockStart(const MachineBasicBlock &MBB) const override;
Modified: llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp?rev=334519&r1=334518&r2=334519&view=diff
==============================================================================
--- llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp (original)
+++ llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp Tue Jun 12 11:02:46 2018
@@ -196,6 +196,12 @@ bool AMDGPUTargetAsmStreamer::EmitPALMet
return true;
}
+void AMDGPUTargetAsmStreamer::EmitAmdhsaKernelDescriptor(
+ StringRef KernelName,
+ const amdhsa::kernel_descriptor_t &KernelDescriptor) {
+ // FIXME: not supported yet.
+}
+
//===----------------------------------------------------------------------===//
// AMDGPUTargetELFStreamer
//===----------------------------------------------------------------------===//
@@ -362,3 +368,57 @@ bool AMDGPUTargetELFStreamer::EmitPALMet
);
return true;
}
+
+void AMDGPUTargetELFStreamer::EmitAmdhsaKernelDescriptor(
+ StringRef KernelName,
+ const amdhsa::kernel_descriptor_t &KernelDescriptor) {
+ auto &Streamer = getStreamer();
+ auto &Context = Streamer.getContext();
+ auto &ObjectFileInfo = *Context.getObjectFileInfo();
+ auto &ReadOnlySection = *ObjectFileInfo.getReadOnlySection();
+
+ Streamer.PushSection();
+ Streamer.SwitchSection(&ReadOnlySection);
+
+ // CP microcode requires the kernel descriptor to be allocated on 64 byte
+ // alignment.
+ Streamer.EmitValueToAlignment(64, 0, 1, 0);
+ if (ReadOnlySection.getAlignment() < 64)
+ ReadOnlySection.setAlignment(64);
+
+ MCSymbolELF *KernelDescriptorSymbol = cast<MCSymbolELF>(
+ Context.getOrCreateSymbol(Twine(KernelName) + Twine(".kd")));
+ KernelDescriptorSymbol->setBinding(ELF::STB_GLOBAL);
+ KernelDescriptorSymbol->setType(ELF::STT_OBJECT);
+ KernelDescriptorSymbol->setSize(
+ MCConstantExpr::create(sizeof(KernelDescriptor), Context));
+
+ MCSymbolELF *KernelCodeSymbol = cast<MCSymbolELF>(
+ Context.getOrCreateSymbol(Twine(KernelName)));
+ KernelCodeSymbol->setBinding(ELF::STB_LOCAL);
+
+ Streamer.EmitLabel(KernelDescriptorSymbol);
+ Streamer.EmitBytes(StringRef(
+ (const char*)&(KernelDescriptor),
+ offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset)));
+ // FIXME: Remove the use of VK_AMDGPU_REL64 in the expression below. The
+ // expression being created is:
+ // (start of kernel code) - (start of kernel descriptor)
+ // It implies R_AMDGPU_REL64, but ends up being R_AMDGPU_ABS64.
+ Streamer.EmitValue(MCBinaryExpr::createSub(
+ MCSymbolRefExpr::create(
+ KernelCodeSymbol, MCSymbolRefExpr::VK_AMDGPU_REL64, Context),
+ MCSymbolRefExpr::create(
+ KernelDescriptorSymbol, MCSymbolRefExpr::VK_None, Context),
+ Context),
+ sizeof(KernelDescriptor.kernel_code_entry_byte_offset));
+ Streamer.EmitBytes(StringRef(
+ (const char*)&(KernelDescriptor) +
+ offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset) +
+ sizeof(KernelDescriptor.kernel_code_entry_byte_offset),
+ sizeof(KernelDescriptor) -
+ offsetof(amdhsa::kernel_descriptor_t, kernel_code_entry_byte_offset) -
+ sizeof(KernelDescriptor.kernel_code_entry_byte_offset)));
+
+ Streamer.PopSection();
+}
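Reviewer's note, not part of the commit: the ELF streamer above emits the 64-byte descriptor in three pieces, namely the raw bytes before kernel_code_entry_byte_offset, an 8-byte symbol-difference expression in place of that field, and the raw bytes after it, all at a 64-byte aligned position in .rodata. The sketch below only reproduces the offset arithmetic. DemoKernelDescriptor mirrors kernel_descriptor_t from this commit; DemoByteRange, DemoEmissionPlan, demoPlanEmission and demoAlignTo are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of amdhsa::kernel_descriptor_t from this commit.
struct DemoKernelDescriptor {
  uint32_t group_segment_fixed_size;
  uint32_t private_segment_fixed_size;
  uint8_t reserved0[8];
  int64_t kernel_code_entry_byte_offset;
  uint8_t reserved1[24];
  uint32_t compute_pgm_rsrc1;
  uint32_t compute_pgm_rsrc2;
  uint16_t kernel_code_properties;
  uint8_t reserved2[6];
};
static_assert(sizeof(DemoKernelDescriptor) == 64, "descriptor is 64 bytes");

struct DemoByteRange {
  size_t Offset;
  size_t Size;
};

// The three pieces written in order: bytes, relocated field, bytes.
struct DemoEmissionPlan {
  DemoByteRange Head;
  DemoByteRange Reloc;
  DemoByteRange Tail;
};

// Computes the same ranges as the offsetof/sizeof expressions in the diff.
inline DemoEmissionPlan demoPlanEmission() {
  const size_t FieldOffset =
      offsetof(DemoKernelDescriptor, kernel_code_entry_byte_offset);
  const size_t FieldSize = sizeof(int64_t);
  const size_t Total = sizeof(DemoKernelDescriptor);
  return {{0, FieldOffset},
          {FieldOffset, FieldSize},
          {FieldOffset + FieldSize, Total - FieldOffset - FieldSize}};
}

// Rounds Offset up to the next multiple of Align (Align is a power of two).
inline size_t demoAlignTo(size_t Offset, size_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0);
  return (Offset + Align - 1) & ~(Align - 1);
}
```

Under these assumptions the head covers bytes 0..15, the relocated field bytes 16..23 and the tail bytes 24..63, and a descriptor that would otherwise start at section offset 65 is moved to offset 128.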
Modified: llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h?rev=334519&r1=334518&r2=334519&view=diff
==============================================================================
--- llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h (original)
+++ llvm/trunk/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.h Tue Jun 12 11:02:46 2018
@@ -14,6 +14,7 @@
#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Support/AMDGPUMetadata.h"
+#include "llvm/Support/AMDHSAKernelDescriptor.h"
namespace llvm {
#include "AMDGPUPTNote.h"
@@ -62,6 +63,10 @@ public:
/// \returns True on success, false on failure.
virtual bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) = 0;
+
+ virtual void EmitAmdhsaKernelDescriptor(
+ StringRef KernelName,
+ const amdhsa::kernel_descriptor_t &KernelDescriptor) = 0;
};
class AMDGPUTargetAsmStreamer final : public AMDGPUTargetStreamer {
@@ -87,6 +92,10 @@ public:
/// \returns True on success, false on failure.
bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override;
+
+ void EmitAmdhsaKernelDescriptor(
+ StringRef KernelName,
+ const amdhsa::kernel_descriptor_t &KernelDescriptor) override;
};
class AMDGPUTargetELFStreamer final : public AMDGPUTargetStreamer {
@@ -119,6 +128,10 @@ public:
/// \returns True on success, false on failure.
bool EmitPALMetadata(const AMDGPU::PALMD::Metadata &PALMetadata) override;
+
+ void EmitAmdhsaKernelDescriptor(
+ StringRef KernelName,
+ const amdhsa::kernel_descriptor_t &KernelDescriptor) override;
};
}
Modified: llvm/trunk/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp?rev=334519&r1=334518&r2=334519&view=diff
==============================================================================
--- llvm/trunk/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (original)
+++ llvm/trunk/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp Tue Jun 12 11:02:46 2018
@@ -248,8 +248,8 @@ void streamIsaVersion(const MCSubtargetI
Stream.flush();
}
-bool hasCodeObjectV3(const FeatureBitset &Features) {
- return Features.test(FeatureCodeObjectV3);
+bool hasCodeObjectV3(const MCSubtargetInfo *STI) {
+ return STI->getFeatureBits().test(FeatureCodeObjectV3);
}
unsigned getWavefrontSize(const FeatureBitset &Features) {
Modified: llvm/trunk/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h?rev=334519&r1=334518&r2=334519&view=diff
==============================================================================
--- llvm/trunk/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h (original)
+++ llvm/trunk/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h Tue Jun 12 11:02:46 2018
@@ -59,9 +59,9 @@ IsaVersion getIsaVersion(const FeatureBi
/// Streams isa version string for given subtarget \p STI into \p Stream.
void streamIsaVersion(const MCSubtargetInfo *STI, raw_ostream &Stream);
-/// \returns True if given subtarget \p Features support code object version 3,
+/// \returns True if given subtarget \p STI supports code object version 3,
/// false otherwise.
-bool hasCodeObjectV3(const FeatureBitset &Features);
+bool hasCodeObjectV3(const MCSubtargetInfo *STI);
/// \returns Wavefront size for given subtarget \p Features.
unsigned getWavefrontSize(const FeatureBitset &Features);
Added: llvm/trunk/test/CodeGen/AMDGPU/code-object-v3.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/code-object-v3.ll?rev=334519&view=auto
==============================================================================
--- llvm/trunk/test/CodeGen/AMDGPU/code-object-v3.ll (added)
+++ llvm/trunk/test/CodeGen/AMDGPU/code-object-v3.ll Tue Jun 12 11:02:46 2018
@@ -0,0 +1,48 @@
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+code-object-v3 < %s | FileCheck --check-prefixes=ALL-ASM,OSABI-AMDHSA-ASM %s
+; RUN: llc -filetype=obj -mtriple=amdgcn-amd-amdhsa -mcpu=gfx803 -mattr=+code-object-v3 < %s | llvm-readobj -elf-output-style=GNU -notes -relocations -sections -symbols | FileCheck --check-prefixes=ALL-ELF,OSABI-AMDHSA-ELF %s
+
+; OSABI-AMDHSA-ASM-NOT: .hsa_code_object_version
+; OSABI-AMDHSA-ASM-NOT: .hsa_code_object_isa
+; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_isa
+; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_hsa_metadata
+; OSABI-AMDHSA-ASM-NOT: .amd_amdgpu_pal_metadata
+
+; OSABI-AMDHSA-ELF: Section Headers
+; OSABI-AMDHSA-ELF: .text PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}} AX {{[0-9]+}} {{[0-9]+}} 256
+; OSABI-AMDHSA-ELF: .rodata PROGBITS {{[0-9]+}} {{[0-9]+}} {{[0-9a-f]+}} {{[0-9]+}} A {{[0-9]+}} {{[0-9]+}} 64
+
+; OSABI-AMDHSA-ELF: Relocation section '.rela.rodata' at offset
+; OSABI-AMDHSA-ELF: 0000000000000010 0000000300000005 R_AMDGPU_REL64 0000000000000000 .text + 10
+; OSABI-AMDHSA-ELF: 0000000000000050 0000000300000005 R_AMDGPU_REL64 0000000000000000 .text + 110
+
+; OSABI-AMDHSA-ELF: Symbol table '.symtab' contains {{[0-9]+}} entries
+; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000000 {{[0-9]+}} FUNC LOCAL DEFAULT {{[0-9]+}} fadd
+; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000100 {{[0-9]+}} FUNC LOCAL DEFAULT {{[0-9]+}} fsub
+; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000000 64 OBJECT GLOBAL DEFAULT {{[0-9]+}} fadd.kd
+; OSABI-AMDHSA-ELF: {{[0-9]+}}: 0000000000000040 64 OBJECT GLOBAL DEFAULT {{[0-9]+}} fsub.kd
+
+; OSABI-AMDHSA-ELF-NOT: Displaying notes found
+
+define amdgpu_kernel void @fadd(
+ float addrspace(1)* %r,
+ float addrspace(1)* %a,
+ float addrspace(1)* %b) {
+entry:
+ %a.val = load float, float addrspace(1)* %a
+ %b.val = load float, float addrspace(1)* %b
+ %r.val = fadd float %a.val, %b.val
+ store float %r.val, float addrspace(1)* %r
+ ret void
+}
+
+define amdgpu_kernel void @fsub(
+ float addrspace(1)* %r,
+ float addrspace(1)* %a,
+ float addrspace(1)* %b) {
+entry:
+ %a.val = load float, float addrspace(1)* %a
+ %b.val = load float, float addrspace(1)* %b
+ %r.val = fsub float %a.val, %b.val
+ store float %r.val, float addrspace(1)* %r
+ ret void
+}
Modified: llvm/trunk/test/CodeGen/AMDGPU/elf-notes.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/elf-notes.ll?rev=334519&r1=334518&r2=334519&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/AMDGPU/elf-notes.ll (original)
+++ llvm/trunk/test/CodeGen/AMDGPU/elf-notes.ll Tue Jun 12 11:02:46 2018
@@ -1,13 +1,13 @@
-; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK-ELF --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA-ELF --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=iceland -mattr=+code-object-v3 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
-; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -mattr=+code-object-v3 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL-ELF --check-prefix=GFX802 %s
-; RUN: llc -march=r600 -mattr=+code-object-v3 < %s | FileCheck --check-prefix=R600 %s
+; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-unknown -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-UNK-ELF --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-HSA-ELF --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=iceland < %s | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL --check-prefix=GFX802 %s
+; RUN: llc -mtriple=amdgcn-amd-amdpal -mcpu=gfx802 -filetype=obj < %s | llvm-readobj -elf-output-style=GNU -notes | FileCheck --check-prefix=GCN --check-prefix=OSABI-PAL-ELF --check-prefix=GFX802 %s
+; RUN: llc -march=r600 < %s | FileCheck --check-prefix=R600 %s
; OSABI-UNK-NOT: .hsa_code_object_version
; OSABI-UNK-NOT: .hsa_code_object_isa
@@ -25,17 +25,17 @@
; OSABI-UNK-ELF-NOT: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
; OSABI-UNK-ELF-NOT: Unknown note type
-; OSABI-HSA-NOT: .hsa_code_object_version
-; OSABI-HSA-NOT: .hsa_code_object_isa
+; OSABI-HSA: .hsa_code_object_version
+; OSABI-HSA: .hsa_code_object_isa
; OSABI-HSA: .amd_amdgpu_isa "amdgcn-amd-amdhsa--gfx802"
; OSABI-HSA: .amd_amdgpu_hsa_metadata
; OSABI-HSA-NOT: .amd_amdgpu_pal_metadata
-; OSABI-HSA-ELF-NOT: Unknown note type
+; OSABI-HSA-ELF: Unknown note type (0x00000001)
+; OSABI-HSA-ELF: Unknown note type (0x00000003)
; OSABI-HSA-ELF: NT_AMD_AMDGPU_ISA (ISA Version)
; OSABI-HSA-ELF: ISA Version:
; OSABI-HSA-ELF: amdgcn-amd-amdhsa--gfx802
-; OSABI-HSA-ELF-NOT: Unknown note type
; OSABI-HSA-ELF: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)
; OSABI-HSA-ELF: HSA Metadata:
; OSABI-HSA-ELF: ---
@@ -51,34 +51,29 @@
; OSABI-HSA-ELF: WavefrontSize: 64
; OSABI-HSA-ELF: NumSGPRs: 96
; OSABI-HSA-ELF: ...
-; OSABI-HSA-ELF-NOT: Unknown note type
; OSABI-HSA-ELF-NOT: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
-; OSABI-HSA-ELF-NOT: Unknown note type
; OSABI-PAL-NOT: .hsa_code_object_version
-; OSABI-PAL-NOT: .hsa_code_object_isa
+; OSABI-PAL: .hsa_code_object_isa
; OSABI-PAL: .amd_amdgpu_isa "amdgcn-amd-amdpal--gfx802"
; OSABI-PAL-NOT: .amd_amdgpu_hsa_metadata
; OSABI-PAL: .amd_amdgpu_pal_metadata
-; OSABI-PAL-ELF-NOT: Unknown note type
+; OSABI-PAL-ELF: Unknown note type (0x00000003)
; OSABI-PAL-ELF: NT_AMD_AMDGPU_ISA (ISA Version)
; OSABI-PAL-ELF: ISA Version:
; OSABI-PAL-ELF: amdgcn-amd-amdpal--gfx802
-; OSABI-PAL-ELF-NOT: Unknown note type
; OSABI-PAL-ELF-NOT: NT_AMD_AMDGPU_HSA_METADATA (HSA Metadata)
-; OSABI-PAL-ELF-NOT: Unknown note type
; OSABI-PAL-ELF: NT_AMD_AMDGPU_PAL_METADATA (PAL Metadata)
; OSABI-PAL-ELF: PAL Metadata:
; TODO: Following check line fails on mips:
; OSABI-PAL-ELF-XXX: 0x2e12,0xac02c0,0x2e13,0x80,0x1000001b,0x1,0x10000022,0x60,0x1000003e,0x0
-; OSABI-PAL-ELF-NOT: Unknown note type
; R600-NOT: .hsa_code_object_version
; R600-NOT: .hsa_code_object_isa
; R600-NOT: .amd_amdgpu_isa
; R600-NOT: .amd_amdgpu_hsa_metadata
-; R600-NOT: .amd_amdgpu_pal_metadatas
+; R600-NOT: .amd_amdgpu_pal_metadata
define amdgpu_kernel void @elf_notes() {
ret void