[clang] [llvm] [SPIRV] GPU intrinsics (PR #131190)
via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 13 11:52:48 PDT 2025
llvmbot wrote:
@llvm/pr-subscribers-backend-spir-v
Author: Jon Chesterfield (JonChesterfield)
Introduce __builtin_gpu_* builtins in clang and corresponding llvm.gpu.* intrinsics in LLVM to abstract over minor differences between GPU architectures, and use them to implement a gpuintrin.h instantiation that supports compiling llvm libc for spirv64--.
Motivated by the discussion at https://discourse.llvm.org/t/rfc-spirv-ir-as-a-vendor-agnostic-gpu-representation/85115 and an RFC for this specifically as <tbd>.
These are not named llvm.spirv because no SPIRV-specific semantics are involved; they are deliberately the same small abstraction over targets that llvm libc already uses. Essentially, this patch allows us to postpone choosing a target architecture for libc until JIT time.
There is some refactoring to be done if this lands: moving some work out of CGBuiltin.cpp, simplifying gpuintrin.h, and adjusting OpenMP's codegen to use these instead of DeviceRTL magic functions.
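As a rough sketch of the intended usage (the kernel below is hypothetical, not part of this patch), portable code written against gpuintrin.h picks up spirvintrin.h on spirv64 targets, so the __gpu_* wrappers lower to the new __builtin_gpu_* builtins and hence to target-agnostic llvm.gpu.* calls in the emitted IR:

```c
// Hypothetical example, not taken from the patch: a portable kernel using
// only the gpuintrin.h interface. Compiled for spirv64-- this leaves
// target-agnostic llvm.gpu.* intrinsic calls in the module.
#include <gpuintrin.h>
#include <stdint.h>

__gpu_kernel void vector_add(uint32_t *a, const uint32_t *b, uint32_t n) {
  // Compute a global thread index from the portable grid queries.
  uint32_t i = __gpu_num_threads_x() * __gpu_block_id_x() + __gpu_thread_id_x();
  if (i < n)
    a[i] += b[i];
}
```

The resulting SPIR-V module can then be specialised to a concrete architecture at JIT time by the new LowerGPUIntrinsic pass.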
---
Patch is 100.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131190.diff
18 Files Affected:
- (modified) clang/include/clang/Basic/Builtins.td (+33)
- (modified) clang/lib/Headers/amdgpuintrin.h (+1-1)
- (modified) clang/lib/Headers/gpuintrin.h (+2)
- (added) clang/lib/Headers/spirvintrin.h (+177)
- (added) clang/test/CodeGen/amdgpu-grid-builtins.c (+158)
- (added) clang/test/CodeGen/gpu_builtins.c (+647)
- (modified) clang/test/Headers/gpuintrin.c (+223)
- (modified) llvm/include/llvm/IR/Intrinsics.td (+63)
- (modified) llvm/include/llvm/IR/IntrinsicsAMDGPU.td (+2)
- (modified) llvm/include/llvm/InitializePasses.h (+1)
- (modified) llvm/include/llvm/Transforms/Scalar.h (+6)
- (added) llvm/include/llvm/Transforms/Scalar/LowerGPUIntrinsic.h (+26)
- (modified) llvm/lib/Passes/PassBuilder.cpp (+1)
- (modified) llvm/lib/Passes/PassRegistry.def (+1)
- (modified) llvm/lib/Transforms/Scalar/CMakeLists.txt (+1)
- (added) llvm/lib/Transforms/Scalar/LowerGPUIntrinsic.cpp (+501)
- (modified) llvm/lib/Transforms/Scalar/Scalar.cpp (+1)
- (added) llvm/test/CodeGen/SPIRV/gpu_intrinsics.ll (+427)
``````````diff
diff --git a/clang/include/clang/Basic/Builtins.td b/clang/include/clang/Basic/Builtins.td
index 2fbdfaea57ccd..fb87b5c74c0f8 100644
--- a/clang/include/clang/Basic/Builtins.td
+++ b/clang/include/clang/Basic/Builtins.td
@@ -4770,6 +4770,39 @@ def GetDeviceSideMangledName : LangBuiltin<"CUDA_LANG"> {
let Prototype = "char const*(...)";
}
+// GPU intrinsics
+class GPUBuiltin<string prototype> : Builtin {
+ let Spellings = ["__builtin_" # NAME];
+ let Prototype = prototype;
+ let Attributes = [NoThrow];
+}
+
+multiclass GPUGridBuiltin<string prototype> {
+ def _x : GPUBuiltin<prototype>;
+ def _y : GPUBuiltin<prototype>;
+ def _z : GPUBuiltin<prototype>;
+}
+
+defm gpu_num_blocks : GPUGridBuiltin<"uint32_t()">;
+defm gpu_block_id : GPUGridBuiltin<"uint32_t()">;
+defm gpu_num_threads : GPUGridBuiltin<"uint32_t()">;
+defm gpu_thread_id : GPUGridBuiltin<"uint32_t()">;
+
+def gpu_num_lanes : GPUBuiltin<"uint32_t()">;
+def gpu_lane_id : GPUBuiltin<"uint32_t()">;
+def gpu_lane_mask : GPUBuiltin<"uint64_t()">;
+
+def gpu_read_first_lane_u32 : GPUBuiltin<"uint32_t(uint64_t, uint32_t)">;
+def gpu_shuffle_idx_u32 : GPUBuiltin<"uint32_t(uint64_t, uint32_t, uint32_t, uint32_t)">;
+
+def gpu_ballot : GPUBuiltin<"uint64_t(uint64_t, bool)">;
+
+// TODO: work out why these can't be marked convergent.
+def gpu_sync_threads : GPUBuiltin<"void()">;
+def gpu_sync_lane : GPUBuiltin<"void(uint64_t)">;
+def gpu_exit : GPUBuiltin<"void()">;
+def gpu_thread_suspend : GPUBuiltin<"void()">;
+
// HLSL
def HLSLAddUint64: LangBuiltin<"HLSL_LANG"> {
let Spellings = ["__builtin_hlsl_adduint64"];
diff --git a/clang/lib/Headers/amdgpuintrin.h b/clang/lib/Headers/amdgpuintrin.h
index f7fb8e2814180..817cfeec896c4 100644
--- a/clang/lib/Headers/amdgpuintrin.h
+++ b/clang/lib/Headers/amdgpuintrin.h
@@ -1,4 +1,4 @@
-//===-- amdgpuintrin.h - AMDPGU intrinsic functions -----------------------===//
+//===-- amdgpuintrin.h - AMDGPU intrinsic functions -----------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
diff --git a/clang/lib/Headers/gpuintrin.h b/clang/lib/Headers/gpuintrin.h
index 0fb3916acac61..a3ce535188a48 100644
--- a/clang/lib/Headers/gpuintrin.h
+++ b/clang/lib/Headers/gpuintrin.h
@@ -60,6 +60,8 @@ _Pragma("omp end declare target");
#include <nvptxintrin.h>
#elif defined(__AMDGPU__)
#include <amdgpuintrin.h>
+#elif defined(__SPIRV64__)
+#include <spirvintrin.h>
#elif !defined(_OPENMP)
#error "This header is only meant to be used on GPU architectures."
#endif
diff --git a/clang/lib/Headers/spirvintrin.h b/clang/lib/Headers/spirvintrin.h
new file mode 100644
index 0000000000000..a5129d3577edf
--- /dev/null
+++ b/clang/lib/Headers/spirvintrin.h
@@ -0,0 +1,177 @@
+//===-- spirvintrin.h - SPIRV intrinsic functions ------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef __SPIRVINTRIN_H
+#define __SPIRVINTRIN_H
+
+#ifndef __SPIRV64__
+// 32-bit SPIRV is currently a stretch goal
+#error "This file is intended for SPIRV64 targets or offloading to SPIRV64"
+#endif
+
+#ifndef __GPUINTRIN_H
+#error "Never use <spirvintrin.h> directly; include <gpuintrin.h> instead"
+#endif
+
+// This is the skeleton of the SPIRV implementation of gpuintrin.h.
+// Address spaces and the kernel attribute are not yet implemented.
+// The target-specific functions are declarations awaiting clang support.
+
+#if defined(_OPENMP)
+#error "Openmp is not yet available on spirv though gpuintrin header"
+#endif
+
+// Type aliases to the address spaces used by the SPIRV backend.
+#define __gpu_private
+#define __gpu_constant
+#define __gpu_local
+#define __gpu_global
+#define __gpu_generic
+
+// Attribute to declare a function as a kernel.
+#define __gpu_kernel
+
+// Returns the number of workgroups in the 'x' dimension of the grid.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_blocks_x(void) {
+ return __builtin_gpu_num_blocks_x();
+}
+
+// Returns the number of workgroups in the 'y' dimension of the grid.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_blocks_y(void) {
+ return __builtin_gpu_num_blocks_y();
+}
+
+// Returns the number of workgroups in the 'z' dimension of the grid.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_blocks_z(void) {
+ return __builtin_gpu_num_blocks_z();
+}
+
+// Returns the 'x' dimension of the current workgroup's id.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_block_id_x(void) {
+ return __builtin_gpu_block_id_x();
+}
+
+// Returns the 'y' dimension of the current workgroup's id.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_block_id_y(void) {
+ return __builtin_gpu_block_id_y();
+}
+
+// Returns the 'z' dimension of the current workgroup's id.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_block_id_z(void) {
+ return __builtin_gpu_block_id_z();
+}
+
+// Returns the number of workitems in the 'x' dimension.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_threads_x(void) {
+ return __builtin_gpu_num_threads_x();
+}
+
+// Returns the number of workitems in the 'y' dimension.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_threads_y(void) {
+ return __builtin_gpu_num_threads_y();
+}
+
+// Returns the number of workitems in the 'z' dimension.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_threads_z(void) {
+ return __builtin_gpu_num_threads_z();
+}
+
+// Returns the 'x' dimension id of the workitem in the current workgroup.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_thread_id_x(void) {
+ return __builtin_gpu_thread_id_x();
+}
+
+// Returns the 'y' dimension id of the workitem in the current workgroup.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_thread_id_y(void) {
+ return __builtin_gpu_thread_id_y();
+}
+
+// Returns the 'z' dimension id of the workitem in the current workgroup.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_thread_id_z(void) {
+ return __builtin_gpu_thread_id_z();
+}
+
+// Returns the size of the wave.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_num_lanes(void) {
+ return __builtin_gpu_num_lanes();
+}
+
+// Returns the id of the thread inside of a wave executing together.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t __gpu_lane_id(void) {
+ return __builtin_gpu_lane_id();
+}
+
+// Returns the bit-mask of active threads in the current wave.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t __gpu_lane_mask(void) {
+ return __builtin_gpu_lane_mask();
+}
+
+// Copies the value from the first active thread in the wave to the rest.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t
+__gpu_read_first_lane_u32(uint64_t __lane_mask, uint32_t __x) {
+ return __builtin_gpu_read_first_lane_u32(__lane_mask, __x);
+}
+
+// Returns a bitmask of threads in the current wave for which \p x is true.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t __gpu_ballot(uint64_t __lane_mask,
+ bool __x) {
+ return __builtin_gpu_ballot(__lane_mask, __x);
+}
+
+// Waits for all the threads in the block to converge and issues a fence.
+_DEFAULT_FN_ATTRS static __inline__ void __gpu_sync_threads(void) {
+ return __builtin_gpu_sync_threads();
+}
+
+// Wait for all threads in the wave to converge
+_DEFAULT_FN_ATTRS static __inline__ void __gpu_sync_lane(uint64_t __lane_mask) {
+ return __builtin_gpu_sync_lane(__lane_mask);
+}
+
+// Shuffles the lanes inside the wave according to the given index.
+_DEFAULT_FN_ATTRS static __inline__ uint32_t
+__gpu_shuffle_idx_u32(uint64_t __lane_mask, uint32_t __idx, uint32_t __x,
+ uint32_t __width) {
+ return __builtin_gpu_shuffle_idx_u32(__lane_mask, __idx, __x, __width);
+}
+
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
+ return __gpu_match_any_u32_impl(__lane_mask, __x);
+}
+
+// Returns a bitmask marking all lanes that have the same value of __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
+ return __gpu_match_any_u64_impl(__lane_mask, __x);
+}
+
+// Returns the current lane mask if every lane contains __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_all_u32(uint64_t __lane_mask, uint32_t __x) {
+ return __gpu_match_all_u32_impl(__lane_mask, __x);
+}
+
+// Returns the current lane mask if every lane contains __x.
+_DEFAULT_FN_ATTRS static __inline__ uint64_t
+__gpu_match_all_u64(uint64_t __lane_mask, uint64_t __x) {
+ return __gpu_match_all_u64_impl(__lane_mask, __x);
+}
+
+// Terminates execution of the associated wave.
+_DEFAULT_FN_ATTRS [[noreturn]] static __inline__ void __gpu_exit(void) {
+ return __builtin_gpu_exit();
+}
+
+// Suspend the thread briefly to assist the scheduler during busy loops.
+_DEFAULT_FN_ATTRS static __inline__ void __gpu_thread_suspend(void) {
+ return __builtin_gpu_thread_suspend();
+}
+
+#endif // __SPIRVINTRIN_H
diff --git a/clang/test/CodeGen/amdgpu-grid-builtins.c b/clang/test/CodeGen/amdgpu-grid-builtins.c
new file mode 100644
index 0000000000000..2104da2dc3cbc
--- /dev/null
+++ b/clang/test/CodeGen/amdgpu-grid-builtins.c
@@ -0,0 +1,158 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 5
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm -O1 %s -o - | FileCheck %s
+
+#include <stdint.h>
+
+// CHECK-LABEL: define dso_local noundef i32 @workgroup_id_x(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.amdgcn.workgroup.id.x()
+// CHECK-NEXT: ret i32 [[TMP0]]
+//
+uint32_t workgroup_id_x(void)
+{
+ return __builtin_amdgcn_workgroup_id_x();
+}
+
+// CHECK-LABEL: define dso_local noundef i32 @workgroup_id_y(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR2:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.amdgcn.workgroup.id.y()
+// CHECK-NEXT: ret i32 [[TMP0]]
+//
+uint32_t workgroup_id_y(void)
+{
+ return __builtin_amdgcn_workgroup_id_y();
+}
+
+// CHECK-LABEL: define dso_local noundef i32 @workgroup_id_z(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR3:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.amdgcn.workgroup.id.z()
+// CHECK-NEXT: ret i32 [[TMP0]]
+//
+uint32_t workgroup_id_z(void)
+{
+ return __builtin_amdgcn_workgroup_id_z();
+}
+
+// CHECK-LABEL: define dso_local noundef range(i32 0, 1024) i32 @workitem_id_x(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR4:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[TMP0:%.*]] = tail call noundef range(i32 0, 1024) i32 @llvm.amdgcn.workitem.id.x()
+// CHECK-NEXT: ret i32 [[TMP0]]
+//
+uint32_t workitem_id_x(void)
+{
+ return __builtin_amdgcn_workitem_id_x();
+}
+
+// CHECK-LABEL: define dso_local noundef range(i32 0, 1024) i32 @workitem_id_y(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR5:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[TMP0:%.*]] = tail call noundef range(i32 0, 1024) i32 @llvm.amdgcn.workitem.id.y()
+// CHECK-NEXT: ret i32 [[TMP0]]
+//
+uint32_t workitem_id_y(void)
+{
+ return __builtin_amdgcn_workitem_id_y();
+}
+
+// CHECK-LABEL: define dso_local noundef range(i32 0, 1024) i32 @workitem_id_z(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR6:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[TMP0:%.*]] = tail call noundef range(i32 0, 1024) i32 @llvm.amdgcn.workitem.id.z()
+// CHECK-NEXT: ret i32 [[TMP0]]
+//
+uint32_t workitem_id_z(void)
+{
+ return __builtin_amdgcn_workitem_id_z();
+}
+
+// CHECK-LABEL: define dso_local range(i32 1, 1025) i32 @workgroup_size_x(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR7:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[TMP0:%.*]] = tail call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+// CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr addrspace(4) [[TMP0]], i64 12
+// CHECK-NEXT: [[TMP2:%.*]] = load i16, ptr addrspace(4) [[TMP1]], align 4, !range [[RNG3:![0-9]+]], !invariant.load [[META4:![0-9]+]], !noundef [[META4]]
+// CHECK-NEXT: [[CONV:%.*]] = zext nneg i16 [[TMP2]] to i32
+// CHECK-NEXT: ret i32 [[CONV]]
+//
+uint32_t workgroup_size_x(void)
+{
+ return __builtin_amdgcn_workgroup_size_x();
+}
+
+// CHECK-LABEL: define dso_local range(i32 1, 1025) i32 @workgroup_size_y(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR7]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[TMP0:%.*]] = tail call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+// CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr addrspace(4) [[TMP0]], i64 14
+// CHECK-NEXT: [[TMP2:%.*]] = load i16, ptr addrspace(4) [[TMP1]], align 2, !range [[RNG3]], !invariant.load [[META4]], !noundef [[META4]]
+// CHECK-NEXT: [[CONV:%.*]] = zext nneg i16 [[TMP2]] to i32
+// CHECK-NEXT: ret i32 [[CONV]]
+//
+uint32_t workgroup_size_y(void)
+{
+ return __builtin_amdgcn_workgroup_size_y();
+}
+
+// CHECK-LABEL: define dso_local range(i32 1, 1025) i32 @workgroup_size_z(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR7]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[TMP0:%.*]] = tail call align 8 dereferenceable(256) ptr addrspace(4) @llvm.amdgcn.implicitarg.ptr()
+// CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr addrspace(4) [[TMP0]], i64 16
+// CHECK-NEXT: [[TMP2:%.*]] = load i16, ptr addrspace(4) [[TMP1]], align 8, !range [[RNG3]], !invariant.load [[META4]], !noundef [[META4]]
+// CHECK-NEXT: [[CONV:%.*]] = zext nneg i16 [[TMP2]] to i32
+// CHECK-NEXT: ret i32 [[CONV]]
+//
+uint32_t workgroup_size_z(void)
+{
+ return __builtin_amdgcn_workgroup_size_z();
+}
+
+// CHECK-LABEL: define dso_local range(i32 1, 0) i32 @grid_size_x(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR8:[0-9]+]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[TMP0:%.*]] = tail call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr addrspace(4) [[TMP0]], i64 12
+// CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr addrspace(4) [[TMP1]], align 4, !range [[RNG5:![0-9]+]], !invariant.load [[META4]]
+// CHECK-NEXT: ret i32 [[TMP2]]
+//
+uint32_t grid_size_x(void)
+{
+ return __builtin_amdgcn_grid_size_x();
+}
+
+// CHECK-LABEL: define dso_local range(i32 1, 0) i32 @grid_size_y(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR8]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[TMP0:%.*]] = tail call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr addrspace(4) [[TMP0]], i64 16
+// CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr addrspace(4) [[TMP1]], align 4, !range [[RNG5]], !invariant.load [[META4]]
+// CHECK-NEXT: ret i32 [[TMP2]]
+//
+uint32_t grid_size_y(void)
+{
+ return __builtin_amdgcn_grid_size_y();
+}
+
+// CHECK-LABEL: define dso_local range(i32 1, 0) i32 @grid_size_z(
+// CHECK-SAME: ) local_unnamed_addr #[[ATTR8]] {
+// CHECK-NEXT: [[ENTRY:.*:]]
+// CHECK-NEXT: [[TMP0:%.*]] = tail call align 4 dereferenceable(64) ptr addrspace(4) @llvm.amdgcn.dispatch.ptr()
+// CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw i8, ptr addrspace(4) [[TMP0]], i64 20
+// CHECK-NEXT: [[TMP2:%.*]] = load i32, ptr addrspace(4) [[TMP1]], align 4, !range [[RNG5]], !invariant.load [[META4]]
+// CHECK-NEXT: ret i32 [[TMP2]]
+//
+uint32_t grid_size_z(void)
+{
+ return __builtin_amdgcn_grid_size_z();
+}
+
+//.
+// CHECK: [[RNG3]] = !{i16 1, i16 1025}
+// CHECK: [[META4]] = !{}
+// CHECK: [[RNG5]] = !{i32 1, i32 0}
+//.
diff --git a/clang/test/CodeGen/gpu_builtins.c b/clang/test/CodeGen/gpu_builtins.c
new file mode 100644
index 0000000000000..8231b0952fa5e
--- /dev/null
+++ b/clang/test/CodeGen/gpu_builtins.c
@@ -0,0 +1,647 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --check-attributes
+// RUN: %clang_cc1 -O1 -triple spirv64 %s -emit-llvm -o - | FileCheck %s --check-prefix=SPIRV64
+// RUN: %clang_cc1 -O1 -triple spirv64-amd-amdhsa %s -emit-llvm -o - | FileCheck %s --check-prefix=AMDHSA
+// RUN: %clang_cc1 -O1 -triple nvptx64 -emit-llvm %s -o - | FileCheck %s --check-prefix=NVPTX
+// RUN: %clang_cc1 -O1 -triple amdgcn -emit-llvm %s -o - | FileCheck %s --check-prefix=AMDGCN
+
+#include <stdint.h>
+
+// SPIRV64: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+// SPIRV64-LABEL: @num_blocks_x(
+// SPIRV64-NEXT: entry:
+// SPIRV64-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.gpu.num.blocks.x()
+// SPIRV64-NEXT: ret i32 [[TMP0]]
+//
+// AMDHSA: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+// AMDHSA-LABEL: @num_blocks_x(
+// AMDHSA-NEXT: entry:
+// AMDHSA-NEXT: [[TMP0:%.*]] = tail call addrspace(4) i32 @llvm.gpu.num.blocks.x()
+// AMDHSA-NEXT: ret i32 [[TMP0]]
+//
+// NVPTX: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+// NVPTX-LABEL: @num_blocks_x(
+// NVPTX-NEXT: entry:
+// NVPTX-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.gpu.num.blocks.x()
+// NVPTX-NEXT: ret i32 [[TMP0]]
+//
+// AMDGCN: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+// AMDGCN-LABEL: @num_blocks_x(
+// AMDGCN-NEXT: entry:
+// AMDGCN-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.gpu.num.blocks.x()
+// AMDGCN-NEXT: ret i32 [[TMP0]]
+//
+uint32_t num_blocks_x(void) {
+ return __builtin_gpu_num_blocks_x();
+}
+
+
+// SPIRV64: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+// SPIRV64-LABEL: @num_blocks_y(
+// SPIRV64-NEXT: entry:
+// SPIRV64-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.gpu.num.blocks.y()
+// SPIRV64-NEXT: ret i32 [[TMP0]]
+//
+// AMDHSA: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+// AMDHSA-LABEL: @num_blocks_y(
+// AMDHSA-NEXT: entry:
+// AMDHSA-NEXT: [[TMP0:%.*]] = tail call addrspace(4) i32 @llvm.gpu.num.blocks.y()
+// AMDHSA-NEXT: ret i32 [[TMP0]]
+//
+// NVPTX: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+// NVPTX-LABEL: @num_blocks_y(
+// NVPTX-NEXT: entry:
+// NVPTX-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.gpu.num.blocks.y()
+// NVPTX-NEXT: ret i32 [[TMP0]]
+//
+// AMDGCN: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+// AMDGCN-LABEL: @num_blocks_y(
+// AMDGCN-NEXT: entry:
+// AMDGCN-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.gpu.num.blocks.y()
+// AMDGCN-NEXT: ret i32 [[TMP0]]
+//
+uint32_t num_blocks_y(void) {
+ return __builtin_gpu_num_blocks_y();
+}
+
+
+// SPIRV64: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+// SPIRV64-LABEL: @num_blocks_z(
+// SPIRV64-NEXT: entry:
+// SPIRV64-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.gpu.num.blocks.z()
+// SPIRV64-NEXT: ret i32 [[TMP0]]
+//
+// AMDHSA: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+// AMDHSA-LABEL: @num_blocks_z(
+// AMDHSA-NEXT: entry:
+// AMDHSA-NEXT: [[TMP0:%.*]] = tail call addrspace(4) i32 @llvm.gpu.num.blocks.z()
+// AMDHSA-NEXT: ret i32 [[TMP0]]
+//
+// NVPTX: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+// NVPTX-LABEL: @num_blocks_z(
+// NVPTX-NEXT: entry:
+// NVPTX-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.gpu.num.blocks.z()
+// NVPTX-NEXT: ret i32 [[TMP0]]
+//
+// AMDGCN: Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(none)
+// AMDGCN-LABEL: @num_blocks_z(
+// AMDGCN-NEXT: entry:
+// AMDGCN-NEXT: [[TMP0:%.*]] = tail call i32 @llvm.gpu.num.blocks.z()
+// AMDGCN-NEXT: ret i32 [[TMP0]]
+//
+uint32_t num_blocks_z(void) {
+ return __builtin_gpu_num_blocks_z();
+}
+
+
+// SPIRV64: Function Attrs: mustprogress nofree norecurse nosync nounwi...
[truncated]
``````````
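To make the JIT-time specialisation concrete, here is an illustrative sketch (not output from the patch; the actual mappings live in LowerGPUIntrinsic.cpp) of what the new pass does once a concrete target is chosen:

```llvm
; Illustrative sketch only. A target-agnostic module, as emitted for
; spirv64--, calls the generic intrinsic:
declare i32 @llvm.gpu.thread.id.x()

define i32 @thread_id_x() {
  %id = call i32 @llvm.gpu.thread.id.x()
  ret i32 %id
}

; After LowerGPUIntrinsic runs for an amdgcn target, the generic call is
; replaced with the matching target intrinsic, e.g.:
;   %id = call i32 @llvm.amdgcn.workitem.id.x()
```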
https://github.com/llvm/llvm-project/pull/131190