[libc-commits] [libc] [libc] Partially implement 'rand' for the GPU (PR #66167)

Joseph Huber via libc-commits libc-commits at lists.llvm.org
Mon Sep 25 08:22:02 PDT 2023


================
@@ -9,11 +9,33 @@
 #ifndef LLVM_LIBC_SRC_STDLIB_RAND_UTIL_H
 #define LLVM_LIBC_SRC_STDLIB_RAND_UTIL_H
 
+#include "src/__support/GPU/utils.h"
 #include "src/__support/macros/attributes.h"
 
 namespace __llvm_libc {
 
+#ifdef LIBC_TARGET_ARCH_IS_GPU
+// Implement thread local storage on the GPU using local memory. Each thread
+// gets its slot in the local memory array and is private to the group.
+// TODO: We need to implement the 'thread_local' keyword on the GPU. This is an
+// inefficient and incomplete stand-in until that is done.
+template <typename T> class ThreadLocal {
+private:
+  static constexpr long MAX_THREADS = 1024;
+  [[clang::loader_uninitialized]] static inline gpu::Local<T>
----------------
jhuber6 wrote:

`gpu::Local` is an alias for `opencl_local`, which is `AS(3)` in the backend. Global AS(3) memory doesn't work unless it's fully inlined into each calling kernel, which isn't something we can guarantee, especially because the seed is global state.
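
Setting the address-space concern aside, the slot-per-thread pattern the hunk describes can be sketched on the host roughly as below. This is only an illustration under stated assumptions, not the code in the patch: `sketch::current_thread_id`, `SlotStorage`, `set_seed`, and `get_seed` are hypothetical names, the thread id is a plain variable standing in for a hardware thread-id query, and the array is ordinary global storage, so none of the GPU address-space behavior is reproduced.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical stand-in for a GPU thread-id query; on the host the caller
// sets it by hand to emulate different threads.
inline std::size_t current_thread_id = 0;

// One slot per thread id, kept in ordinary global storage (assumption:
// this does not model any GPU address space).
template <typename T> class SlotStorage {
  static constexpr std::size_t MAX_THREADS = 1024;
  static inline T storage[MAX_THREADS] = {};

public:
  // Read the calling thread's slot.
  operator T() const { return storage[current_thread_id % MAX_THREADS]; }
  // Write the calling thread's slot.
  void operator=(const T &value) {
    storage[current_thread_id % MAX_THREADS] = value;
  }
};

// Hypothetical seed object, analogous in spirit to per-thread 'rand' state.
inline SlotStorage<unsigned long> rand_next;

inline void set_seed(unsigned long seed) { rand_next = seed; }
inline unsigned long get_seed() { return rand_next; }

// Small self-check: two emulated threads keep independent seeds.
inline void self_check() {
  current_thread_id = 0;
  set_seed(1);
  current_thread_id = 7;
  set_seed(42);
  assert(get_seed() == 42);
  current_thread_id = 0;
  assert(get_seed() == 1);
}

} // namespace sketch
```

The point of the sketch is only that each thread id reads and writes its own slot; whether that array can live in local memory when it is not inlined into every kernel is exactly the open question raised above.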

https://github.com/llvm/llvm-project/pull/66167

