[libc-commits] [libc] [libc] Partially implement 'rand' for the GPU (PR #66167)

Jon Chesterfield via libc-commits libc-commits at lists.llvm.org
Mon Sep 25 09:02:11 PDT 2023


================
@@ -9,11 +9,33 @@
 #ifndef LLVM_LIBC_SRC_STDLIB_RAND_UTIL_H
 #define LLVM_LIBC_SRC_STDLIB_RAND_UTIL_H
 
+#include "src/__support/GPU/utils.h"
 #include "src/__support/macros/attributes.h"
 
 namespace __llvm_libc {
 
+#ifdef LIBC_TARGET_ARCH_IS_GPU
+// Implement thread-local storage on the GPU using local memory. Each thread
+// gets its own slot in the local memory array, which is private to the group.
+// TODO: We need to implement the 'thread_local' keyword on the GPU. This is an
+// inefficient and incomplete stand-in until that is done.
+template <typename T> class ThreadLocal {
+private:
+  static constexpr long MAX_THREADS = 1024;
+  [[clang::loader_uninitialized]] static inline gpu::Local<T>
----------------
JonChesterfield wrote:

I want to allocate thread-local variables on the kernel stack, using basically the same compile-time allocation scheme as LDS. I'm getting pushback on that because applications already run out of stack space, and using more of it is considered hazardous.

AS(5) globals should be raising an error in the back end. They definitely aren't correctly lowered.

https://github.com/llvm/llvm-project/pull/66167

