[libc-commits] [libc] c167a25 - [libc] Fix lane-id utility function not using built-in (#84902)

via libc-commits libc-commits at lists.llvm.org
Tue Mar 12 08:40:39 PDT 2024


Author: Joseph Huber
Date: 2024-03-12T10:40:35-05:00
New Revision: c167a2588737613558bd7be4c9280603e89281ac

URL: https://github.com/llvm/llvm-project/commit/c167a2588737613558bd7be4c9280603e89281ac
DIFF: https://github.com/llvm/llvm-project/commit/c167a2588737613558bd7be4c9280603e89281ac.diff

LOG: [libc] Fix lane-id utility function not using built-in (#84902)

Summary:
Previously we computed the lane-id by taking the global thread ID and
keeping only its bottom 5 bits. This works but is inefficient compared to
the dedicated NVPTX intrinsic that reads this value directly.

Added: 
    

Modified: 
    libc/src/__support/GPU/nvptx/utils.h

Removed: 
    


################################################################################
diff --git a/libc/src/__support/GPU/nvptx/utils.h b/libc/src/__support/GPU/nvptx/utils.h
index a92c8847b6ecdf..fe9da4e8e6cb01 100644
--- a/libc/src/__support/GPU/nvptx/utils.h
+++ b/libc/src/__support/GPU/nvptx/utils.h
@@ -97,7 +97,7 @@ LIBC_INLINE uint32_t get_lane_size() { return 32; }
 
 /// Returns the id of the thread inside of a CUDA warp executing together.
 [[clang::convergent]] LIBC_INLINE uint32_t get_lane_id() {
-  return get_thread_id() & (get_lane_size() - 1);
+  return __nvvm_read_ptx_sreg_laneid();
 }
 
 /// Returns the bit-mask of active threads in the current warp.
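
For readers unfamiliar with the arithmetic in the removed line, the following is a minimal host-side sketch of it, not the libc implementation. The names kLaneSize, lane_id_from_thread_id, and mask_matches_modulo are hypothetical and chosen for illustration. The sketch assumes a lane size of 32, as returned by get_lane_size() in the diff, and a linear thread ID. It does not call __nvvm_read_ptx_sreg_laneid(), which is only available when compiling for NVPTX.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for get_lane_size(); the diff shows it returning 32.
constexpr std::uint32_t kLaneSize = 32;

// Mirrors the removed expression: keep only the low 5 bits of the thread id.
// Because kLaneSize is a power of two, this equals thread_id % kLaneSize.
constexpr std::uint32_t lane_id_from_thread_id(std::uint32_t thread_id) {
  return thread_id & (kLaneSize - 1);
}

// Returns true if the mask agrees with the modulo for every thread id
// in the range [0, num_threads).
inline bool mask_matches_modulo(std::uint32_t num_threads) {
  for (std::uint32_t tid = 0; tid < num_threads; ++tid)
    if (lane_id_from_thread_id(tid) != tid % kLaneSize)
      return false;
  return true;
}

// Compile-time spot checks of the mask.
static_assert(lane_id_from_thread_id(0) == 0, "first lane of first warp");
static_assert(lane_id_from_thread_id(31) == 31, "last lane of first warp");
static_assert(lane_id_from_thread_id(32) == 0, "first lane of second warp");
static_assert(lane_id_from_thread_id(37) == 5, "sixth lane of second warp");
```

A caller could check a whole block with assert(mask_matches_modulo(1024)). On the device, the commit replaces this computation with a single read of the lane-id special register, so no thread ID has to be computed first.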


        

