[libc-commits] [libc] [libc] Fix lane-id utility function not using built-in (PR #84902)
Joseph Huber via libc-commits
libc-commits at lists.llvm.org
Tue Mar 12 05:24:35 PDT 2024
https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/84902
Summary:
Previously we computed the lane ID by taking the global thread ID and
keeping only its bottom 5 bits (masking with get_lane_size() - 1). This
works, but it is less efficient than reading the value directly through
the NVPTX intrinsic dedicated to it.
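The following host-side C++ sketch makes the old computation concrete. It assumes a 32-wide warp, a block-local thread ID linearized with x varying fastest, and warps built from consecutive linear IDs. Dim3, linear_thread_id, lane_id_from_thread_id, and warp_id_from_thread_id are hypothetical names used only for illustration. They are not LLVM libc APIs.

```cpp
#include <cstdint>

// Hypothetical host-side model of the pre-patch lane-ID computation.
// Assumptions: 32-wide warps, warps formed from consecutive linearized
// thread IDs, and x varying fastest in the linearization.
constexpr uint32_t kLaneSize = 32;

struct Dim3 {
  uint32_t x, y, z;
};

// Linearize a thread's (x, y, z) index within a block of size ntid.
constexpr uint32_t linear_thread_id(Dim3 tid, Dim3 ntid) {
  return tid.x + ntid.x * (tid.y + ntid.y * tid.z);
}

// Old approach: keep only the bottom 5 bits of the linear ID.
// kLaneSize is a power of two, so `& (kLaneSize - 1)` equals `% kLaneSize`.
constexpr uint32_t lane_id_from_thread_id(uint32_t linear_id) {
  return linear_id & (kLaneSize - 1);
}

// The discarded upper bits identify which warp the thread belongs to.
constexpr uint32_t warp_id_from_thread_id(uint32_t linear_id) {
  return linear_id / kLaneSize;
}
```

For a 16x4x1 block, thread (5, 2, 0) has linear ID 5 + 16 * 2 = 37, which falls in lane 5 of warp 1. Under these assumptions, the old helper had to form that linear ID on every call before masking it. The dedicated intrinsic skips that work.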
From 8f707bf7e28d6387e610defb146ae70413172b31 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Tue, 12 Mar 2024 07:21:14 -0500
Subject: [PATCH] [libc] Fix lane-id utility function not using built-in
---
libc/src/__support/GPU/nvptx/utils.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libc/src/__support/GPU/nvptx/utils.h b/libc/src/__support/GPU/nvptx/utils.h
index a92c8847b6ecdf..fe9da4e8e6cb01 100644
--- a/libc/src/__support/GPU/nvptx/utils.h
+++ b/libc/src/__support/GPU/nvptx/utils.h
@@ -97,7 +97,7 @@ LIBC_INLINE uint32_t get_lane_size() { return 32; }
/// Returns the id of the thread inside of a CUDA warp executing together.
[[clang::convergent]] LIBC_INLINE uint32_t get_lane_id() {
- return get_thread_id() & (get_lane_size() - 1);
+ return __nvvm_read_ptx_sreg_laneid();
}
/// Returns the bit-mask of active threads in the current warp.
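Code outside LLVM libc that wants the same single special-register read can call the clang builtin from the patch directly. The sketch below assumes clang, which defines __NVPTX__ when targeting NVPTX. portable_lane_id is a hypothetical name. Returning 0 on the host, which treats the host thread as lane 0 of a one-lane warp, is an arbitrary choice made only for illustration.

```cpp
#include <cstdint>

// Hypothetical wrapper, not part of LLVM libc.
inline uint32_t portable_lane_id() {
#if defined(__NVPTX__)
  // Device path: one read of the PTX %laneid special register through the
  // same clang builtin the patch uses.
  return static_cast<uint32_t>(__nvvm_read_ptx_sreg_laneid());
#else
  // Host path (assumption): there is no warp, so report lane 0.
  return 0;
#endif
}
```

The libc version also carries [[clang::convergent]], as shown in the diff. Anyone adapting this sketch for code that feeds warp-level operations may want to keep that attribute.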