[clang] 4ca8ea8 - [Clang] Fix GPU intrinsic helpers incorrectly sign extending (#129560)
via cfe-commits
cfe-commits at lists.llvm.org
Mon Mar 3 12:26:46 PST 2025
Author: Joseph Huber
Date: 2025-03-03T14:26:43-06:00
New Revision: 4ca8ea8c972ae05a891687eda6704ec607184fae
URL: https://github.com/llvm/llvm-project/commit/4ca8ea8c972ae05a891687eda6704ec607184fae
DIFF: https://github.com/llvm/llvm-project/commit/4ca8ea8c972ae05a891687eda6704ec607184fae.diff
LOG: [Clang] Fix GPU intrinsic helpers incorrectly sign extending (#129560)
Summary:
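The builtins used here return signed 32-bit values, so casting the
result to uint64_t sign-extends it. When bit 31 of the low half is set,
the upper 32 bits all become one and clobber the high half once the two
halves are ORed together. Mask the low half with 0xFFFFFFFF after the
cast.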
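A minimal host-side sketch of the problem, assuming a hypothetical stand-in `fake_read_lane` in place of the real GPU builtins (it only mimics their signed 32-bit return type and is not part of either header). `combine_buggy` mirrors the expression before this change and `combine_fixed` mirrors it after:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a builtin that returns a signed 32-bit value.
   memcpy reinterprets the bits without relying on an out-of-range
   unsigned-to-signed conversion. */
static int32_t fake_read_lane(uint32_t x) {
  int32_t r;
  memcpy(&r, &x, sizeof r);
  return r;
}

/* Before the fix: the low half is sign-extended when cast to uint64_t. */
static uint64_t combine_buggy(uint64_t x) {
  uint32_t hi = (uint32_t)(x >> 32);
  uint32_t lo = (uint32_t)(x & 0xFFFFFFFF);
  return ((uint64_t)fake_read_lane(hi) << 32) |
         ((uint64_t)fake_read_lane(lo));
}

/* After the fix: masking discards the sign-extended upper bits. */
static uint64_t combine_fixed(uint64_t x) {
  uint32_t hi = (uint32_t)(x >> 32);
  uint32_t lo = (uint32_t)(x & 0xFFFFFFFF);
  return ((uint64_t)fake_read_lane(hi) << 32) |
         ((uint64_t)fake_read_lane(lo) & 0xFFFFFFFF);
}

/* Self-check: the two versions differ only when bit 31 of the low half
   is set. */
static void check_examples(void) {
  assert(combine_buggy(0x0000000180000000ull) == 0xFFFFFFFF80000000ull);
  assert(combine_fixed(0x0000000180000000ull) == 0x0000000180000000ull);
  assert(combine_buggy(0x0000000100000001ull) == 0x0000000100000001ull);
}
```

The high half needs no mask because its sign-extended bits are shifted out by the `<< 32`.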
Added:
Modified:
clang/lib/Headers/amdgpuintrin.h
clang/lib/Headers/nvptxintrin.h
Removed:
################################################################################
diff --git a/clang/lib/Headers/amdgpuintrin.h b/clang/lib/Headers/amdgpuintrin.h
index 355e75d0b2d42..6ad8e54f4aadd 100644
--- a/clang/lib/Headers/amdgpuintrin.h
+++ b/clang/lib/Headers/amdgpuintrin.h
@@ -121,7 +121,7 @@ __gpu_read_first_lane_u64(uint64_t __lane_mask, uint64_t __x) {
uint32_t __hi = (uint32_t)(__x >> 32ull);
uint32_t __lo = (uint32_t)(__x & 0xFFFFFFFF);
return ((uint64_t)__builtin_amdgcn_readfirstlane(__hi) << 32ull) |
- ((uint64_t)__builtin_amdgcn_readfirstlane(__lo));
+ ((uint64_t)__builtin_amdgcn_readfirstlane(__lo) & 0xFFFFFFFF);
}
// Returns a bitmask of threads in the current lane for which \p x is true.
diff --git a/clang/lib/Headers/nvptxintrin.h b/clang/lib/Headers/nvptxintrin.h
index 29d0adcabc82f..03594dd9bd6cb 100644
--- a/clang/lib/Headers/nvptxintrin.h
+++ b/clang/lib/Headers/nvptxintrin.h
@@ -131,7 +131,8 @@ __gpu_read_first_lane_u64(uint64_t __lane_mask, uint64_t __x) {
__gpu_num_lanes() - 1)
<< 32ull) |
((uint64_t)__nvvm_shfl_sync_idx_i32(__mask, __lo, __id,
- __gpu_num_lanes() - 1));
+ __gpu_num_lanes() - 1) &
+ 0xFFFFFFFF);
}
// Returns a bitmask of threads in the current lane for which \p x is true.