[PATCH] D129536: [CUDA][FIX] Make shfl[_sync] for unsigned long long non-recursive

Johannes Doerfert via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jul 20 14:59:57 PDT 2022


jdoerfert added a comment.

In D129536#3666257 <https://reviews.llvm.org/D129536#3666257>, @tra wrote:

> In D129536#3663957 <https://reviews.llvm.org/D129536#3663957>, @jdoerfert wrote:
>
>> @tra, unsure about the crash. For me this passes fine (no gpu), is anything missing?
>
> The tests in the patch are running with `-emit-llvm`, so they are not actually lowering to NVPTX and that's where the failure happens. https://godbolt.org/z/cchaWxrhn

The assertion is arguably not great but doesn't really matter, does it? How would I detect if they are supported?
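
For readers skimming the thread, here is a minimal host-side sketch of what "non-recursive" in the patch title can mean: a 64-bit shuffle built directly on a 32-bit primitive instead of forwarding to another 64-bit overload. It is not the code from the patch. `fake_shfl32` and `fake_shfl64` are hypothetical names, and the 32-bit stand-in simply returns its input so the example runs without a GPU.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 32-bit warp shuffle (assumption: on a real
// device this would be an intrinsic; here it just returns its argument).
static inline std::uint32_t fake_shfl32(std::uint32_t v) { return v; }

// 64-bit variant expressed only in terms of the 32-bit primitive.
// Because it never calls another 64-bit overload, it cannot end up
// calling itself through overload resolution.
static inline std::uint64_t fake_shfl64(std::uint64_t v) {
  std::uint32_t halves[2];
  std::memcpy(halves, &v, sizeof(v));   // split into two 32-bit words
  halves[0] = fake_shfl32(halves[0]);   // shuffle each word separately
  halves[1] = fake_shfl32(halves[1]);
  std::uint64_t out;
  std::memcpy(&out, halves, sizeof(out)); // reassemble in the same order
  return out;
}
```

With the identity stand-in, `fake_shfl64(x)` returns `x` for any input, which is only a sanity check that the split and reassembly are consistent.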



================
Comment at: clang/lib/Headers/__clang_cuda_intrinsics.h:237-238
 
-inline __device__ unsigned int
-__match64_any_sync(unsigned int mask, unsigned long long value) {
+inline __device__ unsigned int __match64_any_sync(unsigned int mask,
+                                                  unsigned long long value) {
   return __nvvm_match_any_sync_i64(mask, value);
----------------
tra wrote:
> Nit: this change is irrelevant to the patch and can be removed.
That was me running clang-format on the file. I'll push it as a separate NFC commit beforehand.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129536/new/

https://reviews.llvm.org/D129536


