[PATCH] D129536: [CUDA][FIX] Make shfl[_sync] for unsigned long long non-recursive

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jul 20 15:12:09 PDT 2022


tra added a comment.

In D129536#3666860 <https://reviews.llvm.org/D129536#3666860>, @jdoerfert wrote:

> The assertion is arguably not great but doesn't really matter, does it? How would I detect if they are supported?

The latest revision of the patch is fine in this regard. My comment pointing to the compiler crash reproducer was only intended to address the "For me this passes fine" part.

The only remaining thing is the manual `__CUDA_ARCH__` redefinition, which looks suspect to me. Is there any reason not to use `-target-cpu sm_30` instead?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D129536/new/

https://reviews.llvm.org/D129536
