[clang] [Clang] Fix 'gpuintrin.h' match when included with no arch set (PR #129927)

Wed Mar 5 16:35:51 PST 2025

================
@@ -179,8 +179,10 @@ __gpu_shuffle_idx_u64(uint64_t __lane_mask, uint32_t __idx, uint64_t __x,
 _DEFAULT_FN_ATTRS static __inline__ uint64_t
 __gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
   // Newer targets can use the dedicated CUDA support.
-  if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
+#if __CUDA_ARCH__ >= 700
+  if (__nvvm_reflect("__CUDA_ARCH") >= 700)
----------------
Artem-B wrote:

Also, if we're already checking for `__CUDA_ARCH__` at the preprocessor level, is there any point in using `__nvvm_reflect()` at all?

IIUC, the original code was intended to be compiled into "generic" IR, and it had to rely on `__nvvm_reflect()` to do things differently if/when it eventually ended up targeting a newer GPU. Now that the check is made up front in the preprocessor, I'm not sure we need it anymore.
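
For context, a minimal sketch of why the guard works at the preprocessor level when no arch is set: inside `#if`, an undefined macro such as `__CUDA_ARCH__` evaluates to 0, whereas in a plain C expression (as in the removed `if`) it is an undeclared identifier and fails to compile. The helper names below are hypothetical and not part of `gpuintrin.h`; the snippet assumes an ordinary host build.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not from gpuintrin.h): reports which branch the
// preprocessor selected for this translation unit.
static inline uint32_t example_selected_arch_path(void) {
#if __CUDA_ARCH__ >= 700
  // Only compiled when __CUDA_ARCH__ is defined and at least 700.
  return 700;
#else
  // Taken when __CUDA_ARCH__ is undefined (treated as 0 by #if) or below 700.
  return 0;
#endif
}

// Hypothetical self-check: with no arch macro defined, the fallback branch
// must have been selected.
static inline void example_check_host_default(void) {
#ifndef __CUDA_ARCH__
  assert(example_selected_arch_path() == 0);
#endif
}
```

Writing `if (__CUDA_ARCH__ >= 700)` in the function body instead would not compile in such a build, because the identifier is never declared.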


https://github.com/llvm/llvm-project/pull/129927