[Openmp-commits] [PATCH] D88185: [OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL

Wed Sep 23 17:02:27 PDT 2020

JonChesterfield added a comment.

Change seems reasonable. Amdgcn could benefit from the same, e.g. for trying to get apu systems with about 8 CU to run openmp code. Suggest we do that in a different patch if someone asks for it.

I'd like to get rid of the structure this macro controls entirely but don't have a good time estimate for that. This looks like a good idea in the meantime.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:62
+// GA102 design has a maxinum of 84 SMs
+#define MAX_SM 108
+#elif __CUDA_ARCH__ >= 700
----------------
Can we distinguish between GA100 and GA102? This structure is large so oversizing wastes significant memory.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88185/new/

https://reviews.llvm.org/D88185