[Openmp-commits] [PATCH] D88185: [OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL

Jon Chesterfield via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Sep 23 17:02:27 PDT 2020

JonChesterfield added a comment.

The change seems reasonable. AMDGCN could benefit from the same option, e.g. to get APU systems with about 8 CUs to run OpenMP code. I suggest we do that in a separate patch if someone asks for it.

I'd like to get rid of the structure this macro controls entirely but don't have a good time estimate for that. This looks like a good idea in the meantime.

Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:62
+// GA102 design has a maximum of 84 SMs
+#define MAX_SM 108
+#elif __CUDA_ARCH__ >= 700
Can we distinguish between GA100 and GA102? This structure is large, so oversizing it wastes significant memory.
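
One way to make that distinction, sketched here as a host-compilable constexpr helper rather than the preprocessor chain in target_impl.h. It assumes GA100 reports compute capability 8.0 (`__CUDA_ARCH__ == 800`) and GA102 reports 8.6 (`__CUDA_ARCH__ == 860`). The names `maxSMForArch`, `wastedBytes` and `kBytesPerSMSlot` are hypothetical and not part of the patch, and the per-SM slot size is a made-up figure that only illustrates the cost of oversizing.

```cpp
#include <cstddef>

// Hypothetical helper, not part of the patch: upper bound on the SM count
// for a given __CUDA_ARCH__ value. Older architectures use the fallback
// supplied by the caller.
constexpr int maxSMForArch(int cudaArch, int fallback) {
  return cudaArch >= 860 ? 84     // sm_86 (GA102): 84 SMs, per the patch comment
         : cudaArch >= 800 ? 108  // sm_80 (GA100): the value the patch defines
                           : fallback;
}

// Hypothetical per-SM slot size in bytes, chosen only for illustration.
constexpr std::size_t kBytesPerSMSlot = 4096;

// Bytes left unused when the structure is sized for `sizedFor` SMs but the
// target architecture needs at most maxSMForArch(cudaArch, sizedFor) slots.
constexpr std::size_t wastedBytes(int cudaArch, int sizedFor) {
  return static_cast<std::size_t>(sizedFor - maxSMForArch(cudaArch, sizedFor)) *
         kBytesPerSMSlot;
}

// Sizing a GA102 build for 108 SMs leaves 24 slots unused.
static_assert(maxSMForArch(860, 16) == 84, "GA102 bound");
static_assert(maxSMForArch(800, 16) == 108, "GA100 bound");
static_assert(wastedBytes(860, 108) == 24 * kBytesPerSMSlot, "oversizing cost");
```

In the header itself the same split would presumably be one more `#elif` branch on `__CUDA_ARCH__`, placed ahead of the existing `>= 700` case.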


