[Openmp-commits] [PATCH] D88185: [OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL
Jon Chesterfield via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Wed Sep 23 17:02:27 PDT 2020
JonChesterfield added a comment.
The change seems reasonable. AMDGCN could benefit from the same option, e.g. to get APU systems with about 8 CUs to run OpenMP code. I suggest we do that in a separate patch if someone asks for it.
I'd like to get rid of the structure this macro controls entirely, but I don't have a good time estimate for that. In the meantime, this looks like a good idea.
================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:62
+// GA102 design has a maxinum of 84 SMs
+#define MAX_SM 108
+#elif __CUDA_ARCH__ >= 700
----------------
Can we distinguish between GA100 and GA102? This structure is large, so oversizing it wastes significant memory.
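One possible way to make that distinction, as a rough sketch rather than a tested change: GA100 is compiled as sm_80 and GA102 as sm_86, so `__CUDA_ARCH__` should be 800 and 860 respectively. The values 108 and 84 are the ones already in the patch and its comment. The host-side fallback define and the `#error` branch are assumptions added only so the snippet compiles standalone; they are not part of target_impl.h.

```cpp
// Sketch only. In the device RTL, __CUDA_ARCH__ is supplied by the compiler.
// Assumption: this fallback exists solely so the snippet builds on the host,
// where it pretends we are targeting sm_86 (GA102).
#ifndef __CUDA_ARCH__
#define __CUDA_ARCH__ 860
#endif

#if __CUDA_ARCH__ == 860
// GA102 (sm_86): at most 84 SMs, per the comment in the patch.
#define MAX_SM 84
#elif __CUDA_ARCH__ >= 800
// GA100 (sm_80) and any other sm_8x value keep the patch's larger number,
// because oversizing only wastes memory while undersizing would be unsafe.
#define MAX_SM 108
#else
// Assumption: older architectures are out of scope for this sketch.
#error "sketch covers only the __CUDA_ARCH__ >= 800 branch"
#endif

// Compile-time check that the sm_86 branch was taken in the host build.
static_assert(MAX_SM == 84, "expected the GA102 value when targeting sm_86");
```

Matching 860 exactly, instead of using `>= 860`, means an sm_8x target this sketch does not know about falls back to the larger size rather than the smaller one.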
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D88185/new/
https://reviews.llvm.org/D88185