[Openmp-commits] [PATCH] D88185: [OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL

Ye Luo via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Sep 23 18:08:27 PDT 2020

ye-luo added inline comments.

Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:62
+// GA102 design has a maxinum of 84 SMs
+#define MAX_SM 108
+#elif __CUDA_ARCH__ >= 700
JonChesterfield wrote:
> ye-luo wrote:
> > JonChesterfield wrote:
> > > Can we distinguish between GA100 and GA102? This structure is large so oversizing wastes significant memory.
> > GA100 is __CUDA_ARCH__ 800. GA102 is 860.
> > There are also 700, 720, 750
> > I don't really feel the necessity to add more resolution because LIBOMPTARGET_NVPTX_MAX_SM can be leveraged.
> It could matter to someone with a GA102 who hasn't read the cmake. Back of envelope math suggests there's a little under a gigabyte of allocated but unused memory between 84 and 108.
On arch 600, my measurement between MAX_SM values of 56 and 6 shows about a 500MB difference, i.e. roughly 10MB per SM. So I expect about a 200MB difference between 84 and 108, which should matter little to GA102 owners. RTX 3070 has 8GB.


