[Openmp-commits] [PATCH] D88185: [OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL

Wed Sep 23 18:08:27 PDT 2020

ye-luo added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:62
+// GA102 design has a maximum of 84 SMs
+#define MAX_SM 108
+#elif __CUDA_ARCH__ >= 700
----------------
JonChesterfield wrote:
> ye-luo wrote:
> > JonChesterfield wrote:
> > > Can we distinguish between GA100 and GA102? This structure is large so oversizing wastes significant memory.
> > GA100 is __CUDA_ARCH__ 800. GA102 is 860.
> > There are also 700, 720, 750
> > I don't really feel the necessity to add more resolution because LIBOMPTARGET_NVPTX_MAX_SM can be leveraged.
> It could matter to someone with a GA102 who hasn't read the cmake. Back of envelope math suggests there's a little under a gigabyte of allocated but unused memory between 84 and 108.
On arch 600, my measurement between MAX_SM values of 56 and 6 shows a difference of about 500MB, i.e. roughly 10MB per SM. So I expect a difference of about 200MB between 84 and 108, which should matter little to GA102 owners. The RTX 3070 has 8GB.
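
For reference, the estimate above can be written out as a small compile-time sketch. It assumes memory use scales linearly with MAX_SM, which I have not verified beyond the single measurement, and the helper names (mbPerSm, extraMB) are made up for illustration and are not part of the device RTL.

```cpp
// Linear extrapolation of device RTL memory use from one measurement.
// Assumption: allocated memory grows linearly with MAX_SM.

// Megabytes attributed to each SM slot, from a measured difference
// between two MAX_SM settings.
constexpr double mbPerSm(double measuredDiffMB, int smHigh, int smLow) {
  return measuredDiffMB / (smHigh - smLow);
}

// Estimated unused memory when MAX_SM is `configured` but the
// hardware only has `needed` SMs.
constexpr double extraMB(double perSmMB, int configured, int needed) {
  return perSmMB * (configured - needed);
}

// Figures from this thread: about 500MB between MAX_SM 56 and 6 on arch 600.
constexpr double kPerSm = mbPerSm(500.0, 56, 6);          // 10MB per SM
// GA102 (84 SMs) built with the GA100 value of 108.
constexpr double kGa102Waste = extraMB(kPerSm, 108, 84);  // 240MB

static_assert(kGa102Waste < 1024.0, "estimate is well under a gigabyte");
```

With these inputs the estimate comes out at 240MB, in line with the roughly 200MB figure and about 3% of the RTX 3070's 8GB.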

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88185/new/

https://reviews.llvm.org/D88185