[Openmp-commits] [PATCH] D88185: [OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL

Wed Sep 23 18:13:23 PDT 2020

JonChesterfield added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:62
+// GA102 design has a maxinum of 84 SMs
+#define MAX_SM 108
+#elif __CUDA_ARCH__ >= 700
----------------
ye-luo wrote:
> JonChesterfield wrote:
> > ye-luo wrote:
> > > JonChesterfield wrote:
> > > > Can we distinguish between GA100 and GA102? This structure is large so oversizing wastes significant memory.
> > > GA100 is __CUDA_ARCH__ 800. GA102 is 860.
> > > There are also 700, 720, 750
> > > I don't really feel the necessity to add more resolution because LIBOMPTARGET_NVPTX_MAX_SM can be leveraged.
> > It could matter to someone with a GA102 who hasn't read the cmake. Back of envelope math suggests there's a little under a gigabyte of allocated but unused memory between 84 and 108.
> On arch 600, my measurement between 56 and 6 SMs indicates about a 500MB difference. So I expect about a 200MB difference between 84 and 108, which should matter little to GA102 owners. RTX 3070 has 8GB.
Measuring beats mental arithmetic against a different arch. Amdgpu was 2.1GB with 64, so about 30MB/SM. Sort of glad to hear nvptx is smaller per SM.
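
For reference, the two estimates in this thread can be reproduced with a small compile-time sketch. It uses only the rough figures quoted above (500MB between 56 and 6 on arch 600, 2.1GB at 64 on amdgpu, 84 vs 108 SMs). The helper names are hypothetical and not part of the patch, and it assumes memory scales linearly with the SM bound and that 2.1GB is 2100MB.

```cpp
// Back-of-envelope estimate of memory reserved for SMs a device does not have.
// All inputs are the approximate figures quoted in the review thread, not new
// measurements. Function and constant names are hypothetical.

// Per-SM cost in MB, derived from the difference between two builds that
// differ only in their SM bound (assumes linear scaling).
constexpr double perSmMB(double deltaMB, int smHigh, int smLow) {
  return deltaMB / (smHigh - smLow);
}

// Memory allocated but unused when the compiled-in bound exceeds the
// SM count of the actual device.
constexpr double wastedMB(double mbPerSm, int maxSm, int actualSm) {
  return mbPerSm * (maxSm - actualSm);
}

// nvptx, arch 600: about 500MB difference between bounds of 56 and 6.
constexpr double kNvptxPerSmMB = perSmMB(500.0, 56, 6);

// GA102 (84 SMs) built with the GA100 bound of 108, using the nvptx figure.
constexpr double kGa102WasteMB = wastedMB(kNvptxPerSmMB, 108, 84);

// amdgpu: about 2.1GB at a bound of 64 (treated here as 2100MB).
constexpr double kAmdgpuPerUnitMB = 2100.0 / 64;

// The same 84 vs 108 gap, priced at the amdgpu per-unit figure.
constexpr double kGa102WasteAtAmdgpuRateMB =
    wastedMB(kAmdgpuPerUnitMB, 108, 84);
```

With these inputs the nvptx figure works out to 10MB per SM and 240MB for the 84 vs 108 gap, in line with the roughly 200MB expectation. The amdgpu rate of about 33MB per unit gives about 790MB for the same gap, which is where the "a little under a gigabyte" estimate comes from.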

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D88185/new/

https://reviews.llvm.org/D88185