[libc-commits] [PATCH] D148485: [libc] Add the '--threads' and '--blocks' option to the GPU loaders

Mon Apr 17 11:42:42 PDT 2023

tra added inline comments.

================
Comment at: libc/utils/gpu/loader/nvptx/Loader.cpp:137-139
+          cuLaunchKernel(function, params.num_blocks_x, /*gridDimY=*/1,
+                         /*gridDimZ=*/1, params.num_threads_x, /*blockDimY=*/1,
                          /*blockDimZ=*/1, 0, stream, nullptr, args_config))
----------------
jhuber6 wrote:
> tra wrote:
> > If we're allowing controlling the number of blocks/threads at all, is there a reason not to allow specifying all dimensions?
> I wasn't sure if I should bother, but I could definitely add it since it's hardly any more work than what's here.
> 
> I think internally for implementations we'll need to generate all of our thread IDs using the full dimensions as well, but that's probably standard.
That's one common pattern. However, there are also cases where small kernels perform better when they can use the x/y/z indices directly, rather than computing a single flat thread ID and then splitting it back into sub-indices.
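
To make the cost concrete, here is a minimal host-side sketch of the flatten-then-split pattern. The names `Index3`, `flatten`, and `split_flat_id` are hypothetical and not part of the patch; the layout assumes x is the fastest-varying dimension.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 3D index type, standing in for threadIdx / blockIdx.
struct Index3 {
  uint32_t x, y, z;
};

// Collapse an (x, y, z) index into a single flat ID, x varying fastest.
constexpr uint32_t flatten(Index3 i, uint32_t dim_x, uint32_t dim_y) {
  return i.x + dim_x * (i.y + dim_y * i.z);
}

// Recover (x, y, z) from a flat ID. Note the divisions and modulos:
// this is the extra work a kernel avoids when it is launched with real
// 3D dimensions and reads the x/y/z indices directly.
constexpr Index3 split_flat_id(uint32_t flat, uint32_t dim_x,
                               uint32_t dim_y) {
  return {flat % dim_x, (flat / dim_x) % dim_y, flat / (dim_x * dim_y)};
}
```

For a tiny kernel those integer divisions can be a noticeable fraction of the total work, which is the case I have in mind.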

There's also a limit on how large each individual dimension can be, so specifying only a single dimension may not be sufficient for large inputs:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications-technical-specifications-per-compute-capability
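
As a rough illustration, a loader could sanity-check the requested dimensions before launching. This is only a sketch: `LaunchDims`, `DeviceLimits`, and `is_valid_launch` are hypothetical names, and the default limits are the values as I read them from that table for recent compute capabilities. A real loader should probably query the device (e.g. via `cuDeviceGetAttribute`) instead of hard-coding them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for a grid or block extent.
struct LaunchDims {
  uint32_t x, y, z;
};

// Assumed defaults; query the device rather than trusting these.
struct DeviceLimits {
  LaunchDims max_grid{2147483647u, 65535u, 65535u};
  LaunchDims max_block{1024u, 1024u, 64u};
  uint32_t max_threads_per_block = 1024u;
};

// Every dimension must be at least 1 and within its per-dimension cap.
inline bool fits(const LaunchDims &d, const LaunchDims &max) {
  return d.x >= 1 && d.y >= 1 && d.z >= 1 && d.x <= max.x && d.y <= max.y &&
         d.z <= max.z;
}

// A launch is valid only if each dimension fits *and* the total number
// of threads in a block stays within the per-block limit.
inline bool is_valid_launch(const LaunchDims &grid, const LaunchDims &block,
                            const DeviceLimits &limits = DeviceLimits()) {
  const uint64_t threads = uint64_t(block.x) * block.y * block.z;
  return fits(grid, limits.max_grid) && fits(block, limits.max_block) &&
         threads <= limits.max_threads_per_block;
}
```

With these assumed limits, a grid of 70000 blocks is fine in x but would be rejected in y or z, so the dimensions are not interchangeable.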

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148485/new/

https://reviews.llvm.org/D148485