[libc-commits] [PATCH] D148485: [libc] Add the '--threads' and '--blocks' option to the GPU loaders
Artem Belevich via Phabricator via libc-commits
libc-commits at lists.llvm.org
Mon Apr 17 11:42:42 PDT 2023
tra added inline comments.
================
Comment at: libc/utils/gpu/loader/nvptx/Loader.cpp:137-139
+ cuLaunchKernel(function, params.num_blocks_x, /*gridDimY=*/1,
+ /*gridDimZ=*/1, params.num_threads_x, /*blockDimY=*/1,
/*blockDimZ=*/1, 0, stream, nullptr, args_config))
----------------
jhuber6 wrote:
> tra wrote:
> > If we're allowing controlling the number of blocks/threads at all, is there a reason not to allow specifying all dimensions?
> I wasn't sure if I should bother, but I could definitely add it since it's hardly any more work than what's here.
>
> I think internally for implementations we'll need to generate all of our thread IDs using the full dimensions as well, but that's probably standard.
That's one common pattern. However, there are also cases where small kernels run faster when they can use the x/y/z indices directly, without having to calculate a single thread ID and then split it back into sub-indices.
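For illustration, a minimal host-side C++ sketch of the flatten-then-split pattern is below. The names `Dim3`, `flatten`, and `split` are hypothetical and not taken from the patch; on a device, the x/y/z values would come from the hardware-provided thread and block indices instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a three-component index or extent.
struct Dim3 {
  uint32_t x, y, z;
};

// Collapse an (x, y, z) index into one linear ID, with x varying fastest.
constexpr uint64_t flatten(Dim3 idx, Dim3 dim) {
  return idx.x + uint64_t(dim.x) * (idx.y + uint64_t(dim.y) * idx.z);
}

// Recover (x, y, z) from a linear ID. The divisions and remainders here are
// the extra work a kernel avoids when it reads the per-dimension indices
// directly.
inline Dim3 split(uint64_t id, Dim3 dim) {
  assert(dim.x > 0 && dim.y > 0 && dim.z > 0);
  return Dim3{uint32_t(id % dim.x), uint32_t((id / dim.x) % dim.y),
              uint32_t(id / (uint64_t(dim.x) * dim.y))};
}
```

A kernel that is naturally 2D or 3D and launched with matching dimensions can skip `split` entirely, which is where the potential saving for small kernels comes from.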
There's also a limit on how large the individual dimensions can be, so specifying a single one may not be sufficient for large inputs:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications-technical-specifications-per-compute-capability
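As a rough sketch of why a single dimension may not be enough, the helper below spreads a requested block count over x, y, and z when it exceeds the per-dimension limit. `GridLimits`, `Grid`, and `choose_grid` are hypothetical names, and the limits are passed in by the caller rather than hard-coded; the actual values should be taken from the table linked above or queried from the device.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical per-dimension grid limits, supplied by the caller.
struct GridLimits {
  uint64_t max_x, max_y, max_z;
};

struct Grid {
  uint32_t x, y, z;
};

constexpr uint64_t ceil_div(uint64_t a, uint64_t b) { return (a + b - 1) / b; }

// Pick grid dimensions with x * y * z >= total_blocks, filling x first.
// Returns false if the request does not fit within the limits.
inline bool choose_grid(uint64_t total_blocks, GridLimits lim, Grid &out) {
  assert(total_blocks > 0);
  uint64_t x = std::min(total_blocks, lim.max_x);
  uint64_t rest = ceil_div(total_blocks, x);
  uint64_t y = std::min(rest, lim.max_y);
  uint64_t z = ceil_div(rest, y);
  if (z > lim.max_z)
    return false;
  out = Grid{uint32_t(x), uint32_t(y), uint32_t(z)};
  return true;
}
```

Because the result can overshoot the requested count, a kernel launched this way would still need a bounds check on its computed ID.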
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D148485/new/
https://reviews.llvm.org/D148485