[Openmp-commits] [PATCH] D134396: [OpenMP][libomptarget] New plugin infrastructure and new CUDA plugin

Kevin Sala Penadés via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Sat Oct 22 17:58:13 PDT 2022


kevinsala added inline comments.


================
Comment at: openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h:162
+               ptrdiff_t *ArgOffsets, int32_t NumArgs, int32_t NumTeamsClause,
+               int32_t ThreadLimitClause, int32_t LoopTripCount,
+               AsyncInfoWrapperTy &AsyncInfoWrapper) const;
----------------
jhuber6 wrote:
> Why do we use `int32_t` for the loop trip count when the interface uses `uint64_t`? I think it's totally possible someone would have a trip-count > 3 billion or so.
The loop trip count can decide the number of blocks/groups launched by a kernel. In CUDA and HSA, these parameters are `unsigned int`and `uint32_t`, respectively, and they are actually limited by the corresponding max blocks/groups properties of the device. So, although the program can generate a larger loop trip count, we will have to adapt the value at some point before the kernel launch. In this patch, we adapt the loop trip count to be the minimum between the original count and the `int32_t`'s maximum. The original plugins just cast the original `uint64_t` value to `unsigned int`/`int`.

Maybe we could continue using the method of this patch but emitting a debug message to notify the user that the loop trip count was changed.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134396/new/

https://reviews.llvm.org/D134396



More information about the Openmp-commits mailing list