[Openmp-commits] [PATCH] D134396: [OpenMP][libomptarget] New plugin infrastructure and new CUDA plugin

Joseph Huber via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Oct 24 10:06:12 PDT 2022


jhuber6 added inline comments.


================
Comment at: openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h:162
+               ptrdiff_t *ArgOffsets, int32_t NumArgs, int32_t NumTeamsClause,
+               int32_t ThreadLimitClause, int32_t LoopTripCount,
+               AsyncInfoWrapperTy &AsyncInfoWrapper) const;
----------------
kevinsala wrote:
> jhuber6 wrote:
> > Why do we use `int32_t` for the loop trip count when the interface uses `uint64_t`? I think it's totally possible someone would have a trip-count > 3 billion or so.
> The loop trip count can decide the number of blocks/groups launched by a kernel. In CUDA and HSA, these parameters are `unsigned int`and `uint32_t`, respectively, and they are actually limited by the corresponding max blocks/groups properties of the device. So, although the program can generate a larger loop trip count, we will have to adapt the value at some point before the kernel launch. In this patch, we adapt the loop trip count to be the minimum between the original count and the `int32_t`'s maximum. The original plugins just cast the original `uint64_t` value to `unsigned int`/`int`.
> 
> Maybe we could continue using the method of this patch but emitting a debug message to notify the user that the loop trip count was changed.
I think it would be better to do the narrowing in the implementation rather than the interface. That way we don't need to do this cast in every plugin and can have a single comment explaining why.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134396/new/

https://reviews.llvm.org/D134396



More information about the Openmp-commits mailing list