[Openmp-commits] [PATCH] D110006: [OpenMP] Add support for dynamic shared memory in new RTL
Jon Chesterfield via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Mon Oct 4 06:43:29 PDT 2021
JonChesterfield added inline comments.
================
Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:102
/// file later.
struct omptarget_device_environmentTy {
int32_t debug_level;
----------------
jhuber6 wrote:
> JonChesterfield wrote:
> > This struct is also defined in deviceRTLs/common/device_environment.h, which doesn't seem to have the extra field present. Any objection to moving that header up a couple of levels then including it from the various plugins and device runtimes?
> >
> > uint64_t seems overly ambitious for shared size. The amdgpu hardware is limited to ~ 64k, can cuda go over 32 bits for this?
> It would probably be a good idea to keep the device environment stuff consistent. Currently there might be some difference between the new and old runtime so we should make sure that they all share the same size at least.
>
> And you're probably right, the argument in `cuLaunchKernel` that it's passed to only uses an `unsigned int` which is most likely 32 bits. Not likely someone is going to allocate a few billion bytes of shared memory.
Cool. I've wanted this for a while (rocm and trunk haven't always matched either) so will put up a patch
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D110006/new/
https://reviews.llvm.org/D110006
More information about the Openmp-commits
mailing list