[Openmp-commits] [PATCH] D110006: [OpenMP] Add support for dynamic shared memory in new RTL
Jon Chesterfield via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Mon Oct 4 08:59:55 PDT 2021
JonChesterfield added inline comments.
================
Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:102
/// file later.
struct omptarget_device_environmentTy {
int32_t debug_level;
----------------
tianshilei1992 wrote:
> JonChesterfield wrote:
> > JonChesterfield wrote:
> > > JonChesterfield wrote:
> > > > jhuber6 wrote:
> > > > > JonChesterfield wrote:
> > > > > > This struct is also defined in deviceRTLs/common/device_environment.h, which doesn't seem to have the extra field present. Any objection to moving that header up a couple of levels then including it from the various plugins and device runtimes?
> > > > > >
> > > > > > uint64_t seems overly ambitious for shared size. The amdgpu hardware is limited to ~ 64k, can cuda go over 32 bits for this?
> > > > > It would probably be a good idea to keep the device environment stuff consistent. Currently there might be some difference between the new and old runtime so we should make sure that they all share the same size at least.
> > > > >
> > > > > And you're probably right, the argument in `cuLaunchKernel` that it's passed to only uses an `unsigned int` which is most likely 32 bits. Not likely someone is going to allocate a few billion bytes of shared memory.
> > > > Cool. I've wanted this for a while (rocm and trunk haven't always matched either) so will put up a patch
> > > The dynamic_shared_size field is currently unused (and not defined in the devicertl), but the cuda plugin raises an error when the size of the global in the devicertl is different to the size declared in this file. I don't understand how that is working at present.
> > Specifically, this patch missed the deviceRTL definition:
> > libomptarget/deviceRTLs/common/device_environment.h
> > struct omptarget_device_environmentTy {
> > int32_t debug_level;
> > uint32_t num_devices;
> > uint32_t device_num;
> > };
> >
> This patch only adds the support in the new device runtime, as described in the title.
Right, but both old and new device runtime use the same host plugin, and the host plugin raises an error if this struct has the wrong size, and the old device runtime has one without the extra uint64_t. Just put a comment by the size check further down this file
================
Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:930
if (CUSize != sizeof(DeviceEnv)) {
REPORT(
"Global device_environment '%s' - size mismatch (%zu != %zu)\n",
----------------
size mismatch here with old runtime, as far as I can tell
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D110006/new/
https://reviews.llvm.org/D110006
More information about the Openmp-commits
mailing list