[Openmp-commits] [PATCH] D110006: [OpenMP] Add support for dynamic shared memory in new RTL

Jon Chesterfield via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Oct 4 08:22:45 PDT 2021

JonChesterfield added inline comments.

Comment at: openmp/libomptarget/plugins/cuda/src/rtl.cpp:102
 /// file later.
 struct omptarget_device_environmentTy {
   int32_t debug_level;
JonChesterfield wrote:
> JonChesterfield wrote:
> > jhuber6 wrote:
> > > JonChesterfield wrote:
> > > > This struct is also defined in deviceRTLs/common/device_environment.h, which doesn't seem to have the extra field present. Any objection to moving that header up a couple of levels then including it from the various plugins and device runtimes?
> > > > 
> > > > uint64_t seems overly ambitious for shared size. The amdgpu hardware is limited to ~ 64k, can cuda go over 32 bits for this?
> > > It would probably be a good idea to keep the device environment stuff consistent. Currently there might be some difference between the new and old runtime so we should make sure that they all share the same size at least. 
> > > 
> > > And you're probably right, the argument in `cuLaunchKernel` that it's passed to only uses an `unsigned int` which is most likely 32 bits. Not likely someone is going to allocate a few billion bytes of shared memory.
> > Cool. I've wanted this for a while (rocm and trunk haven't always matched either) so will put up a patch
> The dynamic_shared_size field is currently unused (and not defined in the devicertl), but the cuda plugin raises an error when the size of the global in the devicertl is different to the size declared in this file. I don't understand how that is working at present.
Specifically, this patch missed the deviceRTL definition:
struct omptarget_device_environmentTy {
  int32_t debug_level;
  uint32_t num_devices;
  uint32_t device_num;

  rG LLVM Github Monorepo



More information about the Openmp-commits mailing list