[PATCH] D123471: [CUDA] Create offloading entries when using the new driver
Joseph Huber via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri May 6 16:37:56 PDT 2022
jhuber6 added inline comments.
================
Comment at: clang/lib/CodeGen/CGCUDARuntime.h:58
+ /// Mark the entry as a global variable.
+ OffloadGlobalEntry = 0x0,
+ /// Mark the entry as a managed global variable.
----------------
tra wrote:
> jhuber6 wrote:
> > tra wrote:
> > > We're still using the same numeric value for two different kinds of entities.
> > > Considering that it's the third round we're making around this point, I'm starting to suspect that I may be missing something.
> > > Is there a particular reason kernels and global unmanaged variables have to have the same 'kind'?
> > >
> > > It's possible that I didn't do a good job explaining my enthusiastic nitpicking here.
> > > My suggestion to have unified enum for **all** entities we register is based on a principle of separation of responsibilities. If we want to know what kind of entry we're dealing with, checking the 'kind' field should be sufficient. The 'size' field should only indicate the size of the entity. Having to consider both kind and size to determine what you're dealing with just muddies things and should not be done unless there's a good reason for that. E.g. it might be OK if we were short on flag bits.
> > >
> > >
> > Ah, I see the point you're making now. This is yet another thing that OpenMP did that I just copied. I wouldn't have implemented it this way but I figured it would be simpler to keep them similar. I mostly did it this way because I did some initial tests of registering and accessing CUDA globals in OpenMP and it required using the same flags for the kernels and globals. We could change it for CUDA in the future and I could make that change here if it's valuable. Ideally I would like to rewrite how we do all this registration with the structs but breaking the ABI makes it complicated...
> > I did some initial tests of registering and accessing CUDA globals in OpenMP and it required using the same flags for the kernels and globals.
>
> OK. So, there is something that requires this magic. If that's something we must have, then it must be mentioned in the comments around the enum.
>
> Do you know where I should find the code which needs this? I'm curious what's going on there.
> I wonder if it just checks for "flags==0" and refuses to deal with unknown flags.
> To think of it, we probably want to put the enum into a common header which defines the `__tgt_offload_entry`.We would not want OpenMP itself to start using the same bits for something else.
Sorry, I should be more specific. The OpenMP offloading runtime currently uses a size of zero to indicate a kernel function and the flags have a different meaning if it's a kernel. For OpenMP, 0 is a kernel, 1 and 2 are device ctors / dtors. I'm not sure why they chose this over just another flag but it's the current standard. You can see it used like this here https://github.com/llvm/llvm-project/blob/main/openmp/libomptarget/src/omptarget.cpp#L147. I'm not sure if there's a good way to wrangle these together now that I think about it, considering OpenMP already uses `0x1` to represent `link` OpenMP variables so this already collides. But treating the flags different on the size is at least consistent with what OpenMP does. It makes it a little hard to define one enum for it since we use it two different ways, I'm not a fan of it but it's what the current ABI uses.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D123471/new/
https://reviews.llvm.org/D123471
More information about the cfe-commits
mailing list