[PATCH] D125904: [Cuda] Use fallback method to mangle externalized decls if no CUID given

Tue May 24 15:08:24 PDT 2022

jhuber6 added a comment.

A

================
Comment at: clang/lib/CodeGen/CodeGenModule.cpp:6836
+
+  // If the CUID is not specified we try to generate a unique postfix.
+  if (getLangOpts().CUID.empty()) {
----------------
jhuber6 wrote:
> tra wrote:
> > jhuber6 wrote:
> > > tra wrote:
> > > > > However, [CUID] is not always availible. 
> > > > 
> > > > The question is -- when and why is it not available? I'm getting the feeling that we're fixing the consequence here, not the root cause.
> > > > 
> > > > Is there a reason we can't make sure that the driver always generates a cuid for offload subcompilations and error out if it's needed but is not provided?
> > > > That would make this fallback unnecessary and would be a more robust approach in general.
> > > > 
> > > So, I'm more in favor of this approach because it doesn't require extra intervention from the compiler driver, this makes it less convoluted to do split compilation since we don't have an extra arguments. The way I would prefer it, is that we do this implicitly by default without requiring extra thought from the driver, but if it's not good enough we can support the manual `CUID` approach to let the user override it. I think this is a cleaner implementation, and is mostly coming from my support for CUDA in the new driver which currently doesn't implement the CUID as we do with the old driver. Generally I'd prefer things to behave independent of the driver, so we can consider host and device compilation more separately.
> > > So, I'm more in favor of this approach because it doesn't require extra intervention from the compiler driver
> > 
> > We need the driver intervention for any cc1 compilations anyways, so this does not buy us anything.  While you can run a sub-compilation manually with handcrafted cc1 flags, that's not a practical use case. The driver is the ultimate source of cc1 flags.
> > 
> > > this makes it less convoluted to do split compilation since we don't have an extra arguments.
> > 
> > For CUDA/HIP sub-compilation should be done with clang --cuda-host-only/--cuda-device-only.  Whether the driver supplies yet another cc1 option, --cuid=... makes no difference to the user launching such sub-compilation. 
> > 
> > > The way I would prefer it, is that we do this implicitly by default without requiring extra thought from the driver, but if it's not good enough we can support the manual CUID approach to let the user override it.
> > 
> > I agree that we can come up with something that will almost always work. Possibly even good enough for all practical purposes. However, if a better solution would take comparable effort, it would make sense to do things right and avoid adding technical debt. 
> > 
> > On the other hand, requiring the driver to supply identical cuid to all sub-compilations appears to be a better approach to me:
> > * Driver is the best place to do it, functionally. Driver has access to all user-provided inputs and is in position to guarantee that all subcompilations get the same cuid.
> > * Calculating CUID in the driver keeps relevant logic in one place. Doing it in the driver *and* in the codegen 
> > * Figuring out what inputs are relevant for calculation of CUID in cc1 invocation is error prone. E.g. we have to guess which cc1 options are relevant or not and is the driver would pass a macro to one subcompilation but not to another, we would end up generating mismatching CUID and would not have any way to notice that. Even when that's not the case, we would need to guess which flags, supplied by the driver, are relevant. At CC1 level that may be somewhat complicated as top-level options may expand to quite a few more cc1 options. E.g. we'll need to take into account `-std=...`, `--cuda-path=`, `-include ...`, `-I` (and other include paths)... All of that does not belong to the codegen.
> > 
> > The driver is already doing CUID computation, so I do not see any downsides to just letting it do its job, and I do believe it will be a better, and likely less complicated, solution.
> > 
> > > ... mostly coming from my support for CUDA in the new driver which currently doesn't implement the CUID as we do with the old driver
> > 
> > Right. That appears to be the key missing piece.
> > 
> > What are the obstacles for having CUID calculation done in the new driver. It should have all the info it needs. What am I missing?
> > 
> > For CUDA/HIP sub-compilation should be done with clang --cuda-host-only/--cuda-device-only. Whether the driver supplies yet another cc1 option, --cuid=... makes no difference to the user launching such sub-compilation.
> The problem I have with this is that we use the command line to generate the value, so they aren't going to be the same without the user manually specifying it. I guess we could filter out only "relevant" command line flags, maybe that's an option. I just think it's not intuitive for a name mangling scheme to depend on something external, but there's definitely advantages to doing it that way.
> 
> I can see your point for the Driver handling this stuff. Now that I'm thinking about it I don't think looking at the macros or the other arguments is a sound solution in the first place. Even without that it would work for almost all the same cases just using the file's unique ID. Without that, this solution is guaranteed not to conflict with any other file on the same file system at the time of compilation. This, as we discussed, potentially fails for non-static source trees and compiling the same file twice and linking it. The current CUID implementation fails on the former, this method fails on both.
> 
> If the CUID didn't exist, the way I would have implemented it would simply be with the File-ID, and have the CUID be a simple marshalling option that lets the user override it to something unique if needed. I personally think that's simpler for 99.99% of cases and has an easy-out in the last 0.01%. Given that it already exists there's some desire to keep it since the work has already been done I understand.
> 
> > What are the obstacles for having CUID calculation done in the new driver. It should have all the info it needs. What am I missing?
> It's less of a difficulty in implementing and more hoping we could make the name mangling more simple and work by default without the driver. 
> Also, we may need this support for a single case in OpenMP, and I'd prefer not need to generate the CUID for OpenMP offloading when it's unused the vast majority of the time. Generally I'd prefer if compiling for the host / device was conceptually the same to the user without requiring external values. If we're sold on the CUID method I can go forward with that, but from my perspective what it's buying us is the ability to compile the following
> ```
> static __device__ int a;
> 
> #ifdef MACRO
>   do_something_with(a);
> #else
>   do_something_else();
> #endif
> ```
> ```
> clang foo.cu -DMACRO -c -o 1.o
> clang foo.cu 1.o
> ```
> 
> This is just a tough problem overall, I don't think there's a single perfect solution. Whatever we choose we'll be trading reproducibility for correctness or whatever. You have more seniority in this space so it's your call what you think I should go forward with. 
Also, it's incredibly convoluted, but I can think of a way to break even the current CUID for this.
```
static __device__ int a;

__device__ int __attribute__((weak)) *a_ref = &a;
```
```
$ clang a.cu -c -fgpu-rdc
$ mv a.o b.o
$ clang a.cu -c -fgpu-rdc
$ nvlink a.o b.o -arch=sm_35 -o out.cubin
nvlink error   : Multiple definition of '_ZL1a__static__d041026c8e4167e6' in '1.o', first defined in 'a.o'
nvlink fatal   : merge_elf failed
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125904/new/

https://reviews.llvm.org/D125904