[PATCH] D146973: [Clang] Implicitly include LLVM libc headers for the GPU

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Mar 27 13:57:09 PDT 2023


tra added inline comments.


================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1230
+      llvm::sys::path::append(P, "llvm-libc");
+      CmdArgs.push_back("-c-isystem");
+      CmdArgs.push_back(Args.MakeArgString(P));
----------------
jhuber6 wrote:
> sivachandra wrote:
> > tra wrote:
> > > sivachandra wrote:
> > > > jhuber6 wrote:
> > > > > tra wrote:
> > > > > > jhuber6 wrote:
> > > > > > > tra wrote:
> > > > > > > > Ensuring the right include order will be tricky. The interaction between the headers provided by llvm-libc and the system headers will be interesting if we end up mixing them. It may be less problematic than with the C++ standard library, but I doubt that mixing two implementations would work well here, either.
> > > > > > > > 
> > > > > > > > So, the major question I have -- does adding an include path here guarantee that we're not picking up the host headers? I do not see any changes that would exclude system headers from the include paths.
> > > > > > > > If we do have include paths leading to both llvm-libc and the host headers, what's expected to happen if user code ends up including a header that's not present in  llvm-libc? Is that possible?
> > > > > > > > 
> > > > > > > Right now I'm just relying on the expectation that, since this will be the first `c-isystem` path set, these headers will be pulled in first if they exist. It's not captured by the tests, but compiling with `-v` shows this path being searched first in my experience. So, theoretically, if there is an implementation of said header in this location, it will be picked up before anything else. Otherwise it'll just search the other standard locations.
> > > > > > I think this will be a problem. We're cross-compiling here, and for that to work reliably we need to make sure that only target headers are in effect. The problem is that we likely just do not have a sufficiently complete set of headers for the GPU. Do we? I have no idea what exactly llvm-libc provides and whether it is sufficient for normal user code to cross-compile for a GPU.
> > > > > > 
> > > > > > It would be interesting to try to compile some C++ code which includes commonly used, but generally target-agnostic, headers like <vector>, <complex>, <algorithm>, etc., and check whether we end up pulling in any system headers. Then check what happens if we do not have system headers available at all.
> > > > > No, it's definitely not complete. Some headers work on the GPU; most break in some way or another. The only ones `llvm-libc` currently provides are `string.h` and `ctype.h`. But I figured this wouldn't be a problem since it would just go to the system headers anyway if we didn't provide them. So we are merely replacing "maybe broken" with "probably works".
> > > > > 
> > > > > I was talking with Johannes and he brought up other issues about host-device compatibility between these headers. Fundamentally, right now `libc` generates its own headers and needs to do so to function. But there can be a problem when migrating data between the host and the device if the headers on the host differ somewhat from those on the device. I'm not sure what a good overall solution to that problem is.
> > > > Normally, one should not expect target and host headers to be compatible. So, if you are building for the host, you should use host headers, and if you are building for the target, you should use target headers. Does the general GPU build not follow this approach? Maybe there are some common headers, but I do not expect them to be from the standard libraries.
> > > We can generally assume that the GPU and the host do have largely identical types. At least the subset of the types we expect to exchange between host and GPU.
> > > CUDA compilation cheats, and allows the host to provide most of the headers, with clang and the CUDA SDK providing a subset of GPU-side overloads. This way, GPU-side functions may implicitly rely on code nominally provided for the host by the host headers, but if we need to codegen it, we can only do so for the subset that's actually compilable for the GPU -- either constexpr functions, lambdas, or `__device__` overloads provided by us.
> > > 
> > > Standalone compilation does not have the benefit of having the cake and being able to eat it. It has to be all or nothing, as we do not have the ability to separate the host and GPU code we pull in via headers, nor can we provide GPU-side overloads. In a way, injecting the llvm-libc path is a crude attempt to do that by providing GPU-side implementations before we get to include host-side headers that would provide the host versions. While it may sometimes work, I do not think it's a robust solution.
> > > 
> > > The only sound(-ish) ways out I see are:
> > > - implement a sufficiently complete set of headers for the GPU.
> > > - provide GPU-side overloads which would allow us to pull in system headers and augment them with GPU-specific implementations when necessary. 
> > > 
> > > The former is quite a bit of work. The latter is easier, but gets us away from "it's just a plain C compilation, just for a GPU", as we'll grow yet another C dialect and the libc headers will need to have additional glue to mark functions as "for GPU".
> > > 
> > > 
> > > 
> > > We can generally assume that the GPU and the host do have largely identical types. At least the subset of the types we expect to exchange between host and GPU.
> > 
> > Types used for communication have to be of similar topology, I would think, and putting them in a common header is expected. But would standard library types have to match those of the host?
> > 
> > > CUDA compilation cheats, and allows the host to provide most of the headers, with clang and the CUDA SDK providing a subset of GPU-side overloads. This way, GPU-side functions may implicitly rely on code nominally provided for the host by the host headers, but if we need to codegen it, we can only do so for the subset that's actually compilable for the GPU -- either constexpr functions, lambdas, or `__device__` overloads provided by us.
> > > 
> > > In a way, injecting the llvm-libc path is a crude attempt to do that by providing GPU-side implementations before we get to include host-side headers that would provide the host versions. While it may sometimes work, I do not think it's a robust solution.
> > 
> > Will it still be considered crude if the new path being proposed here contains only standard library headers?
> > 
> > > The only sound(-ish) ways out I see are:
> > > - implement a sufficiently complete set of headers for the GPU.
> > 
> > What does "sufficiently complete" mean? Are you referring to an incomplete header file or an incomplete set of header files? Assuming it is the latter, is it problematic if we fallback to host-headers (or headers which are currently used for the GPU build)?
> > 
> > - implement a sufficiently complete set of headers for the GPU.
> 
> This is what I was going for: we slowly implement the standard C headers and provide them as a list of functions and interfaces that are expected to work on the GPU. Otherwise we default to the support we already have. Given that LLVM `libc` is already aiming to provide its own complete set of headers, we should be able to reuse a lot of that work to this effect.
> 
> > - provide GPU-side overloads which would allow us to pull in system headers and augment them with GPU-specific implementations when necessary.
> 
> I don't think this is a robust solution to implementing a `libc`. The problem is that the system headers can pull in implementation details that we don't necessarily want to recreate -- things like functions or symbols that aren't defined in our `libc` and that we have no way of wrapping if we use the system header. I think the only reasonable way is to keep them completely separate and add some glue to the generated headers for the GPU as needed.
> would standard library types have to match those of the host?

That's probably getting us into a gray area. Ideally, we want the host and GPU to "just work" with each other. 
In practice we can't always make that work, as such types may contain host or GPU pointers and therefore may not always be functional on the other side.
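Something like this toy struct (a hypothetical example) illustrates it -- the layout matches on both sides, but the pointer it carries may only be meaningful on one of them:

  // Identical layout on host and GPU, but 'data' may be a device pointer the
  // host cannot dereference (or a host pointer the GPU cannot dereference).
  struct Buffer {
    float *data;
    unsigned long size;
  };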

> Will it still be considered crude 

Perhaps "crude" was not the best choice of a word. Not sure what would be a better choice, though.

> if the new path being proposed here contains only standard library headers?

If that's the only path providing the standard headers (i.e. no other host headers are involved), it would be fine.
Or, if the headers involved contain only the well-defined set of functions present in both libc and the system headers, and are guaranteed to be seamlessly interchangeable, that would be fine, too. "Fine" as in -- libc files will replace a subset of the host APIs, but whether the remaining host headers are compilable for the GPU is still up in the air.

The point is that include path injection is unlikely to address the fundamental issue that we're using host includes during cross-compilation. We happen to be able to assume that the types on the host and the GPU are largely identical, but that's not necessarily sufficient. E.g. host headers may use inline assembly for the host CPU.
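For instance, something along these lines is common in host headers (a made-up helper, just to illustrate the pattern):

  // Hypothetical host-header helper: the x86 inline asm is fine when
  // compiling for the host, but fails outright when targeting NVPTX or AMDGPU.
  static inline unsigned long long read_tsc(void) {
    unsigned int lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((unsigned long long)hi << 32) | lo;
  }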

> What does "sufficiently complete" mean? 

An incomplete set of header files.

> is it problematic if we fallback to host-headers (or headers which are currently used for the GPU build)?

It depends. Some code will compile fine. Other code will parse OK, but we would not be able to codegen it (e.g. it calls some host-only function). Some would fail to parse (e.g. something using inline asm for the host CPU, or relying on something that's been ifdef'ed out because we're not targeting the host CPU).
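To illustrate the "parses OK, can't codegen" case (hypothetical snippet; the declaration stands in for whatever a host header drags in):

  // Declared by a host header, but only the host C library provides it.
  extern "C" int gethostname(char *name, unsigned long len);

  int probe(char *buf) {
    return gethostname(buf, 64);  // parses for the GPU target, but there is
                                  // no device-side definition to link against
  }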

The approach we used with CUDA was to allow use of a subset of the host headers (constexpr functions, lambdas) and selectively provide `__device__` overloads to allow a subset of the standard C++ library to work. It's largely limited to math functions and <complex>. This approach is also problematic as it depends on the implementation details of the standard library, so it's prone to breaking.
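Roughly, the glue looks like this (a minimal sketch of the idea; the real versions live in clang's CUDA wrapper headers such as `__clang_cuda_math.h`):

  // Declaration of the libdevice function that does the actual work.
  extern "C" __device__ float __nv_sinf(float);

  // Device-side overload with the same name as the libm function, so device
  // code calling sinf() resolves to this instead of the host declaration.
  static __device__ inline float sinf(float x) { return __nv_sinf(x); }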

libc has the benefit of a relatively constrained API (compared to libc++), but has the disadvantage that there's no easy way to provide GPU overloads, other than via the include path.

I guess if we only replace standard math functions, include path injection may work well enough. That said, I still don't think it's a good general approach, as it assumes that the host headers would compile sensibly for a GPU they were never intended to be used with.






Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146973/new/

https://reviews.llvm.org/D146973


