[libc-commits] [PATCH] D139839: [libc] Add a loader utility for AMDHSA architectures for testing

Tue Dec 13 15:48:55 PST 2022

jhuber6 added a comment.

In D139839#3993320 <https://reviews.llvm.org/D139839#3993320>, @jdoerfert wrote:

> In D139839#3993272 <https://reviews.llvm.org/D139839#3993272>, @sivachandra wrote:
>
>> On the general topic of adding dependencies on other projects like `openmp`, we should strive to keep the libc project as self-contained as possible.

This is my philosophy as well; I'd prefer to keep this `libc` implementation on the GPU separate from other GPU runtimes.

> The problem is that now we duplicate (and in the future extend) the "gpu launch" logic for each target.
> We are effectively creating yet another offloading runtime. We will invent new abstraction layers, new features will be added, etc.

I don't think this is a bad thing. The loader tool here has utility outside of this project if we want to exercise the GPU directly without pulling in the potential bugs and complexity of the existing (and quite large) offloading runtimes like CUDA and OpenMP. The design I outlined is more in line with a `libc` on the GPU, as it behaves like simple cross-compilation for testing purposes. My plan is for the additional features to be part of a "host" portion of the GPU `libc`, which means the loader in its current state will be unchanged and will simply link against that host runtime, as would CUDA, OpenMP, or HIP if they link in the `libc`.

> This patch has 400 lines of HSA magic copied probably from OpenMP. We will need the same for CUDA, OneAPI, maybe OpenCL, ...
> Then we need extra logic in all of them to support allocators (via pre-allocation), in all their shapes (bump, free-lists, ...).
> Then we need to put RPC logic in them, again per target, and all of it is then not yet tested with OpenMP offload.

A lot of it is boilerplate that we could clean up with some shared headers, as @JonChesterfield brought up. I do agree there's a non-zero amount of effort required to write these loaders for a new architecture, but standing up a new target in the `libc` would require runtime support for the future RPC, allocators, etc. as well.
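
For readers unfamiliar with the "allocators via pre-allocation" point quoted above, here is a minimal sketch of a bump allocator over a region reserved up front. It is an illustration only and not code from this patch: the `BumpAllocator` name, its fields, and the assumption that `base` is already aligned for every requested alignment are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical illustration: a bump allocator over a pre-allocated buffer.
// Not taken from the patch or from any existing offloading runtime.
struct BumpAllocator {
  std::uint8_t *base; // start of the pre-allocated region (assumed suitably aligned)
  std::size_t size;   // total bytes available in the region
  std::size_t offset; // offset of the next free byte

  // Hands out `bytes` bytes at an offset rounded up to `align`, which must be
  // a power of two. Returns nullptr once the region is exhausted.
  void *allocate(std::size_t bytes,
                 std::size_t align = alignof(std::max_align_t)) {
    std::size_t start = (offset + align - 1) & ~(align - 1);
    if (start > size || bytes > size - start)
      return nullptr;
    offset = start + bytes;
    return base + start;
  }

  // Individual frees are not supported; the whole region is released at once.
  void reset() { offset = 0; }
};
```

The per-target cost under discussion is less about this logic and more about who reserves the backing region and how it is made visible to the device, which this sketch deliberately leaves out.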

> A loader using the existing offloading runtime:
>
> 1. does not require clang changes
> 2. requires 2 minimal files (shown below)
> 3. will work on all supported GPU targets, the MPI target, virtual GPU target, ...

To be fair, the clang changes are simply to support what I think is a valuable compilation mode for targeting GPUs, one that lets us treat it as cross-compilation rather than as a distinct offloading system. These changes are pretty minimal for NVPTX and weren't required for AMDGPU.

It's difficult to reach a consensus on these topics, but I'm with @sivachandra and @JonChesterfield that a `libc` should be mostly standalone implementation-wise and then linked into different targets.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D139839