[PATCH] D137154: Adding nvvm_reflect clang builtin
Artem Belevich via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Nov 9 11:32:19 PST 2022
tra added a subscriber: jhuber6.
tra added a comment.
In D137154#3917692 <https://reviews.llvm.org/D137154#3917692>, @hdelan wrote:
> Thanks for the feedback. Instead of adding `__nvvm_reflect` as a clang builtin, would it be acceptable if I modified the NVVMReflect pass
That would be less problematic, but I'm still concerned that it would tacitly endorse the use of `__nvvm_reflect` by LLVM users.
> so that it works with addrspace casting as well? This would allow us to use `__nvvm_reflect` in OpenCL
Relaxing argument type checks on `__nvvm_reflect` function would be fine with me.
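For reference, the pattern at issue looks roughly like this (a minimal sketch of the libdevice-style idiom; the `my_log*` helpers are hypothetical):

```
// Sketch: NVVMReflect folds the call to a constant (e.g. "__CUDA_ARCH"
// becomes the SM version times 10), so the dead branch gets eliminated.
extern "C" __device__ int __nvvm_reflect(const char *);

__device__ double my_log_sm80(double x);    // hypothetical arch-specific path
__device__ double my_log_generic(double x); // hypothetical generic fallback

__device__ double my_log(double x) {
  if (__nvvm_reflect("__CUDA_ARCH") >= 800) // sm_80 and newer
    return my_log_sm80(x);
  return my_log_generic(x);
}
```

The OpenCL wrinkle is that the string literal lives in the `__constant` address space there, so the argument reaches the pass through an addrspacecast; that's the type check that would need relaxing.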
That said,...
TBH, I'm still not quite convinced that compiler changes are the right solution for making it possible for *one* library to rely on something that was never intended to be exposed to compiler users.
Perhaps we should take a step back, figure out the fundamental problem you need to solve (as opposed to figuring out how to make a tactical hack work) and then figure out a more principled solution.
> In DPC++ for CUDA we use libclc as a wrapper around the CUDA SDK's libdevice. Like libdevice, we want to precompile libclc to bitcode for the CUDA backend without specializing for a particular arch, so that we can call different `__nv` functions based on the arch. For this reason we use the `__nvvm_reflect` LLVM intrinsic.
For starters, libdevice by itself is something that's not quite intended for the end user. It was a rather poor stop-gap solution to address the fact that there used to be no linking phase for GPU binaries and no 'standard' math library the code could rely on. The library itself does not have anything particularly interesting in it. Its major advantage is that it exists, while we don't have our own GPU-side libm yet. We do want to get rid of libdevice and replace it with an open-source math library of our own. With the recent improvements in offloading support in the clang driver, we're getting closer to making that possible.
As for the code specialization, why not build for individual GPUs? To me this use case looks like a good match for the "new-driver" offloading that was recently implemented in clang. It allows compiling and linking GPU-side code, which should obviate the need for shipping bitcode and relying on `__nvvm_reflect` for specialization.
The downside is that it's a recent feature, so it would not be available in older clang versions. @jhuber6: I'm also not sure whether OpenCL is supported by the new driver.
With the new driver, you should in theory be able to compile the source with `--offload-arch=A --offload-arch=B` and get a single object file containing GPU-specific bitcode or machine code for each arch. That object can then be transparently linked into the final executable, with clang performing the final linking of the GPU binaries as well.
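Roughly (a sketch, assuming the new-driver flags available in recent clang; the arch names are illustrative):

```
# Embed per-arch GPU code into a single host object file.
clang++ app.cu -c -o app.o --offload-new-driver \
    --offload-arch=sm_70 --offload-arch=sm_80
# Linking through clang also performs the GPU-side link
# (plus the usual CUDA runtime link flags).
clang++ --offload-link app.o -o app
```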
I realize that this may be hard or not feasible for your project right now. I'm OK with allowing limited `__nvvm_reflect` use for the time being, but please do consider making things work without it or libdevice, if possible.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D137154/new/
https://reviews.llvm.org/D137154