[PATCH] D84068: AMDGPU/clang: Search resource directory for device libraries

Mon Aug 10 16:07:13 PDT 2020

arsenm added a comment.

In D84068#2208132 <https://reviews.llvm.org/D84068#2208132>, @tra wrote:

> In D84068#2204713 <https://reviews.llvm.org/D84068#2204713>, @arsenm wrote:
>
>>> If we ship them with clang, who/where/how builds them?
>>> If they come from ROCm packages, how would those packages add stuff into *clang* install directory? Resource dir is a rather awkward location if contents may be expected to change routinely.
>>
>> Symlinks. I've been building the device libraries as part of LLVM_EXTERNAL_PROJECTS, and think this should be the preferred way to build and package the libraries. This is how compiler-rt is packaged on Linux distributions: the compiler-rt binaries are a separate package symlinked into the resource directory locations. I'm not sure exactly what you mean by "change routinely"; the libraries should be an implementation detail invisible to users, not something they should be directly relying on. Only clang actually knows how to use them correctly, and every other user is buggy.
>>
>>> What if I have multiple ROCm versions installed? Which one should provide the bitcode in the resource dir?
>>
>> These should be treated as an integral part of clang, and not something to mix and match. Each rocm version should have its own copy of the device libraries. It only happens to work most of the time if you mismatch these, and this isn't a guaranteed property.
>
> I'm still not sure how that's going to work. We have `M clang versions`:`N ROCm versions` relationship here. 
> If I have one clang version, but want to do two different builds, one with ROCm-X and one with ROCm-Y, how would I do that? It sounds like I'll need to have multiple clang installation variants.
>
> Similarly, if I have multiple clang versions installed, how would ROCm know which of those clang installations must be updated?
>
> What if I install yet another clang version *after* ROCm has been installed? How will the ROCm package know that it needs to update yet another clang installation?
>
> This will get rather unmanageable as soon as you get beyond the "I only have one clang and one ROCm version" scenario.

What I'm aiming for is a 1:1 relationship between clang and its device library copy. Every clang should have its own device library build. It's an implementation detail of clang, and not an independent component you can do anything with (correctly, at least). Clang is also minimally useful without the libraries.

> I think it would make much more sense for clang to treat ROCm's bits as an external dependency, similar to CUDA. By definition clang/llvm does not control anything outside of its own packages. While ROCm is AMD's package, I'm willing to bet that eventually various Linux distros will start shuffling its bits around the same way it happened to CUDA.

Long term, I'd rather aim for merging rocm-device-libs into libclc and making it an llvm project. They're largely forks of the same original sources from about 5 years ago, and it's an unfortunate split of effort. I also specifically do not want distros to be shuffling this around, and want it to behave exactly like compiler-rt. As far as I can tell, cuda clang does not actually work with the Ubuntu-packaged cuda, which arbitrarily moved the nvvm binaries, and I don't really want a repeat of this situation. I do also want the device libs build to support a non-rocm package that installs into a standard distro clang package, which is different from the rocm libraries trying to support every clang in the universe. Ideally, just a regular clang works without any formal rocm installation.

>>> As long as explicitly specified `--hip-device-lib-path` can still point to the right path, it's probably OK, but it all adds some confusion about who controls which parts of the HIP compilation and how it all is supposed to work in cases that deviate from the default assumptions.
>>
>> Long term I would rather get rid of --hip-device-lib-path, and only use the standard -resource_dir flags
>
> Please, please, please keep explicit path option. There are real use cases where you can not expect ROCm to be installed anywhere 'standard'. Or at all. 
> Imagine people embedding libclang into their GUI/tools. There's no resource directory. There may be no ROCm installation, or it may not be possible due to lack of privileges.

This isn't any different than compiler-rt; I would expect these cases to embed the libraries and mount them in a virtual filesystem.

> In short, I think that tightly coupling clang's expectations to a non-clang project is not a good idea.
> Summoning @echristo for a second opinion.

With the dance of searching for a rocm or cuda installation, it's still a coupling. It's just weirder looking and assumes the existing poor, non-standard packaging practices. What I want is something that's indistinguishable from compiler-rt from a packaging perspective.
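
To illustrate the lookup order being discussed (an explicit `--hip-device-lib-path` wins, otherwise fall back to the clang resource directory), here is a minimal sketch. It is not the code in this patch: the helper name `findDeviceLibDir` and the `lib/amdgcn/bitcode` subdirectory are assumptions made for illustration only.

```cpp
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper, not clang's actual driver code.
// Returns the directory expected to hold the device library bitcode,
// or std::nullopt if no usable directory is found.
std::optional<fs::path>
findDeviceLibDir(const std::optional<fs::path> &explicitPath,
                 const fs::path &resourceDir) {
  std::error_code ec;

  // 1. An explicitly requested path always takes precedence. If it is
  //    not a directory, report failure instead of silently falling back.
  if (explicitPath) {
    if (fs::is_directory(*explicitPath, ec))
      return *explicitPath;
    return std::nullopt;
  }

  // 2. Otherwise look inside the resource directory, the same place a
  //    compiler-rt style package would install (or symlink) its files.
  //    The subdirectory layout below is an assumed example.
  fs::path candidate = resourceDir / "lib" / "amdgcn" / "bitcode";
  if (fs::is_directory(candidate, ec))
    return candidate;

  return std::nullopt;
}
```

With this shape, a distro package only has to place (or symlink) the bitcode under the resource directory of the clang it was built for, and nothing has to search for a separate ROCm installation.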

https://reviews.llvm.org/D84068