[libc-commits] [PATCH] D152575: Added modf for NVPTX and AMDGPU targets to implement 'libmgpu.a' for math on the GPU

Joseph Huber via Phabricator via libc-commits libc-commits at lists.llvm.org
Wed Jun 14 18:36:55 PDT 2023


jhuber6 added a comment.

In D152575#4423253 <https://reviews.llvm.org/D152575#4423253>, @sivachandra wrote:

> In D152575#4421826 <https://reviews.llvm.org/D152575#4421826>, @jhuber6 wrote:
>
>> That's a good question, we'd probably like to have both if we'd like to perform some performance tests, but I'm not sure if that's a good reason to keep it in-tree if we have a suitable alternative. @sivachandra What do you think?
>
> Per my understanding from the earlier discussion, we will add a vendor-wrapper only if the in-tree implementation is not sufficiently proven to be a good enough replacement for the vendor implementation. If that is correct, I would like to stick to that. You shouldn't need a wrapper in the libc for differential testing.

It's not strictly necessary, but it will make it a lot easier if we can simply enable or disable a flag to check the performance delta either in the implementation itself or an application. My proposal is that we implement all the vendor targets at once to get a mostly complete facade of a `libm.a` that we can test. This will mostly just be copying the existing headers at https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_cuda_math.h and https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_hip_math.h. The expectation is that the order of implementation by default will be native GPU implementation > generic implementation > vendor implementation with a special with an extra configuration variable. That is, since we have a native implementation of `sinf` we will only user the vendor version of `sinf` if the user specified it in some list like `LIBC_GPU_VENDOR_MATH=sinf`. The only burden with this approach is carrying around a few extra entrypoints that are only enabled if specified by the user, but in exchange we get something that's functional immediately and can be incrementally improved. However, I do think that anything that can be replaced with a built-in shouldn't be considered a 'vendor' implementation and can simply be placed in the regular source.

> That said, will it ever be that a builtin for a floating point primitive will not be available? If yes, then the ideal approach we should take is to "fix the compiler". If that is not practical, we can take up adding vendor wrappers at that time with reduced scope (as in, that option is taken only if the builtin is not available.)

Yes, fixing the compiler will be the preferred approach here. Since we *only* build the GPU libraries with an up-to-date clang we can always patch the compiler in parallel with the GPU implementation.

Thanks for being understanding with this whole project. The GPU implementation brings in a lot of non-standard requirements but I think it's a a compelling project so I'm glad you've stuck with it thus far.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152575/new/

https://reviews.llvm.org/D152575



More information about the libc-commits mailing list