[libc-commits] [PATCH] D152575: Added modf for NVPTX and AMDGPU targets to implement 'libmgpu.a' for math on the GPU

Siva Chandra via Phabricator via libc-commits libc-commits at lists.llvm.org
Wed Jun 14 23:58:56 PDT 2023


sivachandra added a comment.

In D152575#4423390 <https://reviews.llvm.org/D152575#4423390>, @jhuber6 wrote:

> It's not strictly necessary, but it will make it a lot easier if we can simply enable or disable a flag to check the performance delta either in the implementation itself or an application.

I would expect that a libc implementation for a GPU math function, even if it is just a builtin-wrapper, is being added because it is expected/proven to be either better or equivalent to the vendor implementation. It is the burden of the GPU libc developer to verify that - making it convenient is definitely something that can be accommodated in the libc project. We do have a number of such things for comparison against the system libc. But, all such conveniences are outside of the libc (as in, there are not alternate wrapper entrypoints) - in fact, for many functions, the wrapper indirection is close to 50% overhead that it is not meaningful to compare against wrappers.

About builtin-wrappers, I will be surprised if vendor libraries are doing anything different to get better performance etc. If that is really the case, the builtins should be fixed.

> My proposal is that we implement all the vendor targets at once to get a mostly complete facade of a `libm.a` that we can test. This will mostly just be copying the existing headers at https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_cuda_math.h and https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_hip_math.h. The expectation is that the order of implementation by default will be native GPU implementation > generic implementation > vendor implementation with a special with an extra configuration variable. That is, since we have a native implementation of `sinf` we will only user the vendor version of `sinf` if the user specified it in some list like `LIBC_GPU_VENDOR_MATH=sinf`. The only burden with this approach is carrying around a few extra entrypoints that are only enabled if specified by the user, but in exchange we get something that's functional immediately and can be incrementally improved. However, I do think that anything that can be replaced with a built-in shouldn't be considered a 'vendor' implementation and can simply be placed in the regular source.

I thought we agreed on this plan already. So, as a first step, add all builtin-wrappers wherever possible. Next, add vendor-wrappers and their libc-implementation alternates when available. Then the long tail of actually comparing/verifying/improving libc implementations to make them the default. I would really like if the libc implementations were the default to begin with, but as a practical decision in these initial stages, we will make the vendor wrappers the default. But, the vendor wrappers should be added only if there are no builtins which implement the corresponding floating point operations.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152575/new/

https://reviews.llvm.org/D152575



More information about the libc-commits mailing list