[libc-commits] [PATCH] D152575: Added modf for NVPTX and AMDGPU targets to implement 'libmgpu.a' for math on the GPU

Thu Jun 15 09:34:39 PDT 2023

jhuber6 added a comment.

In D152575#4423642 <https://reviews.llvm.org/D152575#4423642>, @sivachandra wrote:

> I would expect that a libc implementation for a GPU math function, even if it is just a builtin-wrapper, is being added because it is expected/proven to be either better or equivalent to the vendor implementation. It is the burden of the GPU libc developer to verify that - making it convenient is definitely something that can be accommodated in the libc project. We do have a number of such things for comparison against the system libc. But, all such conveniences are outside of the libc (as in, there are no alternate wrapper entrypoints) - in fact, for many functions, the wrapper indirection is close to 50% overhead that it is not meaningful to compare against wrappers.

That's reasonable, we can bias towards in-tree implementations as long as the performance is somewhat on-par.

> About builtin-wrappers, I will be surprised if vendor libraries are doing anything different to get better performance etc. If that is really the case, the builtins should be fixed.

So, the vendor libraries are presented as LLVM-IR. This means that they will ultimately use the same intrinsics and be compiled by the same exact compiler. So there should be no difference. We should probably scan for any vendor implementation that can be replaced with one of the `nvvm` or `__builtin` functions. However we should keep in mind that the floating point behavior of these functions will probably be somewhat divergent from `libc`'s correct rounding assertion, we'll need to specify as such when we document this.

> I thought we agreed on this plan already. So, as a first step, add all builtin-wrappers wherever possible. Next, add vendor-wrappers and their libc-implementation alternates when available. Then the long tail of actually comparing/verifying/improving libc implementations to make them the default. I would really like if the libc implementations were the default to begin with, but as a practical decision in these initial stages, we will make the vendor wrappers the default. But, the vendor wrappers should be added only if there are no builtins which implement the corresponding floating point operations.

Yes, that's very reasonable. There's no reason to even compare the performance if it just goes to an intrinsic as I've mentioned above. So we should add all vendor functions, except ones that can be trivially recreated without calling into the accompanying library.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152575/new/

https://reviews.llvm.org/D152575