[libc-commits] [PATCH] D152603: [libc] Add math functions to AMD/NVPTX libm

Wed Jun 21 11:26:03 PDT 2023

arsenm added inline comments.

================
Comment at: libc/src/math/gpu/cos.cpp:14

-extern "C" {
-double __nv_sin(double);
-}
+LLVM_LIBC_FUNCTION(double, cos, (double x)) { return __builtin_cos(x); }

----------------
jhuber6 wrote:
> sivachandra wrote:
> > I am not sure the GPUs would have instructions to do trignometric functions.
> They do not, https://godbolt.org/z/zTWY168a3.
They do for f32. llvm.sin.f32 and llvm.cos.f32 expand fast-mathy for AMDGPU and I've been debating dropping support for them because it's just wrong

================
Comment at: libc/src/math/gpu/exp2f.cpp:14

-extern "C" {
-double __nv_sin(double);
-}
+LLVM_LIBC_FUNCTION(float, exp2f, (float x)) { return __builtin_exp2f(x); }

----------------
jhuber6 wrote:
> sivachandra wrote:
> > AMD gpus seem to be having `exp2f`. Can we assume other GPUs also have them? Even if they do have, what guarantees and promises they come with? The reason I am asking is to understand how we should set up tests for them.
> Seems NVPTX does not support it https://godbolt.org/z/a1jGzeo48. Maybe @arsenm can answer the later questions about the properties.
> 
> I'm still not sure how to stand up tests for this math on the GPU. We may need to map it against some host implementation and compare within some ULP. We could potentially have the GPU allocate a table like `arr = (x, f(x))` for that math function and pass it in, letting  the GPU scan it.
IMO the math intrinsics should all be correctly lowered. We've traditionally expanded a lot of the f32 intrinsics in a fast math type of way. I'm fixing the ones it's reasonable to inline expand (e.g. see D153025, D153024, D153023, D153022). sqrt is also currently wrong, and I have partial patches I'm working on to fix. sqrt f32 and f64 are totally doable.

llvm.pow.f32, llvm.cos.f32 and llvm.sin.f32's correct expansions are so big I don't think it's practical to implement backend expansions for., and we should probably stop supporting the fast expansions we do for them now. The f64 versions for nearly every function (except sqrt) I think is also impractically large to handle without emitting a libcall.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152603/new/

https://reviews.llvm.org/D152603