[llvm] [NVPTX] Support llvm.{exp2, log2} for f32/f16/bf16 and vectors (PR #120519)

Thu Dec 19 16:22:09 PST 2024

jhuber6 wrote:

> Hmm. Looking at the front-end, I see that we forward `exp2f` to `__nv_exp2f`:
> 
> https://github.com/llvm/llvm-project/blob/6f983f88537415952ec528c42f89f1d5b620fe68/clang/lib/Headers/__clang_cuda_math.h#L112
> 
> The interesting part is that `__nv_exp2f` in libdevice implements things via `@llvm.nvvm.ex2.approx...`
> 
> ```llvm
> ; Function Attrs: alwaysinline nounwind
> define float @__nv_exp2f(float %x) #0 {
>   %1 = call i32 @__nvvm_reflect(ptr @.str) #6
>   %2 = icmp ne i32 %1, 0
>   br i1 %2, label %3, label %5
> 
> 3:                                                ; preds = %0
>   %4 = call float @llvm.nvvm.ex2.approx.ftz.f(float %x) #6
>   br label %__exp2f.exit
> 
> 5:                                                ; preds = %0
>   %6 = call float @llvm.nvvm.ex2.approx.f(float %x) #6
>   br label %__exp2f.exit
> 
> __exp2f.exit:                                     ; preds = %3, %5
>   %.0 = phi float [ %4, %3 ], [ %6, %5 ]
>   ret float %.0
> }
> ```
> 
> Considering that CUDA has been living with that implementation all this time, perhaps _that_ is the way we should handle things here, too. In other words, using ex2.approx may be OK unconditionally after all.

Some day I'd really like to remove this little shim layer, just let `exp2` be `exp2`, and provide these functions via the LLVM libm. Right now the main limitation is linking libcalls under LTO. A call to `llvm.exp2` does not reference the `exp2` symbol, so `exp2` won't be extracted from a static library at LTO time. The intrinsic then makes it to the backend, gets lowered to an `exp2` libcall, and is left unresolved; either that, or we error on an unimplemented intrinsic.

I think we need some logic stating that `llvm.exp2` extracts `exp2` from static libraries. Then, for certain backends (AMDGPU, NVPTX), before any other optimization runs, we lower `llvm.exp2` in IR to a call to `exp2` and put `nobuiltin` on the call site, so that later optimizations can't turn it back into the LLVM intrinsic.
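To make the proposed early lowering concrete, here is a hand-written, illustrative before/after in LLVM IR for the double-precision case. It is only a sketch: no such pass exists in this PR, and the function names `@before` and `@after` are hypothetical. The one real mechanism it relies on is that `nobuiltin` is a valid call-site attribute telling LLVM not to treat the callee as a recognized library builtin.

```llvm
; Before: the frontend emits the intrinsic. There is no reference to the
; `exp2` symbol here, so nothing pulls `exp2` out of a static library.
declare double @llvm.exp2.f64(double)

define double @before(double %x) {
  %r = call double @llvm.exp2.f64(double %x)
  ret double %r
}

; After the (hypothetical) early lowering: a plain call to the libm symbol,
; marked `nobuiltin` at the call site so later libcall simplification
; does not fold it back into @llvm.exp2.f64.
declare double @exp2(double)

define double @after(double %x) {
  %r = call double @exp2(double %x) #0
  ret double %r
}

attributes #0 = { nobuiltin }
```

The `@after` form creates a real reference to `exp2`, which is what would let the linker resolve it from the static libm during LTO.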

https://github.com/llvm/llvm-project/pull/120519