[llvm] [NVPTX] Support llvm.{exp2, log2} for f32/f16/bf16 and vectors (PR #120519)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 19 16:24:16 PST 2024
Artem-B wrote:
NVIDIA's docs only [say](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html?highlight=precision#floating-point-instructions-ex2) "The maximum absolute error is 2^-22.5 for fraction in the primary range."
However, a post on NVIDIA's forum claims that the error of the current implementation is 2.38344 ulp:
https://forums.developer.nvidia.com/t/more-accurate-version-of-exp2f-with-no-change-in-performance/243209
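The two figures above use different units (absolute error vs. ulp), so a rough conversion may help when comparing them. The sketch below is only an illustration, not anything from the PR: it assumes results that land in [1, 2), which is my reading of "primary range" and is not confirmed by the docs, and the helper names `f32_ulp` and `abs_to_ulp` are hypothetical. It ignores subnormals.

```python
import math

def f32_ulp(x: float) -> float:
    """Spacing between adjacent float32 values near x (normal range only)."""
    _, e = math.frexp(abs(x))       # abs(x) = m * 2**e with 0.5 <= m < 1
    return math.ldexp(1.0, e - 24)  # 2**(e - 1) * 2**-23

def abs_to_ulp(abs_err: float, result: float) -> float:
    """Express an absolute error as a multiple of the float32 ulp at `result`."""
    return abs_err / f32_ulp(result)

# Documented absolute bound, evaluated for results assumed to lie in [1, 2).
doc_bound = 2.0 ** -22.5
for r in (1.0, 1.5, 1.999):
    print(r, abs_to_ulp(doc_bound, r))
```

Under that assumption the documented bound works out to about 1.41 ulp (sqrt(2)), which is not directly comparable to the forum figure unless both cover the same input range.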
Both nvcc and clang currently translate `exp2f` into `ex2.approx`. I think that should be fine. To be on the safe side, we can keep the gating option but enable it by default.
https://github.com/llvm/llvm-project/pull/120519