[llvm] [NVPTX] Select bfloat16 add/mul/sub as fma on SM80 (PR #121065)
via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 9 15:54:34 PST 2025
peterbell10 wrote:
> FADD/FMUL for bf16 requires PTX 7.8 and sm_90. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-mul
Exactly: they require sm_90, so on sm_80 there is hardware support for fma but not for add/mul/sub. What's confusing here?
https://github.com/llvm/llvm-project/pull/121065