[llvm] [NVPTX] Select bfloat16 add/mul/sub as fma on SM80 (PR #121065)
via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 9 15:54:34 PST 2025
peterbell10 wrote:
> FADD/FMUL for bf16 requires PTX 7.8 and sm_90. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-mul
Exactly: they require sm_90, so on sm_80 there is hardware support for fma but not for add/mul/sub. What's confusing here?
https://github.com/llvm/llvm-project/pull/121065