[llvm] [NVPTX] Select bfloat16 add/mul/sub as fma on SM80 (PR #121065)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 9 15:48:35 PST 2025
Artem-B wrote:
> The spec says
>
> > `add{.rnd}.bf16` and `add{.rnd}.bf16x2` requires `sm_90` or higher.
>
> I don't see any suggestion that that only applies for specific PTX versions.
`sm_90` is only supported by PTX 7.8 and newer:
![image](https://github.com/user-attachments/assets/c859abe8-af63-4d8e-97b9-eb7ec5ba3fe5)
The FMA instruction for bf16 types requires PTX 7.0 and sm_80: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-fma
FADD/FMUL for bf16 require PTX 7.8 and sm_90: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-mul
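For context, the rewrite this enables is just the usual FMA identities; a minimal sketch of what I mean (my own illustration, not the patch's exact output — the `1.0`/`-0.0` constants in particular are assumptions):

```llvm
; On sm_80 with PTX >= 7.0, fma.rn.bf16 exists even though add/sub/mul.rn.bf16
; do not, so the scalar bf16 ops can be expressed via FMA:
;   a + b  ->  fma(a,  1.0, b)
;   a - b  ->  fma(b, -1.0, a)
;   a * b  ->  fma(a,  b, -0.0)   ; -0.0 (rather than 0.0) to keep a zero's sign

define bfloat @fsub_bf16(bfloat %a, bfloat %b) {
  ; could be selected as fma.rn.bf16 with a -1.0 constant instead of
  ; going through an f32 expansion
  %r = fsub bfloat %a, %b
  ret bfloat %r
}
```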
So, the only cases where we can benefit from this patch are:
* PTX >= 7.0 and GPU >= sm_80 (otherwise there's no BF16 FMA support),
* PTX < 7.8 (otherwise FMUL is available and we don't need the patch).
sm_90 or newer GPUs require PTX 7.8 and therefore do not benefit from the patch.
What's left is sm_80 and PTX versions 7.0 through 7.7.
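To make that window concrete, here's a lit-style sketch (the RUN lines and CHECK patterns are my guesses at reasonable targets and output, not taken from the patch's tests):

```llvm
; RUN: llc < %s -march=nvptx64 -mcpu=sm_80 -mattr=+ptx71 | FileCheck %s --check-prefix=SM80
; RUN: llc < %s -march=nvptx64 -mcpu=sm_90 -mattr=+ptx78 | FileCheck %s --check-prefix=SM90

define bfloat @fadd_bf16(bfloat %a, bfloat %b) {
  ; sm_80 / PTX 7.1: no add.rn.bf16, so the fma-based selection kicks in.
  ; SM80: fma.rn.bf16
  ; sm_90 / PTX 7.8: the native add is available, so the patch changes nothing.
  ; SM90: add.rn.bf16
  %r = fadd bfloat %a, %b
  ret bfloat %r
}
```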
What am I missing?
https://github.com/llvm/llvm-project/pull/121065