[PATCH] D118977: [NVPTX] Add more FMA intrinsics/builtins
Jakub Chlanda via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Sun Feb 6 22:25:25 PST 2022
jchlanda added a comment.
In D118977#3297465 <https://reviews.llvm.org/D118977#3297465>, @tra wrote:
>> They all require PTX 7.0, SM_80.
>
> According to https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-fma only `fma.relu` and `bf16*` variants require ptx70/sm80:
>
> PTX ISA Notes
> Introduced in PTX ISA version 4.2.
>
> fma.relu.{f16, f16x2} and fma{.relu}.{bf16, bf16x2} introduced in PTX ISA version 7.0.
>
> Target ISA Notes
> Requires sm_53 or higher.
>
> fma.relu.{f16, f16x2} and fma{.relu}.{bf16, bf16x2} require sm_80 or higher.
My bad, sorry. Fixed now.
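To illustrate the availability split quoted above (not part of this patch): a minimal CUDA sketch using the `cuda_fp16.h` device intrinsics, assuming a CUDA 11+ toolkit for `__hfma2_relu`. The plain `f16x2` FMA lowers to `fma.rn.f16x2` (sm_53 / PTX ISA 4.2), while the fused FMA+ReLU lowers to `fma.rn.relu.f16x2` and must be guarded for sm_80 / PTX ISA 7.0:

```cuda
#include <cuda_fp16.h>

// Plain packed-half FMA: fma.rn.f16x2, available since sm_53 / PTX ISA 4.2.
__device__ __half2 fma_h2(__half2 a, __half2 b, __half2 c) {
  return __hfma2(a, b, c);
}

// Fused FMA+ReLU: fma.rn.relu.f16x2 requires sm_80 / PTX ISA 7.0.
__device__ __half2 fma_relu_h2(__half2 a, __half2 b, __half2 c) {
#if __CUDA_ARCH__ >= 800
  return __hfma2_relu(a, b, c);     // single fused instruction on sm_80+
#else
  // Emulate on older targets: FMA, then clamp each half to zero.
  __half2 r = __hfma2(a, b, c);
  __half lo = __low2half(r), hi = __high2half(r);
  __half z  = __float2half(0.0f);
  return __halves2half2(__hgt(lo, z) ? lo : z,
                        __hgt(hi, z) ? hi : z);
#endif
}
```

The `__CUDA_ARCH__ >= 800` guard mirrors the constraint from the PTX ISA notes: only the `.relu` (and `bf16*`) forms are gated on sm_80, so the base FMA path needs no such guard on any sm_53+ target.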
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D118977/new/
https://reviews.llvm.org/D118977