[PATCH] D118977: [NVPTX] Add more FMA intrinsics/builtins

Artem Belevich via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 4 10:25:36 PST 2022


tra added a comment.

> They all require PTX 7.0, SM_80.

According to https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-fma, only the `fma.relu` and `bf16*` variants require PTX 7.0 and sm_80. The plain `fma.{f16,f16x2}` forms need only PTX 4.2 and sm_53:

  PTX ISA Notes
  Introduced in PTX ISA version 4.2.
  
  fma.relu.{f16, f16x2} and fma{.relu}.{bf16, bf16x2} introduced in PTX ISA version 7.0.
  
  Target ISA Notes
  Requires sm_53 or higher.
  
  fma.relu.{f16, f16x2} and fma{.relu}.{bf16, bf16x2} require sm_80 or higher.
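
A minimal C++ sketch of the rule in the quoted notes, written as a lookup over variants. `FmaType`, `FmaRequirement`, `fmaRequirement`, and `fmaSupported` are hypothetical names for illustration only, not code from the NVPTX backend. Versions are encoded the usual way (PTX 4.2 as 42, sm_53 as 53).

```cpp
// Hypothetical model of the PTX ISA / target ISA notes for half-precision fma.
enum class FmaType { F16, F16x2, BF16, BF16x2 };

struct FmaRequirement {
  unsigned MinPTX; // minimum PTX ISA version, e.g. 42 for PTX ISA 4.2
  unsigned MinSM;  // minimum SM version, e.g. 53 for sm_53
};

// Any .relu form, and every bf16/bf16x2 form, was introduced in PTX 7.0 and
// requires sm_80. Plain fma.{f16,f16x2} dates back to PTX 4.2 on sm_53.
constexpr FmaRequirement fmaRequirement(FmaType Ty, bool Relu) {
  const bool IsBF16 = Ty == FmaType::BF16 || Ty == FmaType::BF16x2;
  return (Relu || IsBF16) ? FmaRequirement{70, 80} : FmaRequirement{42, 53};
}

// True if the given PTX version and SM version can use this fma variant.
constexpr bool fmaSupported(FmaType Ty, bool Relu, unsigned PTXVersion,
                            unsigned SMVersion) {
  const FmaRequirement Req = fmaRequirement(Ty, Relu);
  return PTXVersion >= Req.MinPTX && SMVersion >= Req.MinSM;
}
```

For example, `fmaSupported(FmaType::F16x2, /*Relu=*/false, 60, 75)` holds. Any `bf16` or `.relu` form on sm_75 does not, so only those variants would need PTX 7.0/sm_80 guards.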


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D118977


