[all-commits] [llvm/llvm-project] 6eab9d: [NVPTX] remove incorrect NVPTX intrinsic transform...

Mon Jan 8 15:17:13 PST 2024

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 6eab9dd7f01e6cad9f1a93bd52e4c6e7b4c3c1fa
      https://github.com/llvm/llvm-project/commit/6eab9dd7f01e6cad9f1a93bd52e4c6e7b4c3c1fa
  Author: Alex MacLean <amaclean at nvidia.com>
  Date:   2024-01-08 (Mon, 08 Jan 2024)

  Changed paths:
    M llvm/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
    M llvm/test/Transforms/InstCombine/NVPTX/nvvm-intrins.ll

  Log Message:
  -----------
  [NVPTX] remove incorrect NVPTX intrinsic transformations (#76870)

`nvvm_fabs_f`
`nvvm_fabs_ftz_f`

Unfortunately, llvm fabs is not equivalent to these intrinsics since
llvm fabs is defined to only set the sign bit to zero while these can
also flush subnormal inputs and modify NaNs.

`nvvm_round_d`
`nvvm_round_f`
`nvvm_round_ftz_f`

llvm.nvvm.round uses RNI, while llvm.round codegens to RZI. LLVM defines
llvm.round to use the same rounding as libm
`round[f]()`, which is not necessary the same as how we define
llvm.nvvm.round.

`nvvm_sqrt_rn_f`
`nvvm_sqrt_rn_ftz_f`

sqrt may be lowered to a less precise version of sqrt, such as
sqrt.approx in NVPTX depending on factors such as the value of
-nvptx-prec-sqrtf32. These intrinsics should always become the
corresponding NVPTX instructions.

`nvvm_add_rn_d`
`nvvm_add_rn_f`
`nvvm_add_rn_ftz_f`
`nvvm_mul_rn_d`
`nvvm_mul_rn_f`
`nvvm_mul_rn_ftz_f`

These nvvm intrinsics have an explicitly specified rounding mode (.rn).
They should always be lowered to a PTX instruction with the same
explicit rounding mode. Converting to fmul and fadd instructions result
in the PTX instructions without rounding modes specified. This can cause
issue because:

> An add [or mul] instruction with no rounding modifier defaults to
round-to-nearest-even and may be optimized aggressively by the code
optimizer. In particular, mul/add sequences with no rounding modifiers
may be optimized to use fused-multiply-add instructions on the target
device.

`nvvm_div_rn_f`
`nvvm_div_rn_ftz_f`
`nvvm_rcp_rn_f`
`nvvm_rcp_rn_ftz_f`

fdiv may be lowered to a less precise version of div, such as div.full
in NVPTX depending on factors such as the value of -nvptx-prec-divf32.
These intrinsics should always become the corresponding NVPTX
instructions.