[llvm] [InstSimplify] Optimize maximumnum and minimumnum (PR #139581)

Fri Aug 15 14:17:50 PDT 2025

Artem-B wrote:

> NVPTXTargetTransformInfo.cpp currently transforms nvvm_fmax.* into maxnum, but should really use the new maximumnum intrinsic instead to match the SNaN semantics of PTX max.

Can you elaborate, please? AFAICT, https://docs.nvidia.com/cuda/parallel-thread-execution/#floating-point-instructions-max makes no promises about the instruction's behavior regarding sNaN/qNaN handling.

That said, unlike `maxnum`, the instruction does appear to preserve signed-zero ordering, so in that sense it does match `maximumnum` better.
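
To make the signed-zero point concrete, here is a minimal scalar sketch of the `maximumnum` behavior as I read it from the LangRef (a lone NaN is dropped, two NaNs give a qNaN, and -0.0 orders below +0.0). `maximumnum_model` is a hypothetical helper written for illustration only; it is neither LLVM code nor a statement of what the PTX instruction does.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical reference model of llvm.maximumnum for float.
// Assumption: this follows the LangRef wording; it is not taken from LLVM.
float maximumnum_model(float a, float b) {
  // Both operands NaN: the result is a quiet NaN.
  if (std::isnan(a) && std::isnan(b))
    return std::numeric_limits<float>::quiet_NaN();
  // Exactly one operand NaN: return the other (numeric) operand.
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  // Equal values include the +0.0 / -0.0 pair; prefer the operand
  // whose sign bit is clear so that -0.0 orders below +0.0.
  if (a == b) return std::signbit(a) ? b : a;
  return a > b ? a : b;
}
```

With this model, `maximumnum_model(-0.0f, 0.0f)` and `maximumnum_model(0.0f, -0.0f)` both yield +0.0 regardless of operand order, which is the ordering property referred to above.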

In any case, the current handling of nvvm_fmax/fmin has been this way for a pretty long time now (https://github.com/llvm/llvm-project/commit/698c31b8db6f25a8b2dbc34e93dd04770e9016c4), and I'd be somewhat reluctant to change that.

We may want to introduce a new intrinsic to provide IEEE-compliant behavior. Even if it is slower initially, we can live with that for now.

https://github.com/llvm/llvm-project/pull/139581