[PATCH] D137655: Expand fminimum/fmaximum into fminnum/fmaxnum + NaN check

Artem Belevich via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 10 10:33:41 PST 2022


tra added a comment.

> We do not have an instruction for this in PTX prior to SM 8.0,

I assume that we're talking about `min.nan.*/man.nan.*` instruction variants that appeared in PTX7.0 on sm80+.
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#floating-point-instructions-max
docs.nvidia.com/cuda/parallel-thread-execution/index.html#half-precision-floating-point-instructions-max

Looks like we do not properly constrain instruction availability in llvm/lib/Target/NVPTX/NVPTXInstrInfo.td 
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/NVPTX/NVPTXInstrInfo.td#L940
There are no predicates on sm80+ or ptx70 and we sort of rely on custom lowering, and not always correctly as things stand:
https://godbolt.org/z/G8vYb5ajT

This patch should help generating correct instructions for fp64.

Still, I think we need to fix instruction definitions to correctly reflect their availability.

On a side note, we may want to add some min/max correctness tests to CUDA tests in llvm test-suite. Considering that we have different lowering on different GPUs, we do want to make sure that we actually do consistently get the results we expect across different GPUs and CUDA versions. We currently do not have any sm80 GPUs on cuda buildbots, but we'll get them eventually.



================
Comment at: llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp:615
   // max.f16, max.f16x2 and max.NaN are supported on sm_80+.
   auto GetMinMaxAction = [&](LegalizeAction NotSm80Action) {
     bool IsAtLeastSm80 = STI.getSmVersion() >= 80 && STI.getPTXVersion() >= 70;
----------------
I think this should be refactored into a more generic `GetGPUAction(sm, ptx, ifAvailableAction, FallbackAction)`.

This would make it clear which actions we take and why. `GetMinMax` action just says 'magic'.




================
Comment at: llvm/test/CodeGen/NVPTX/fminimum-fmaximum.ll:65
+  ; CHECK-DAG: setp.nan.f64
+  ; CHECK-DAG: selp.f64
+  %x = call double @llvm.minimum.f64(double %a, double %b)
----------------
What exactly do we end up generating here?

If both `setp` and `min`the  are inputs for `selp` then `-DAG` should be removed from `selp`


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137655/new/

https://reviews.llvm.org/D137655



More information about the llvm-commits mailing list