[PATCH] D86078: [AArch64] Improved lowering for saturating float to int.

Tue Sep 8 07:27:08 PDT 2020

paulwalker-arm added a comment.

I'm afraid some of my comments are probably conflicting.  I would have experimented a little to understand better how things work, but I'm guessing the patch is based on other work because I couldn't some of the affected functions/nodes.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:8069-8074
+    if (IsSigned || !NativeSaturating) {
+      // Clamp Src by MinFloat from below. If Src is NaN the result is MinFloat.
+      // If unsigned and natively saturating, we don't need to clamp, because
+      // the zero bound is independent of the saturation width.
+      Clamped = DAG.getNode(ISD::FMAXNUM, dl, SrcVT, Clamped, MinFloatNode);
+    }
----------------
Perhaps I've misread something but with this change it looks like a NaN input to FP_TO_UINT_SAT can result in `(unsigned int)MaxFloat` rather than the expected zero? because the following ISD::FMINNUM will choose the non-NaN input rather than propagating MinFloat like it did before (because the FMAXNUM has been omitted).

This goes against my previous suggestion but perhaps it will be cleaner to perform all the lowering within LowerFP_TO_INT_SAT as the process will be a uniform:
```
// This handles native saturation and NaN -> 0 transformations.
A = getNode(FP_TO_UINT // Letting LowerFP_TO_INT handle input promotion, rather than duplicate that handling.
A = getNode(ISD::MIN, A, SAT_HI)
if (isSigned)
  A = getNode(ISD::MAX, A, SAT_LO)
return A
```

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:3012-3015
+  // The native fp to int conversions are already saturating, lower to them.
+  unsigned Opcode = Op.getOpcode() == ISD::FP_TO_SINT_SAT
+    ? ISD::FP_TO_SINT : ISD::FP_TO_UINT;
+  return DAG.getNode(Opcode, dl, DstVT, Src);
----------------
The `DstVT.getScalarSizeInBits().getFixedSize() != SatWidth` case feels like something we should catch and transform as early as possible to enable other optimisation opportunities. Perhaps worth a DAGCombine using the same target TLI hook mentioned before.

================
Comment at: llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll:80-90
 define i19 @test_signed_i19_f32(float %f) nounwind {
 ; CHECK-LABEL: test_signed_i19_f32:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov w8, #-931135488
 ; CHECK-NEXT:    mov w9, #65472
 ; CHECK-NEXT:    movk w9, #18559, lsl #16
 ; CHECK-NEXT:    fmov s1, w8
----------------
This test backs up the performance side of what I'm saying in expandFP_TO_INT_SAT.  Using the knowledge that the saturation value is exactly representable as a float is not the only thing you care about for AArch64.  You also care about "what's the most efficient way to perform the clamping operations".

Assuming (mainly because I haven't yet found documented proof) that a native FP_TO_INT operation on AArch64 implements the NaN->0 requirement, then I'm wondering if there's ever a good reason to perform the clamping at the input stage. Perhaps it's really a question of scalar vs vector with regards to the availability of MIN/MAX instructions.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86078/new/

https://reviews.llvm.org/D86078