[PATCH] D86078: [AArch64] Improved lowering for saturating float to int.

Wed Sep 9 01:21:16 PDT 2020

ebevhan added a comment.

In D86078#2260930 <https://reviews.llvm.org/D86078#2260930>, @paulwalker-arm wrote:

> I would have experimented a little to understand better how things work, but I'm guessing the patch is based on other work because I couldn't some of the affected functions/nodes.

I forgot to add the parent revision of this to Phab, sorry. It's added now.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:8069-8074
+    if (IsSigned || !NativeSaturating) {
+      // Clamp Src by MinFloat from below. If Src is NaN the result is MinFloat.
+      // If unsigned and natively saturating, we don't need to clamp, because
+      // the zero bound is independent of the saturation width.
+      Clamped = DAG.getNode(ISD::FMAXNUM, dl, SrcVT, Clamped, MinFloatNode);
+    }
----------------
paulwalker-arm wrote:
> Perhaps I've misread something but with this change it looks like a NaN input to FP_TO_UINT_SAT can result in `(unsigned int)MaxFloat` rather than the expected zero? because the following ISD::FMINNUM will choose the non-NaN input rather than propagating MinFloat like it did before (because the FMAXNUM has been omitted).
> 
> This goes against my previous suggestion but perhaps it will be cleaner to perform all the lowering within LowerFP_TO_INT_SAT as the process will be a uniform:
> ```
> // This handles native saturation and NaN -> 0 transformations.
> A = getNode(FP_TO_UINT // Letting LowerFP_TO_INT handle input promotion, rather than duplicate that handling.
> A = getNode(ISD::MIN, A, SAT_HI)
> if (isSigned)
>   A = getNode(ISD::MAX, A, SAT_LO)
> return A
> ```
> 
I suspect you may be right. Your suggested pattern might be cleaner.

Should I just drop the NativeSaturation parameter altogether and only handle the expansion explicitly in Lowering?

Should I add more types to Custom lowering of these nodes on AArch64?

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:3012-3015
+  // The native fp to int conversions are already saturating, lower to them.
+  unsigned Opcode = Op.getOpcode() == ISD::FP_TO_SINT_SAT
+    ? ISD::FP_TO_SINT : ISD::FP_TO_UINT;
+  return DAG.getNode(Opcode, dl, DstVT, Src);
----------------
paulwalker-arm wrote:
> The `DstVT.getScalarSizeInBits().getFixedSize() != SatWidth` case feels like something we should catch and transform as early as possible to enable other optimisation opportunities. Perhaps worth a DAGCombine using the same target TLI hook mentioned before.
What sort of combines do you have in mind here?

================
Comment at: llvm/test/CodeGen/AArch64/fptosi-sat-scalar.ll:80-90
 define i19 @test_signed_i19_f32(float %f) nounwind {
 ; CHECK-LABEL: test_signed_i19_f32:
 ; CHECK:       // %bb.0:
 ; CHECK-NEXT:    mov w8, #-931135488
 ; CHECK-NEXT:    mov w9, #65472
 ; CHECK-NEXT:    movk w9, #18559, lsl #16
 ; CHECK-NEXT:    fmov s1, w8
----------------
paulwalker-arm wrote:
> This test backs up the performance side of what I'm saying in expandFP_TO_INT_SAT.  Using the knowledge that the saturation value is exactly representable as a float is not the only thing you care about for AArch64.  You also care about "what's the most efficient way to perform the clamping operations".
> 
> Assuming (mainly because I haven't yet found documented proof) that a native FP_TO_INT operation on AArch64 implements the NaN->0 requirement, then I'm wondering if there's ever a good reason to perform the clamping at the input stage. Perhaps it's really a question of scalar vs vector with regards to the availability of MIN/MAX instructions.
So perhaps we should just skip the standard expansion for these cases and only expand to the pattern you specified there. It's probably not safe to do it for all cases if we can't guarantee the saturating behavior of other FP_TO_INT operations.

> Assuming (mainly because I haven't yet found documented proof) that a native FP_TO_INT operation on AArch64 implements the NaN->0 requirement,

That is interesting, I can't find it either. I simply adapted this from the previous patchset by nikic, andI presumed that that was correct.
Perhaps it's an undocumented behavior.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86078/new/

https://reviews.llvm.org/D86078