[PATCH] D154528: [AMDGPU][GlobalISel] Generate fast fp64-to-fp16 conversions in unsafe mode.

Fri Jul 7 03:40:33 PDT 2023

kosarev added inline comments.

================
Comment at: llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:17037-17040
+    // Avoid folding legal fp_rounds into non-legal ones.
+    if (LegalDAG &&
+        TLI.getOperationAction(ISD::FP_ROUND, VT) != TargetLowering::Legal)
+      return SDValue();
----------------
This doesn't seem to cause any test failures here upstream and eliminates the need for the True16 f16 = fp_round f64 pattern downstream.

Not sure if we want that in a separate patch or better keep here to provide some context. It looks problematic to give it a real test without having some True16 support.

================
Comment at: llvm/lib/Target/AMDGPU/VOP1Instructions.td:519-522
+def : GCNPat<
+    (f16 (fpround f64:$src)),
+    (V_CVT_F16_F32_e64 SRCMODS.NONE, (V_CVT_F32_F64_e64 SRCMODS.NONE, $src))
+>;
----------------
foad wrote:
> kosarev wrote:
> > arsenm wrote:
> > > kosarev wrote:
> > > > foad wrote:
> > > > > kosarev wrote:
> > > > > > foad wrote:
> > > > > > > kosarev wrote:
> > > > > > > > arsenm wrote:
> > > > > > > > > Should just let the expansion split them up, this is missing the source modifiers
> > > > > > > > Updated to match modifiers. The expansion would combine back to a single fpround f64, so doesn't work.
> > > > > > > "The expansion would combine back to a single fpround f64" - that sounds like the combine is broken then? It should respect legality.
> > > > > > It's trivial to explain it to GISel that f16 = fp_round f64 should be expanded while f16 = fp_round f32 is legal as it is. However, in SDAG, if I'm not wrong, we can only differentiate by the resulting type, and if we choose custom expansion for all to-f16 roundings, then SDAG will try to combine whatever they result in.
> > > > > > 
> > > > > > It's not an immediate problem for this patch because we always do FP_TO_FP16, because we don't have 16-bit registers here yet. For True16, however, if we want it be a chain of two FP_ROUNDs, the expansion will be combined back. But then if we need the pattern for the True16 case anyway, then I'm not sure doing special work in GlobalISel and for the non-16bit cases is worth it. Would appreciate your opinions.
> > > > > SITargetLowering::lowerFP_ROUND is already doing custom legalization for to-f16 fprounds. It expands f64-to-f16 into two steps, and leaves the others alone. I would hope that post-legalization combiners would not create new to-f16 fprounds, because generally after legalization you should not create any new nodes unless they are "Legal" - and this does not include "Custom".
> > > > Well, the two FP_ROUNDs we might be expanding into here //are// legal. But I'm not sure already-legal expansions should mean no combining attempts. Otherwise, why would we want any combinings after legalization.
> > > This is one of the consequences of the DAG's single-type legality checks. You need to consider both the source and dest types for legality
> > I probably misread Jay's comment. Not combining anything post-legalization if this results into a node for which getOperationAction() yields 'Custom' makes sense, yes,
> "Not combining anything post-legalization if this results into a node for which getOperationAction() yields 'Custom' makes sense" - right, that is my understanding of the desired behaviour. But I can't promise that all existing combines obey it.
This bit in DAGCombiner.cpp reads like the expectation was that combining legal SDAGs is not required to produce legal nodes:

```
    // If this combine is running after legalizing the DAG, re-legalize any
    // nodes pulled off the worklist.
    if (LegalDAG) {
      SmallSetVector<SDNode *, 16> UpdatedNodes;
      bool NIsValid = DAG.LegalizeOp(N, UpdatedNodes);

      for (SDNode *LN : UpdatedNodes)
        AddToWorklistWithUsers(LN);

      ...
```

But then we also have this snippet that clearly addresses the same kind of problem, so not a precedent.

```
    // Legalizing in AArch64TargetLowering::LowerCONCAT_VECTORS() and combining
    // here could cause an infinite loop. That legalizing happens when LegalDAG
    // is true and input of AArch64TargetLowering::LowerCONCAT_VECTORS() is
    // scalable.
    if (In.getOpcode() == ISD::CONCAT_VECTORS && In.hasOneUse() &&
        !(LegalDAG && In.getValueType().isScalableVector())) {
      ...
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154528/new/

https://reviews.llvm.org/D154528