[llvm] [AArch64][SVE] Use SVE for scalar FP converts in streaming[-compatible] functions (PR #112564)

Thu Oct 24 06:44:43 PDT 2024

================
@@ -28295,7 +28352,21 @@ SDValue AArch64TargetLowering::LowerToPredicatedOp(SDValue Op,
                                                    unsigned NewOp) const {
   EVT VT = Op.getValueType();
   SDLoc DL(Op);
-  auto Pg = getPredicateForVector(DAG, DL, VT);
+  SDValue Pg;
+
+  // FCVTZS_ZPmZ_DtoS and FCVTZU_ZPmZ_DtoS are special cases. These operations
+  // return nxv4i32 rather than the correct nxv2i32, as nxv2i32 is an illegal
+  // unpacked type. So, in this case, we take the predicate size from the
+  // operand.
+  SDValue LastOp{};
+  if ((NewOp == AArch64ISD::FCVTZU_MERGE_PASSTHRU ||
+       NewOp == AArch64ISD::FCVTZS_MERGE_PASSTHRU) &&
----------------
paulwalker-arm wrote:

This is why their definitions take `null_frag` for the `ir_op`: they are not amenable to the stock ISD nodes (which include the target-specific ones that simply add predication).

You need to take a step back and look at the original selection failure when the hacks are removed. I see:
```
LLVM ERROR: Cannot select: t15: nxv4i32 = AArch64ISD::FCVTZS_MERGE_PASSTHRU t13, t9, undef:nxv4i32
  t13: nxv4i1 = AArch64ISD::PTRUE TargetConstant:i32<31>
    t12: i32 = TargetConstant<31>
  t9: nxv2f64 = insert_vector_elt undef:nxv2f64, t2, Constant:i64<0>
    t8: nxv2f64 = undef
    t2: f64,ch = CopyFromReg t0, Register:f64 %0
      t1: f64 = Register %0
    t7: i64 = Constant<0>
  t14: nxv4i32 = undef
```

This shows `t15` (`nxv4i32`) and `t9` (`nxv2f64`) having different element counts, which means the DAG is malformed for the reason I highlighted in `replaceScalarFPConversionWithSVE()`. First get to the point where the DAG is correct, and then let's see what, if any, selection failures occur.

https://github.com/llvm/llvm-project/pull/112564