[llvm] [AArch64][SVE] Use NEON for ISD::FP_ROUND cases (PR #171776)

Thu Dec 11 06:16:23 PST 2025

================
@@ -4776,6 +4777,36 @@ SDValue AArch64TargetLowering::LowerFP_ROUND(SDValue Op,
     return getSVESafeBitCast(VT, Narrow, DAG);
   }
 
+  // Split fp_rounds where VT == 128 bits and SrcVT == 256 bits,
+  // When the minimum SVE vector length is 256 bits, it is best to manually
+  // lower this with NEON for v8f16, and v8bf16 will crash without doing so as
+  // both types are legal and will not automatically split in legalization.
+  auto SplitConcat = [&](MVT DestTy, MVT HalfDestTy, MVT HalfSrcTy) {
+    SDValue Concat = Op->getOperand(0);
+    if (Concat.getOpcode() == ISD::CONCAT_VECTORS) {
+      SDValue ConcatOp0 = Concat.getOperand(0);
+      SDValue ConcatOp1 = Concat.getOperand(1);
+      SDLoc DL(Op);
+      SDValue L = DAG.getNode(ISD::FP_ROUND, DL, HalfDestTy, ConcatOp0,
+                              Op->getOperand(1));
+      SDValue R = DAG.getNode(ISD::FP_ROUND, DL, HalfDestTy, ConcatOp1,
+                              Op->getOperand(1));
+      return DAG.getNode(ISD::CONCAT_VECTORS, DL, DestTy, L, R);
+    }
+    return SDValue();
+  };
+
+  if (VT == MVT::v8bf16) {
+    if (SrcVT == MVT::v8f32 && Subtarget->hasBF16())
+      if (auto Split = SplitConcat(MVT::v8bf16, MVT::v4bf16, MVT::v4f32))
+        return Split;
+    // Anything else for v8bf16 is legal
+    return Op;
+  }
+  if (VT == MVT::v8f16 && SrcVT == MVT::v8f32)
----------------
david-arm wrote:

Actually, I think you can avoid all of this code if you instead add MVT::v8bf16 to the list we already have in AArch64TargetLowering::AArch64TargetLowering:

```
    // NOTE: Currently this has to happen after computeRegisterProperties rather
    // than the preferred option of combining it with the addRegisterClass call.
    if (Subtarget->useSVEForFixedLengthVectors()) {
...
      for (auto VT : {MVT::v8f16, MVT::v4f32})
        setOperationAction(ISD::FP_ROUND, VT, Custom);
```

I gave that a try and it seems to work. Essentially, we only mark it for custom lowering if we're going to use SVE for fixed-length vectors, i.e. when we know the SVE vector length is >= 256 bits. In LowerFP_ROUND we then fall through to this code:

```
  if (useSVEForFixedLengthVectorVT(SrcVT, !Subtarget->isNeonAvailable()))
    return LowerFixedLengthFPRoundToSVE(Op, DAG);
```
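
To make the control flow concrete, here is a small standalone toy model of it. It is only a sketch: `ToyVT`, `ToyAction`, `ToyLowering`, `ToySubtarget`, `fpRoundAction` and `lowerFPRound` are hypothetical stand-ins and not the real LLVM classes, and the "wider than 128 bits" check is a simplified assumption about what `useSVEForFixedLengthVectorVT` decides for the source type.

```cpp
#include <cassert>
#include <initializer_list>

// Toy model only: every name below is a hypothetical stand-in, not LLVM API.
enum class ToyVT { v4f32, v8f16, v8bf16, v8f32 };
enum class ToyAction { Default, Custom };
enum class ToyLowering { Default, SVE };

struct ToySubtarget {
  unsigned MinSVEBits; // 0 models "no SVE available"
  // Assumption taken from the comment above: SVE is used for fixed-length
  // vectors when the SVE vector length is known to be >= 256 bits.
  bool useSVEForFixedLengthVectors() const { return MinSVEBits >= 256; }
};

unsigned sizeInBits(ToyVT VT) {
  switch (VT) {
  case ToyVT::v4f32:
  case ToyVT::v8f16:
  case ToyVT::v8bf16:
    return 128;
  case ToyVT::v8f32:
    return 256;
  }
  return 0;
}

// Models the constructor loop with v8bf16 added to the existing list.
ToyAction fpRoundAction(const ToySubtarget &ST, ToyVT ResultVT) {
  if (!ST.useSVEForFixedLengthVectors())
    return ToyAction::Default;
  for (ToyVT VT : {ToyVT::v8f16, ToyVT::v8bf16, ToyVT::v4f32})
    if (VT == ResultVT)
      return ToyAction::Custom;
  return ToyAction::Default;
}

// Models the fall-through: a custom-lowered FP_ROUND whose source type is
// wider than 128 bits takes the SVE path.
ToyLowering lowerFPRound(const ToySubtarget &ST, ToyVT ResultVT, ToyVT SrcVT) {
  if (fpRoundAction(ST, ResultVT) != ToyAction::Custom)
    return ToyLowering::Default;
  if (sizeInBits(SrcVT) > 128)
    return ToyLowering::SVE;
  return ToyLowering::Default;
}
```

In this model, `lowerFPRound(ToySubtarget{256}, ToyVT::v8bf16, ToyVT::v8f32)` takes the SVE path, while the same call with `ToySubtarget{128}` is left to default handling.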

The final code isn't any better than what you have in the current PR because we end up using splice+bfcvt+uzp1, but it does fix the immediate crashes without adding extra complexity for now. Would you prefer this approach? We can always refine it to produce better codegen in a later patch.
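
For reference, the split in the quoted hunk is sound because FP_ROUND acts on each lane independently, so rounding the two halves of a CONCAT_VECTORS and concatenating the results gives the same lanes as rounding the whole vector. The sketch below checks that on the host with a software f32 to bf16 conversion. `roundToBF16`, `roundVec` and `roundSplit` are hypothetical helper names, and round-to-nearest-even is an assumption about the rounding mode, not something taken from the PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Software f32 -> bf16 conversion, assuming round-to-nearest-even.
uint16_t roundToBF16(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  // Keep NaNs as NaNs by forcing a quiet-NaN mantissa bit.
  if ((Bits & 0x7fffffffu) > 0x7f800000u)
    return static_cast<uint16_t>((Bits >> 16) | 0x0040u);
  uint32_t Lsb = (Bits >> 16) & 1u; // lowest bit that survives truncation
  Bits += 0x7fffu + Lsb;            // ties round towards the even result
  return static_cast<uint16_t>(Bits >> 16);
}

// Lane-wise rounding of a whole vector (models a single FP_ROUND node).
template <std::size_t N>
std::array<uint16_t, N> roundVec(const std::array<float, N> &In) {
  std::array<uint16_t, N> Out{};
  for (std::size_t I = 0; I < N; ++I)
    Out[I] = roundToBF16(In[I]);
  return Out;
}

// Models the split: round each 4-lane half, then concatenate the results.
std::array<uint16_t, 8> roundSplit(const std::array<float, 8> &In) {
  std::array<float, 4> Lo{}, Hi{};
  for (std::size_t I = 0; I < 4; ++I) {
    Lo[I] = In[I];
    Hi[I] = In[I + 4];
  }
  std::array<uint16_t, 4> L = roundVec(Lo);
  std::array<uint16_t, 4> R = roundVec(Hi);
  std::array<uint16_t, 8> Out{};
  for (std::size_t I = 0; I < 4; ++I) {
    Out[I] = L[I];
    Out[I + 4] = R[I];
  }
  return Out;
}
```

For any 8-lane input `In`, `roundVec(In) == roundSplit(In)` holds in this model, which is the property the NEON split depends on.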

https://github.com/llvm/llvm-project/pull/171776