[llvm] [AArch64][SVE] Use NEON for ISD::FP_ROUND cases (PR #171776)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 11 06:16:23 PST 2025
================
@@ -4776,6 +4777,36 @@ SDValue AArch64TargetLowering::LowerFP_ROUND(SDValue Op,
     return getSVESafeBitCast(VT, Narrow, DAG);
   }
+  // Split fp_rounds where VT == 128 bits and SrcVT == 256 bits,
+  // When the minimum SVE vector length is 256 bits, it is best to manually
+  // lower this with NEON for v8f16, and v8bf16 will crash without doing so as
+  // both types are legal and will not automatically split in legalization.
+  auto SplitConcat = [&](MVT DestTy, MVT HalfDestTy, MVT HalfSrcTy) {
+    SDValue Concat = Op->getOperand(0);
+    if (Concat.getOpcode() == ISD::CONCAT_VECTORS) {
+      SDValue ConcatOp0 = Concat.getOperand(0);
+      SDValue ConcatOp1 = Concat.getOperand(1);
+      SDLoc DL(Op);
+      SDValue L = DAG.getNode(ISD::FP_ROUND, DL, HalfDestTy, ConcatOp0,
+                              Op->getOperand(1));
+      SDValue R = DAG.getNode(ISD::FP_ROUND, DL, HalfDestTy, ConcatOp1,
+                              Op->getOperand(1));
+      return DAG.getNode(ISD::CONCAT_VECTORS, DL, DestTy, L, R);
+    }
+    return SDValue();
+  };
+
+  if (VT == MVT::v8bf16) {
+    if (SrcVT == MVT::v8f32 && Subtarget->hasBF16())
+      if (auto Split = SplitConcat(MVT::v8bf16, MVT::v4bf16, MVT::v4f32))
+        return Split;
+    // Anything else for v8bf16 is legal
+    return Op;
+  }
+  if (VT == MVT::v8f16 && SrcVT == MVT::v8f32)
----------------
david-arm wrote:
Actually, I think you can avoid all of this code if, in the AArch64TargetLowering::AArch64TargetLowering constructor, you instead add MVT::v8bf16 to the list we already have:
```
  // NOTE: Currently this has to happen after computeRegisterProperties rather
  // than the preferred option of combining it with the addRegisterClass call.
  if (Subtarget->useSVEForFixedLengthVectors()) {
  ...
    for (auto VT : {MVT::v8f16, MVT::v4f32})
      setOperationAction(ISD::FP_ROUND, VT, Custom);
```
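To illustrate, the change would look roughly like the fragment below. This is only a sketch of the existing loop with MVT::v8bf16 added; where the new entry sits in the list is an assumption, and the elided code is left as-is:
```
  if (Subtarget->useSVEForFixedLengthVectors()) {
  ...
    // Sketch: v8bf16 added alongside the existing entries (ordering assumed).
    for (auto VT : {MVT::v8f16, MVT::v8bf16, MVT::v4f32})
      setOperationAction(ISD::FP_ROUND, VT, Custom);
```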
I gave that a try and it seems to work. Essentially, we only mark it for custom lowering if we're going to use SVE for fixed-length vectors, i.e. we know the SVE vector length is >= 256 bits. In LowerFP_ROUND we then fall through to this code:
```
  if (useSVEForFixedLengthVectorVT(SrcVT, !Subtarget->isNeonAvailable()))
    return LowerFixedLengthFPRoundToSVE(Op, DAG);
```
The final code isn't any better than what you have in the current PR, because we end up using splice+bfcvt+uzp1, but it does fix the immediate crashes without adding extra complexity for now. Would you prefer this approach? We can always refine it to produce better codegen in a later patch.
https://github.com/llvm/llvm-project/pull/171776