[llvm] [AArch64][SVE] Add custom lowering for bfloat FMUL (with +bf16) (PR #167502)

Mon Nov 17 09:41:15 PST 2025

================
@@ -514,61 +514,100 @@ define <vscale x 8 x bfloat> @fmla_nxv8bf16(<vscale x 8 x bfloat> %a, <vscale x
 ;
 
 define <vscale x 2 x bfloat> @fmul_nxv2bf16(<vscale x 2 x bfloat> %a, <vscale x 2 x bfloat> %b) {
-; NOB16B16-LABEL: fmul_nxv2bf16:
-; NOB16B16:       // %bb.0:
-; NOB16B16-NEXT:    lsl z1.s, z1.s, #16
-; NOB16B16-NEXT:    lsl z0.s, z0.s, #16
-; NOB16B16-NEXT:    ptrue p0.d
-; NOB16B16-NEXT:    fmul z0.s, p0/m, z0.s, z1.s
-; NOB16B16-NEXT:    bfcvt z0.h, p0/m, z0.s
-; NOB16B16-NEXT:    ret
+; NOB16B16-NONSTREAMING-LABEL: fmul_nxv2bf16:
+; NOB16B16-NONSTREAMING:       // %bb.0:
+; NOB16B16-NONSTREAMING-NEXT:    movi v2.2d, #0000000000000000
+; NOB16B16-NONSTREAMING-NEXT:    ptrue p0.d
+; NOB16B16-NONSTREAMING-NEXT:    bfmlalb z2.s, z0.h, z1.h
+; NOB16B16-NONSTREAMING-NEXT:    bfcvt z0.h, p0/m, z2.s
----------------
paulwalker-arm wrote:

This can introduce trapping behaviour that's not in the original code.  Our SVE support for trapping math is pretty loose but we do try to ensure not to introduce new sources of exceptions.  This was my mistake I referenced in an earlier comment (which I've fixed via https://github.com/llvm/llvm-project/pull/168387). For this PR the only problematic case is `nxv2bf16`, so we're not going to loose out much by omitting that case from the new lowering.

https://github.com/llvm/llvm-project/pull/167502