[llvm] [AArch64] Fix SVE scalar fcopysign lowering without neon. (PR #129787)
David Green via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 5 05:21:58 PST 2025
================
@@ -66,32 +55,40 @@ define void @test_copysign_f16(ptr %ap, ptr %bp) {
define void @test_copysign_bf16(ptr %ap, ptr %bp) {
; SVE-LABEL: test_copysign_bf16:
; SVE: // %bb.0:
-; SVE-NEXT: adrp x8, .LCPI1_0
-; SVE-NEXT: ldr h1, [x0]
-; SVE-NEXT: ldr h2, [x1]
-; SVE-NEXT: ldr q0, [x8, :lo12:.LCPI1_0]
-; SVE-NEXT: adrp x8, .LCPI1_1
-; SVE-NEXT: ldr q4, [x8, :lo12:.LCPI1_1]
-; SVE-NEXT: mov z3.d, z0.d
-; SVE-NEXT: fmov s0, s1
-; SVE-NEXT: fmov s3, s2
-; SVE-NEXT: bif v0.16b, v3.16b, v4.16b
+; SVE-NEXT: sub sp, sp, #16
+; SVE-NEXT: .cfi_def_cfa_offset 16
+; SVE-NEXT: ldr h0, [x0]
+; SVE-NEXT: ldr h1, [x1]
+; SVE-NEXT: fmov w8, s0
+; SVE-NEXT: str h1, [sp, #12]
+; SVE-NEXT: ldrb w9, [sp, #13]
+; SVE-NEXT: and w8, w8, #0x7fff
+; SVE-NEXT: tst w9, #0x80
+; SVE-NEXT: fmov s0, w8
+; SVE-NEXT: eor w8, w8, #0x8000
+; SVE-NEXT: fmov s1, w8
+; SVE-NEXT: fcsel h0, h1, h0, ne
----------------
davemgreen wrote:
The combination of SVE and bf16 is not handled very well yet - it failed to legalize when I tried using scalable vectors. The scalar version should always be safe, but it would be more efficient to use SVE instructions if they were available. The codegen should be identical to the fp16 version, but I figured it wasn't worth bitcasting the types to fp16 and back; it sounded like a bit of a bodge. We can get the improved codegen once SVE bf16 is doing better.
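For readers following the new CHECK lines, here is a rough C++ sketch of what the scalar sequence in the hunk above appears to compute on the raw 16-bit bf16 patterns. It is only an illustration: the function name `copysign_bf16_bits` is hypothetical, and this is not the code LLVM uses for the lowering.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): copysign on raw bf16 bit patterns,
// mirroring the scalar instruction sequence in the updated test.
constexpr std::uint16_t copysign_bf16_bits(std::uint16_t mag, std::uint16_t sign) {
    // and w8, w8, #0x7fff : clear the sign bit of the magnitude operand.
    const std::uint16_t abs_bits = static_cast<std::uint16_t>(mag & 0x7fffu);
    // eor w8, w8, #0x8000 : the same magnitude with the sign bit set.
    const std::uint16_t neg_bits = static_cast<std::uint16_t>(abs_bits ^ 0x8000u);
    // ldrb w9, [sp, #13] ; tst w9, #0x80 : test the top bit of the sign operand
    // (bit 7 of its high byte on a little-endian target).
    const bool sign_set = (sign & 0x8000u) != 0;
    // fcsel h0, h1, h0, ne : pick the negative or positive variant.
    return sign_set ? neg_bits : abs_bits;
}

// bf16 1.0 is 0x3f80 and -2.0 is 0xc000, so copysign(1.0, -2.0) is -1.0 (0xbf80).
static_assert(copysign_bf16_bits(0x3f80, 0xc000) == 0xbf80, "sign taken from second operand");
```

An SVE or NEON lowering would do the same thing with a single bitwise select against a sign mask, which is what the removed `bif` line did.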
https://github.com/llvm/llvm-project/pull/129787