[PATCH] D80410: [WIP][SVE] Pass through dup(0) to zero-merging pseudos

Tue May 26 15:17:42 PDT 2020

cameron.mcinally added inline comments.

================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-fp-arith-merging.ll:301
+}
+
+
----------------
I'm looking at some rough latency tables we've put together and it looks like the tied-reg MOVPRFX sequence is 1 cycle faster than the SEL sequence:

```
; CHECK-NEXT:  movprfx z1.s, p0/z, z0.s
; CHECK-NEXT:  fsubr   z1.s, p0/m, z1.s, z0.s
; CHECK-NEXT:  mov     z0.d, z1.d
```

The vector MOV is faster than the DUP, and we burn the extra z1 register in both cases, so that's a wash.

That said, the MOVPRFX sequence we're generating actually looks like this:

```
; CHECK-NEXT:  mov     z1.s, #0
; CHECK-NEXT:  movprfx z1.s, p0/z, z0.s
; CHECK-NEXT:  fsubr   z1.s, p0/m, z1.s, z0.s
; CHECK-NEXT:  mov     z0.d, z1.d
```

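where the DUP #0 (the mov z1.s, #0 on the first line) is a dead instruction: the zeroing MOVPRFX overwrites every lane of z1 before anything reads it. It's proving pretty hard to get rid of the DUP at the MachineInstr level, though. Still looking...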
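
To illustrate why the leading DUP is dead, here is a rough lane-level model of the sequence in Python. It is only a sketch: the helper names are made up for illustration, the vectors are plain lists of floats, and the instruction semantics are simplified to what matters here (zeroing MOVPRFX writes all lanes; merging FSUBR computes zm - zdn in active lanes only).

```python
# Simplified, hypothetical lane-level model; not how LLVM or the hardware
# represents these instructions.

def movprfx_zeroing(pred, src):
    # movprfx zd, pg/z, zn: active lanes copy src, inactive lanes become 0.
    # Note that the old contents of zd are never read.
    return [s if p else 0.0 for p, s in zip(pred, src)]

def fsubr_merging(pred, zdn, zm):
    # fsubr zdn, pg/m, zdn, zm: active lanes become zm - zdn,
    # inactive lanes keep the old zdn value.
    return [m - d if p else d for p, d, m in zip(pred, zdn, zm)]

def run_sequence(pred, z0, z1_initial):
    z1 = list(z1_initial)             # whatever z1 held before (e.g. DUP #0)
    z1 = movprfx_zeroing(pred, z0)    # overwrites every lane of z1
    z1 = fsubr_merging(pred, z1, z0)  # reversed subtract in active lanes
    return list(z1)                   # mov z0.d, z1.d

pred = [True, False, True, False]
z0 = [1.5, 2.5, 3.5, 4.5]

with_dup = run_sequence(pred, z0, [0.0] * 4)     # z1 zeroed by DUP #0 first
without_dup = run_sequence(pred, z0, [9.0] * 4)  # z1 holds arbitrary data
print(with_dup == without_dup)
```

The result does not depend on the initial z1, which is the sense in which the DUP #0 can be dropped.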

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80410/new/

https://reviews.llvm.org/D80410