[PATCH] D80410: [WIP][SVE] Pass through dup(0) to zero-merging pseudos
    Cameron McInally via Phabricator via llvm-commits 
    llvm-commits at lists.llvm.org
       
    Tue May 26 15:17:42 PDT 2020
    
    
  
cameron.mcinally added inline comments.
================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-fp-arith-merging.ll:301
+}
+
+
----------------
I'm looking at some rough latency tables we've put together and it looks like the tied-reg MOVPRFX sequence is 1 cycle faster than the SEL sequence:
```
; CHECK-NEXT:  movprfx z1.s, p0/z, z0.s
; CHECK-NEXT:  fsubr   z1.s, p0/m, z1.s, z0.s
; CHECK-NEXT:  mov     z0.d, z1.d
```
The vector MOV is faster than the DUP. And we burn the extra z1 register in both cases, so that's a wash.
That said, the MOVPRFX sequence we're generating actually looks like this:
```
; CHECK-NEXT:  mov z1.s, #0
; CHECK-NEXT:  movprfx z1.s, p0/z, z0.s
; CHECK-NEXT:  fsubr   z1.s, p0/m, z1.s, z0.s
; CHECK-NEXT:  mov     z0.d, z1.d
```
where the DUP #0 (the leading mov z1.s, #0) is a dead instruction, since the zeroing MOVPRFX overwrites z1 before anything reads it. It's proving pretty hard to get rid of the DUP at the MachineInstr level though. Still looking...
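
To illustrate why the DUP #0 counts as dead, here is a rough sketch of a backward liveness scan over the four-instruction sequence above. This is a toy model in Python, not LLVM code: the def/use sets are written by hand, the helper name dead_defs is made up for illustration, and it assumes the zeroing-predicated MOVPRFX (p0/z) fully defines z1 without reading it and that only z0 is live after the sequence.
```python
# Toy model: each entry is (assembly text, registers defined, registers used).
# The def/use sets are hand-written assumptions, not derived from LLVM.
SEQ = [
    ("mov z1.s, #0",                 {"z1"}, set()),
    # Zeroing predication (/z): assumed to write every lane of z1, so z1 is not read.
    ("movprfx z1.s, p0/z, z0.s",     {"z1"}, {"p0", "z0"}),
    # Merging predication (/m): the destination z1 is also a source.
    ("fsubr z1.s, p0/m, z1.s, z0.s", {"z1"}, {"p0", "z1", "z0"}),
    ("mov z0.d, z1.d",               {"z0"}, {"z1"}),
]


def dead_defs(seq, live_out):
    """Return instructions whose defined registers are never read afterwards."""
    live = set(live_out)
    dead = []
    for text, defs, uses in reversed(seq):
        if not (defs & live):
            # Nothing this instruction writes is needed later: it is dead,
            # so its own uses do not keep anything else alive.
            dead.append(text)
            continue
        live -= defs   # a full definition ends the previous value's live range
        live |= uses   # the operands must be live before this instruction
    return list(reversed(dead))


if __name__ == "__main__":
    # Assumption: only the result register z0 is live after the sequence.
    for text in dead_defs(SEQ, {"z0"}):
        print("dead:", text)
```
On the final assembly this is the easy case. The difficulty described above is presumably that, before the pseudo is expanded, the DUP still looks like a live operand of the zero-merging pseudo, which this toy model does not capture.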
CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80410/new/
https://reviews.llvm.org/D80410
    
    