[PATCH] D80410: [WIP][SVE] Pass through dup(0) to zero-merging pseudos

Sander de Smalen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed May 27 14:43:24 PDT 2020


sdesmalen marked an inline comment as done.
sdesmalen added inline comments.


================
Comment at: llvm/test/CodeGen/AArch64/sve-intrinsics-fp-arith-merging.ll:301
+}
+
+
----------------
cameron.mcinally wrote:
> I'm looking at some rough latency tables we've put together and it looks like the tied-reg MOVPRFX sequence is 1 cycle faster than the SEL sequence:
> 
> ```
> ; CHECK-NEXT:  movprfx z1.s, p0/z, z0.s
> ; CHECK-NEXT:  fsubr   z1.s, p0/m, z1.s, z0.s
> ; CHECK-NEXT:  mov     z0.d, z1.d
> ```
> 
> The vector MOV is faster than the DUP. And we burn the extra z1 register for both cases, so that's a wash.
> 
> That said, the MOVPRFX sequence we're generating actually looks like this:
> 
> ```
> ; CHECK-NEXT:  mov z1.s, #0
> ; CHECK-NEXT:  movprfx z1.s, p0/z, z0.s
> ; CHECK-NEXT:  fsubr   z1.s, p0/m, z1.s, z0.s
> ; CHECK-NEXT:  mov     z0.d, z1.d
> ```
> 
> where the DUP #0 is a dead instruction. It's proving pretty hard to get rid of the DUP at the MachineInstruction level though. Still looking...
> I'm looking at some rough latency tables we've put together and it looks like the tied-reg MOVPRFX sequence is 1 cycle faster than the SEL sequence:
Ah, that's good to know. Always using the tied-operand constraint for the zeroing forms may make the common cases slower, though: it forces the compiler to honour the constraint up front, so the register allocator will already have done the work and we can no longer benefit from the reverse instructions. Every case except this one can do without the dup+select: it can use movprfx directly and rely on the commutative/reverse instructions to expand the pseudo.
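
As a rough illustration of those common cases (a hand-written sketch, not output from this patch; the register assignments are assumed), a zeroing fsub whose destination shares a register with one of its operands could be expanded as follows, with the reverse instruction covering the second case:

```
// dst allocated to the first operand's register: z0 = z0 - z1
movprfx z0.s, p0/z, z0.s
fsub    z0.s, p0/m, z0.s, z1.s

// dst allocated to the second operand's register: z0 = z1 - z0
movprfx z0.s, p0/z, z0.s
fsubr   z0.s, p0/m, z0.s, z1.s
```

In both cases the inactive lanes of z0 are zeroed by the predicated movprfx, and no extra dup or select is needed.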

> That said, the MOVPRFX sequence we're generating actually looks like this:
Is that with a different example than the one in this test?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80410/new/

https://reviews.llvm.org/D80410




