[PATCH] D57300: [X86][BdVer2] Transfer delays from the integer to the floating point unit.
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 30 02:35:10 PST 2019
lebedev.ri added a comment.
In D57300#1375602 <https://reviews.llvm.org/D57300#1375602>, @andreadb wrote:
> 2e: 41 bf 00 00 00 00 mov $0x0,%r15d
> 34: c4 c3 41 20 ff 01 vpinsrb $0x1,%r15d,%xmm7,%xmm7
> 3a: c4 c3 41 20 ff 01 vpinsrb $0x1,%r15d,%xmm7,%xmm7
> ....
> ea88: c4 c3 41 20 ff 01 vpinsrb $0x1,%r15d,%xmm7,%xmm7
>
>
> If there is really a bypass delay, then that code snippet is not going to expose it.
> The real bottleneck in that code snippet is the dependency on %xmm7. R15 is only set once at the beginning by a zero-move, and then never updated again.
>
> In this case, we have that each cycle the scheduler issues a uOp to move R15 to the FPU. However, the vpinsrb can only be issued every other cycle due to the dependency on XMM7. That means, in the long run, any bypass delay is going to be hidden by the latency caused by the data dependency on XMM7.
> Basically, that code snippet is not good to measure those kinds of delays...
Very nice observation.
Let's //try// something better.
$ cat /tmp/snippet.s ; ./bin/llvm-exegesis -mode=latency -snippets-file=/tmp/snippet.s
# LLVM-EXEGESIS-DEFREG EAX 0
# LLVM-EXEGESIS-DEFREG XMM0 0
# LLVM-EXEGESIS-DEFREG XMM1 0
vpinsrb $0, %eax, %xmm0, %xmm1
vpextrb $0, %xmm1, %eax
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-a71a33.o
---
mode: latency
key:
instructions:
- 'VPINSRBrr XMM1 XMM0 EAX i_0x0'
- 'VPEXTRBrr EAX XMM1 i_0x0'
config: ''
register_initial_values:
- 'EAX=0x0'
- 'XMM0=0x0'
- 'XMM1=0x0'
cpu_name: bdver2
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
- { key: latency, value: 11.0282, per_snippet_value: 22.0564 }
error: ''
info: ''
assembled_snippet: B8000000004883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F0C244883C410C4E37920C800C4E37914C800C4E37920C800C4E37914C800C4E37920C800C4E37914C800C4E37920C800C4E37914C800C4E37920C800C4E37914C800C4E37920C800C4E37914C800C4E37920C800C4E37914C800C4E37920C800C4E37914C800C3
...
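As a quick sanity check on how the two figures in `measurements` relate: the snippet holds two instructions, so the per-snippet value should simply be twice the per-instruction one. A minimal sketch, with the numbers copied by hand from the output above; the variable names are mine, and I am assuming the values are in cycles:

```python
# Values copied from the llvm-exegesis output above (assumed to be cycles).
latency_per_instruction = 11.0282   # 'value'
latency_per_snippet = 22.0564       # 'per_snippet_value'
instructions_in_snippet = 2         # vpinsrb + vpextrb

# The per-snippet figure should be the per-instruction figure
# multiplied by the number of instructions in the snippet.
expected = latency_per_instruction * instructions_in_snippet
assert abs(expected - latency_per_snippet) < 1e-9

print(f"GPR -> XMM -> GPR round trip: ~{latency_per_snippet:.1f} per pair")
```

So the number to look at for the round trip through `vpinsrb` + `vpextrb` is the per-snippet one.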
$ /usr/bin/objdump -d /tmp/snippet-a71a33.o
/tmp/snippet-a71a33.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: b8 00 00 00 00 mov $0x0,%eax
5: 48 83 ec 10 sub $0x10,%rsp
9: c7 04 24 00 00 00 00 movl $0x0,(%rsp)
10: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
17: 00
18: c7 44 24 08 00 00 00 movl $0x0,0x8(%rsp)
1f: 00
20: c7 44 24 0c 00 00 00 movl $0x0,0xc(%rsp)
27: 00
28: c5 fa 6f 04 24 vmovdqu (%rsp),%xmm0
2d: 48 83 c4 10 add $0x10,%rsp
31: 48 83 ec 10 sub $0x10,%rsp
35: c7 04 24 00 00 00 00 movl $0x0,(%rsp)
3c: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
43: 00
44: c7 44 24 08 00 00 00 movl $0x0,0x8(%rsp)
4b: 00
4c: c7 44 24 0c 00 00 00 movl $0x0,0xc(%rsp)
53: 00
54: c5 fa 6f 0c 24 vmovdqu (%rsp),%xmm1
59: 48 83 c4 10 add $0x10,%rsp
5d: c4 e3 79 20 c8 00 vpinsrb $0x0,%eax,%xmm0,%xmm1
63: c4 e3 79 14 c8 00 vpextrb $0x0,%xmm1,%eax
...
eab1: c4 e3 79 20 c8 00 vpinsrb $0x0,%eax,%xmm0,%xmm1
eab7: c4 e3 79 14 c8 00 vpextrb $0x0,%xmm1,%eax
eabd: c3 retq
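The elided part of the listing can be sized from the addresses alone, since both instructions encode to 6 bytes. A rough sketch, using only the offsets visible in the disassembly above (names are illustrative):

```python
# Offsets read off the objdump listing above.
first_loop_insn = 0x5d   # first vpinsrb after the register setup
ret_addr = 0xeabd        # retq, right after the last vpextrb
insn_size = 6            # both vpinsrb and vpextrb are 6 bytes here

body_bytes = ret_addr - first_loop_insn
# The body should be an exact multiple of the instruction size.
assert body_bytes % insn_size == 0

instructions = body_bytes // insn_size
pairs = instructions // 2   # one pair = vpinsrb + vpextrb
print(instructions, pairs)
```

That count lines up with `num_repetitions: 10000` if that field counts instructions rather than snippet copies, which is my reading of the output and not something verified here.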
Though I suppose that still has the dependency on `xmm1`.
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D57300/new/
https://reviews.llvm.org/D57300