[PATCH] D57300: [X86][BdVer2] Transfer delays from the integer to the floating point unit.

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 29 08:45:34 PST 2019


lebedev.ri added a comment.

In D57300#1375180 <https://reviews.llvm.org/D57300#1375180>, @andreadb wrote:

> > I'm unable to find this number in the "AMD SOG for family 15h".
> >  llvm-exegesis measures the latencies of these instructions as 2,
> >  which matches the latencies specified in "AMD SOG for family 15h".
>
> Can you print out the code snippet used by llvm-exegesis to measure that latency?


Sure, I should have done that.

  $ ./bin/llvm-exegesis -mode=latency -opcode-name=VPINSRBrr
  Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-639c95.o
  ---
  mode:            latency
  key:             
    instructions:    
      - 'VPINSRBrr XMM7 XMM7 R15D i_0x1'
    config:          ''
    register_initial_values: 
      - 'XMM7=0x0'
      - 'R15D=0x0'
  cpu_name:        bdver2
  llvm_triple:     x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:    
    - { key: latency, value: 2.0296, per_snippet_value: 2.0296 }
  error:           ''
  info:            Repeating a single explicitly serial instruction
  assembled_snippet: 41574883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F3C244883C41041BF00000000C4C34120FF01C4C34120FF01C4C34120FF01C4C34120FF01C4C34120FF01C4C34120FF01C4C34120FF01C4C34120FF01C4C34120FF01C4C34120FF01C4C34120FF01C4C34120FF01C4C34120FF01C4C34120FF01C4C34120FF01C4C34120FF01415FC3

F7866621: snippet-639c95.o <https://reviews.llvm.org/F7866621>

  $ /usr/bin/objdump -d /tmp/snippet-639c95.o 
  
  /tmp/snippet-639c95.o:     file format elf64-x86-64
  
  
  Disassembly of section .text:
  
  0000000000000000 <foo>:
         0:       41 57                   push   %r15
         2:       48 83 ec 10             sub    $0x10,%rsp
         6:       c7 04 24 00 00 00 00    movl   $0x0,(%rsp)
         d:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%rsp)
        14:       00 
        15:       c7 44 24 08 00 00 00    movl   $0x0,0x8(%rsp)
        1c:       00 
        1d:       c7 44 24 0c 00 00 00    movl   $0x0,0xc(%rsp)
        24:       00 
        25:       c5 fa 6f 3c 24          vmovdqu (%rsp),%xmm7
        2a:       48 83 c4 10             add    $0x10,%rsp
        2e:       41 bf 00 00 00 00       mov    $0x0,%r15d
        34:       c4 c3 41 20 ff 01       vpinsrb $0x1,%r15d,%xmm7,%xmm7
        3a:       c4 c3 41 20 ff 01       vpinsrb $0x1,%r15d,%xmm7,%xmm7
  ....
      ea88:       c4 c3 41 20 ff 01       vpinsrb $0x1,%r15d,%xmm7,%xmm7
      ea8e:       c4 c3 41 20 ff 01       vpinsrb $0x1,%r15d,%xmm7,%xmm7
      ea94:       41 5f                   pop    %r15
      ea96:       c3                      retq   

I have additionally tried measuring the actual MCA snippets:

  $ cat /tmp/snippet.s
  # LLVM-EXEGESIS-LIVEIN EAX
  # LLVM-EXEGESIS-LIVEIN XMM0
  vpinsrb $0, %eax, %xmm0, %xmm0
  vpinsrb $1, %eax, %xmm0, %xmm0
  $ ./bin/llvm-exegesis -mode=latency -snippets-file=/tmp/snippet.s
  Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-67eb60.o
  ---
  mode:            latency
  key:             
    instructions:    
      - 'VPINSRBrr XMM0 XMM0 EAX i_0x0'
      - 'VPINSRBrr XMM0 XMM0 EAX i_0x1'
    config:          ''
    register_initial_values: []
  cpu_name:        bdver2
  llvm_triple:     x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:    
    - { key: latency, value: 2.0317, per_snippet_value: 4.0634 }
  error:           ''
  info:            ''
  assembled_snippet: C4E37920C000C4E37920C001C4E37920C000C4E37920C001C4E37920C000C4E37920C001C4E37920C000C4E37920C001C4E37920C000C4E37920C001C4E37920C000C4E37920C001C4E37920C000C4E37920C001C4E37920C000C4E37920C001C3
  ...

and

  $ cat /tmp/snippet.s
  # LLVM-EXEGESIS-LIVEIN EAX
  # LLVM-EXEGESIS-LIVEIN XMM0
  add %eax, %eax
  vpinsrb $0, %eax, %xmm0, %xmm0
  vpinsrb $1, %eax, %xmm0, %xmm0
  $ ./bin/llvm-exegesis -mode=latency -snippets-file=/tmp/snippet.s
  Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-562e79.o
  ---
  mode:            latency
  key:             
    instructions:    
      - 'ADD32rr EAX EAX EAX'
      - 'VPINSRBrr XMM0 XMM0 EAX i_0x0'
      - 'VPINSRBrr XMM0 XMM0 EAX i_0x1'
    config:          ''
    register_initial_values: []
  cpu_name:        bdver2
  llvm_triple:     x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:    
    - { key: latency, value: 1.4034, per_snippet_value: 4.2102 }
  error:           ''
  info:            ''
  assembled_snippet: 01C0C4E37920C000C4E37920C00101C0C4E37920C000C4E37920C00101C0C4E37920C000C4E37920C00101C0C4E37920C000C4E37920C00101C0C4E37920C000C4E37920C00101C0C3
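
As a sanity check on how to read these numbers: `value` appears to be `per_snippet_value` divided by the number of instructions in the snippet. That relation is an assumption inferred from the three outputs above, not something taken from the llvm-exegesis documentation. A minimal Python sketch, with the numbers copied from those outputs (the labels and the helper name are made up for illustration):

```python
# Measurements copied from the llvm-exegesis outputs above:
# (instructions in snippet, reported value, reported per_snippet_value)
runs = {
    "VPINSRBrr (single opcode)": (1, 2.0296, 2.0296),
    "2x vpinsrb": (2, 2.0317, 4.0634),
    "add + 2x vpinsrb": (3, 1.4034, 4.2102),
}


def per_instruction(per_snippet_value, num_instructions):
    """Assumed relation: value = per_snippet_value / instructions in snippet."""
    return per_snippet_value / num_instructions


for name, (n, value, per_snippet) in runs.items():
    derived = per_instruction(per_snippet, n)
    print(f"{name}: derived {derived:.4f}, reported {value:.4f}")
```

If that reading is right, the 1.4034 in the last run is an average that includes the `add`, so it is not directly comparable to the ~2.03 per-`vpinsrb` figures; the per-snippet values (4.0634 vs 4.2102) are the ones to compare.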

Am I holding llvm-exegesis wrong, or does this mean the info in Agner is incorrect here?


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57300/new/

https://reviews.llvm.org/D57300




