[PATCH] D57300: [X86][BdVer2] Transfer delays from the integer to the floating point unit.

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 1 02:56:07 PST 2019


lebedev.ri added a comment.

In D57300#1380363 <https://reviews.llvm.org/D57300#1380363>, @andreadb wrote:

> Thanks for running that experiment. There is clearly an 8-10cy delay.
>
> Out of curiosity, do you get the same latency if the insertion/extraction is at index $1 (i.e. not at index 0)?


I did; the results appear to be consistent:

  $ cat /tmp/snippet.s ; ./bin/llvm-exegesis -mode=latency -snippets-file=/tmp/snippet.s
  # LLVM-EXEGESIS-DEFREG EAX 0
  # LLVM-EXEGESIS-DEFREG XMM0 0
  # LLVM-EXEGESIS-DEFREG XMM1 0
  vpinsrb $1, %eax, %xmm0, %xmm1
  vpextrb $1, %xmm1, %eax
  Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2b8c21.o
  ---
  mode:            latency
  key:             
    instructions:    
      - 'VPINSRBrr XMM1 XMM0 EAX i_0x1'
      - 'VPEXTRBrr EAX XMM1 i_0x1'
    config:          ''
    register_initial_values: 
      - 'EAX=0x0'
      - 'XMM0=0x0'
      - 'XMM1=0x0'
  cpu_name:        bdver2
  llvm_triple:     x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:    
    - { key: latency, value: 11.0372, per_snippet_value: 22.0744 }
  error:           ''
  info:            ''
  assembled_snippet: B8000000004883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F0C244883C410C4E37920C801C4E37914C801C4E37920C801C4E37914C801C4E37920C801C4E37914C801C4E37920C801C4E37914C801C4E37920C801C4E37914C801C4E37920C801C4E37914C801C4E37920C801C4E37914C801C4E37920C801C4E37914C801C3
  ...
  $ cat /tmp/snippet.s ; ./bin/llvm-exegesis -mode=latency -snippets-file=/tmp/snippet.s
  # LLVM-EXEGESIS-DEFREG EAX 0
  # LLVM-EXEGESIS-DEFREG XMM0 0
  # LLVM-EXEGESIS-DEFREG XMM1 0
  vpinsrb $0, %eax, %xmm0, %xmm1
  vpextrb $1, %xmm1, %eax
  Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-3b8c6f.o
  ---
  mode:            latency
  key:             
    instructions:    
      - 'VPINSRBrr XMM1 XMM0 EAX i_0x0'
      - 'VPEXTRBrr EAX XMM1 i_0x1'
    config:          ''
    register_initial_values: 
      - 'EAX=0x0'
      - 'XMM0=0x0'
      - 'XMM1=0x0'
  cpu_name:        bdver2
  llvm_triple:     x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:    
    - { key: latency, value: 11.0304, per_snippet_value: 22.0608 }
  error:           ''
  info:            ''
  assembled_snippet: B8000000004883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F0C244883C410C4E37920C800C4E37914C801C4E37920C800C4E37914C801C4E37920C800C4E37914C801C4E37920C800C4E37914C801C4E37920C800C4E37914C801C4E37920C800C4E37914C801C4E37920C800C4E37914C801C4E37920C800C4E37914C801C3
  ...
  $ cat /tmp/snippet.s ; ./bin/llvm-exegesis -mode=latency -snippets-file=/tmp/snippet.s
  # LLVM-EXEGESIS-DEFREG EAX 0
  # LLVM-EXEGESIS-DEFREG XMM0 0
  # LLVM-EXEGESIS-DEFREG XMM1 0
  vpinsrb $1, %eax, %xmm0, %xmm1
  vpextrb $0, %xmm1, %eax
  Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-5f6929.o
  ---
  mode:            latency
  key:             
    instructions:    
      - 'VPINSRBrr XMM1 XMM0 EAX i_0x1'
      - 'VPEXTRBrr EAX XMM1 i_0x0'
    config:          ''
    register_initial_values: 
      - 'EAX=0x0'
      - 'XMM0=0x0'
      - 'XMM1=0x0'
  cpu_name:        bdver2
  llvm_triple:     x86_64-unknown-linux-gnu
  num_repetitions: 10000
  measurements:    
    - { key: latency, value: 11.0333, per_snippet_value: 22.0666 }
  error:           ''
  info:            ''
  assembled_snippet: B8000000004883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F04244883C4104883EC10C7042400000000C744240400000000C744240800000000C744240C00000000C5FA6F0C244883C410C4E37920C801C4E37914C800C4E37920C801C4E37914C800C4E37920C801C4E37914C800C4E37920C801C4E37914C800C4E37920C801C4E37914C800C4E37920C801C4E37914C800C4E37920C801C4E37914C800C4E37920C801C4E37914C800C3
  ...
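
For a quick sanity check of "consistent", the three measurements above can be compared side by side. This is only a sketch: the numbers are copied by hand from the runs above, and it assumes that `value` is the average per instruction while `per_snippet_value` covers the whole two-instruction insert/extract round trip (which is what the figures themselves suggest). The names `runs`, `SNIPPET_LEN` and `per_instruction` are made up for illustration and are not part of llvm-exegesis.

```python
# Latency figures copied from the three llvm-exegesis runs above.
# Keys are (insert index, extract index); values are (value, per_snippet_value).
runs = {
    (1, 1): (11.0372, 22.0744),
    (0, 1): (11.0304, 22.0608),
    (1, 0): (11.0333, 22.0666),
}

# Assumption: each snippet is one vpinsrb followed by one vpextrb.
SNIPPET_LEN = 2


def per_instruction(per_snippet, n=SNIPPET_LEN):
    """Average latency per instruction in an n-instruction snippet."""
    return per_snippet / n


# Difference between the largest and smallest reported per-instruction value.
values = [value for value, _ in runs.values()]
spread = max(values) - min(values)

for (ins, ext), (value, per_snippet) in runs.items():
    # In every run the reported 'value' equals per_snippet_value / 2.
    assert abs(per_instruction(per_snippet) - value) < 1e-9
    print(f"insert ${ins}, extract ${ext}: {per_snippet:.4f} cycles per round trip")

print(f"spread across index choices: {spread:.4f} cycles")
```

The spread comes out below 0.01 cycles, so the choice of index does not appear to matter for either instruction.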



> That being said. I think this change is good, and it is consistent with the latency value defined for the WriteVecMoveToGpr and WriteVecMoveFromGpr.

I suspect `ReadFpu2Int` will be introduced too?

> So, LGTM

Thank you for the review.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57300/new/

https://reviews.llvm.org/D57300




