[PATCH] D57300: [X86][BdVer2] Transfer delays from the integer to the floating point unit.

Andrea Di Biagio via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 29 09:28:14 PST 2019


andreadb added a comment.

        2e:       41 bf 00 00 00 00       mov    $0x0,%r15d
        34:       c4 c3 41 20 ff 01       vpinsrb $0x1,%r15d,%xmm7,%xmm7
        3a:       c4 c3 41 20 ff 01       vpinsrb $0x1,%r15d,%xmm7,%xmm7
  ....
      ea88:       c4 c3 41 20 ff 01       vpinsrb $0x1,%r15d,%xmm7,%xmm7

If there is really a bypass delay, then that code snippet is not going to expose it.
The real bottleneck in that code is the dependency on %xmm7. R15 is only set once at the beginning by a zero-move, and then never updated again.

Even if the int-to-fpu transfer of the data in R15 does cause a bypass delay, that delay is not visible when we run that code snippet.
In this case, every cycle the scheduler can execute a uOp that moves R15 to the FPU. If there really is a bypass delay, it is going to be hidden by the latency of the long dependency chain on XMM7.
Basically, that code snippet is not a good way to measure those kinds of delays...
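
To make the argument concrete, here is a rough toy model of that snippet (a sketch only, not a simulation of the real BdVer2 pipeline). The function name, the insert latency of 2 cycles, and the bypass delay of 10 cycles are made-up illustrative values, not measured numbers. It assumes one transfer uOp can issue per cycle once R15 is written, and that each vpinsrb waits for both its transfer and the previous XMM7 value.

```python
def chain_cycles(n, insert_latency, bypass_delay, mov_latency=1):
    """Cycle at which the last of n chained vpinsrb ops completes.

    Toy model with hypothetical latencies; all parameters are
    illustrative assumptions, not BdVer2 measurements.
    """
    xmm7_ready = 0
    for i in range(n):
        # The i-th transfer uOp issues at cycle mov_latency + i (one per
        # cycle, independent of the XMM7 chain) and arrives in the FPU
        # after the assumed bypass delay.
        transfer_done = mov_latency + i + bypass_delay
        # The insert needs both the transferred R15 and the previous XMM7.
        start = max(xmm7_ready, transfer_done)
        xmm7_ready = start + insert_latency
    return xmm7_ready


if __name__ == "__main__":
    n = 1000
    for bypass in (0, 10):
        total = chain_cycles(n, insert_latency=2, bypass_delay=bypass)
        print(f"bypass={bypass:2d}  total={total}  per-insn={total / n:.3f}")
```

In this model the bypass delay only shifts the start of the chain by a constant. The per-instruction cost stays at roughly the insert latency no matter how large the delay is, which is why a throughput measurement of that snippet cannot tell the two cases apart.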


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D57300/new/

https://reviews.llvm.org/D57300




