[PATCH] D66424: [X86][Btver2] Fix latency and throughput of CMPXCHG instructions.

Andrea Di Biagio via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Aug 19 12:21:31 PDT 2019


andreadb updated this revision to Diff 215965.
andreadb added a comment.

Patch updated.

The CMPXCHG tests have been moved to resources-x86_64.s, as requested by @RKSimon.

Added missing scheduling information for CMPXCHG8B and CMPXCHG16B.
CMPXCHG8B has a latency of 11 cycles and unfortunately doesn't seem to benefit from store-to-load forwarding. That means throughput is clearly limited by the in/out dependency on GPRs. The uOP composition is sadly unknown (due to the lack of PMCs for the Integer pipes), so for CMPXCHG8B I have reused the same mix of consumed resources as the other CMPXCHG instructions.

LOCK CMPXCHG8B instead takes 18 cycles.

CMPXCHG16B takes 32 cycles, and up to 38 cycles when the LOCK prefix is specified. Due to the in/out dependencies, throughput is limited to 1 instruction every 32 (or 38) cycles, depending on whether the LOCK prefix is specified.
I wouldn't be surprised if the microcode for CMPXCHG16B is similar to two copies of the microcode for CMPXCHG8B. So I have speculatively set the JALU01 consumption to 2x the resource cycles used for CMPXCHG8B.
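
To make the throughput argument concrete, here is a minimal sketch (not part of the patch) of how a loop-carried in/out register dependency bounds reciprocal throughput. The latencies are the ones quoted above. The function name, the `resource_cycles`/`units` parameters and the example resource numbers are hypothetical and only illustrate that the latency term dominates.

```python
# Latencies (in cycles) quoted in this message.
LATENCY = {
    "cmpxchg8b": 11,
    "lock cmpxchg8b": 18,
    "cmpxchg16b": 32,
    "lock cmpxchg16b": 38,
}


def rthroughput_bound(latency, resource_cycles=0, units=1):
    """Lower bound on cycles per instruction for back-to-back executions.

    Each instruction reads and writes the same GPRs, so the next one cannot
    start before the previous one completes (latency bound). Independently,
    it occupies `resource_cycles` cycles spread over `units` identical pipes
    (resource bound). The larger of the two limits throughput.
    """
    return max(latency, resource_cycles / units)


if __name__ == "__main__":
    # Hypothetical resource usage: 6 cycles on 2 ALU pipes for CMPXCHG8B,
    # doubled for CMPXCHG16B as speculated above. Assumed values, not
    # measurements.
    for name, cycles in LATENCY.items():
        rc = 12 if "16b" in name else 6
        print(f"{name}: {rthroughput_bound(cycles, rc, units=2)} cycles/instr")
```

With these assumed resource cycles, the bound is the latency in every case, which matches the "1 instruction every 32 (or 38) cycles" statement for CMPXCHG16B.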


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66424/new/

https://reviews.llvm.org/D66424

Files:
  lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp
  lib/Target/X86/MCTargetDesc/X86MCTargetDesc.h
  lib/Target/X86/X86InstrInfo.h
  lib/Target/X86/X86SchedPredicates.td
  lib/Target/X86/X86ScheduleBtVer2.td
  test/tools/llvm-mca/X86/BtVer2/resources-cmpxchg.s
  test/tools/llvm-mca/X86/BtVer2/resources-x86_64.s

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D66424.215965.patch
Type: text/x-patch
Size: 17089 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20190819/eab9c29b/attachment.bin>

