[PATCH] D49196: [llvm-mca][BtVer2] teach how to identify false dependencies on partially written registers.

Wed Jul 11 09:25:56 PDT 2018

andreadb created this revision.
andreadb added reviewers: RKSimon, spatel, courbet, mattd, craig.topper, lebedev.ri.
Herald added subscribers: gbedwell, tschuett.

The goal of this patch is to improve the throughput analysis in llvm-mca for the case where instructions in the input source code perform partial register writes.

On x86, partial register writes are quite difficult to model, mainly because different processors tend to implement different register merging schemes in hardware.

When the code contains partial register writes, the IPC (instructions per cycle) estimated by llvm-mca tends to diverge quite significantly from the IPC observed with perf.

Modern AMD processors (at least, from Bulldozer onwards) don't rename partial registers. Quoting Agner Fog's microarchitecture.pdf:

  The processor always keeps the different parts of an integer register together. For example, AL and AH are not treated as independent by the out-of-order execution mechanism. An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it.
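To make the quoted behaviour concrete, here is a minimal C++ sketch of a core that does not rename partial registers. It is not the code from this patch: `RegInfo`, `lookup` and `NoRenameTracker` are hypothetical names, the register table only covers the RAX and RBX families, and it assumes 64-bit mode, where a 32-bit write zeroes the upper half and therefore counts as a full write.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical description of a register: the 64-bit register that contains
// it, and whether writing it leaves other bits of that register unchanged.
struct RegInfo {
  std::string Root; // containing 64-bit register
  bool Partial;     // true for the 8-bit and 16-bit parts
};

// Illustrative table only; a real model would derive this from the target.
inline RegInfo lookup(const std::string &Reg) {
  static const std::map<std::string, RegInfo> Table = {
      {"AL", {"RAX", true}},   {"AH", {"RAX", true}},
      {"AX", {"RAX", true}},   {"EAX", {"RAX", false}},
      {"RAX", {"RAX", false}}, {"BL", {"RBX", true}},
      {"BH", {"RBX", true}},   {"BX", {"RBX", true}},
      {"EBX", {"RBX", false}}, {"RBX", {"RBX", false}}};
  auto It = Table.find(Reg);
  assert(It != Table.end() && "register not in the illustrative table");
  return It->second;
}

// Tracks the most recent writer of each full register. All parts of a
// register share one entry, because they are never renamed separately.
class NoRenameTracker {
  std::map<std::string, int> LastWriter; // root register -> instruction index

public:
  // Records that instruction `Index` writes `Reg`. Returns the index of the
  // earlier write that this write must wait for, or -1 if there is none.
  int write(const std::string &Reg, int Index) {
    const RegInfo Info = lookup(Reg);
    int Dep = -1;
    auto It = LastWriter.find(Info.Root);
    // Only a partial write has to merge with the previous value, so only a
    // partial write picks up the (false) dependency.
    if (Info.Partial && It != LastWriter.end())
      Dep = It->second;
    LastWriter[Info.Root] = Index;
    return Dep;
  }
};
```

With this model, a write to AH that follows a write to AL reports a dependency on the AL write, even though the two instructions touch disjoint bits. A write to EAX reports none.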

This patch is a first important step towards improving the analysis of partial register updates.

This patch changes the semantics of RegisterFile descriptors in tablegen, and teaches llvm-mca how to identify false dependences in the presence of partial register writes (for more details, see the new code comments on class RegisterFile in include/llvm/Target/TargetSchedule.td).

This patch doesn't address the case where a write to a part of a register is followed by a read from the whole register.
On Intel chips, high8 registers (AH/BH/CH/DH) can be stored in separate physical registers. However, a later (dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which adds extra latency (and potentially affects the pipe usage).
There is a very interesting article about partial register writes on Intel chips here: https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to
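As a rough illustration of the Intel case that this patch leaves out, the following C++ sketch tracks whether a high8 register currently lives in its own physical register. `High8State` and its methods are hypothetical names, not llvm-mca code. The sketch also assumes that a full-width write discards the separate copy and that the merge leaves the register clean; it does not attempt to model the latency of the merge uOp.

```cpp
// Hypothetical per-register state for a core that renames the high8 part
// (for example AH) separately from the rest of the register.
class High8State {
  bool Dirty = false; // true while the high8 part has its own physical register

public:
  // A write to AH/BH/CH/DH allocates a separate physical register.
  void writeHigh8() { Dirty = true; }

  // Assumption of this sketch: a full-width write makes the separate copy stale.
  void writeFull() { Dirty = false; }

  // Returns true if reading the wider register (for example AX/EAX) needs a
  // merge uOp first. The merge folds the high8 value back, so the state is
  // clean afterwards.
  bool readWide() {
    const bool NeedsMerge = Dirty;
    Dirty = false;
    return NeedsMerge;
  }
};
```

A timing model could use a flag like this to charge extra latency to the read that triggers the merge, which is the kind of information the RegisterFile definition would need to carry.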

In future, the definition of RegisterFile can be extended with extra information that may be used to identify cases where a register read is slowed down by a merge of a partial write.

Please let me know if okay to commit.

-Andrea

https://reviews.llvm.org/D49196

Files:
  include/llvm/Target/TargetSchedule.td
  lib/Target/X86/X86ScheduleBtVer2.td
  test/tools/llvm-mca/X86/BtVer2/partial-reg-update-2.s
  test/tools/llvm-mca/X86/BtVer2/partial-reg-update-3.s
  test/tools/llvm-mca/X86/BtVer2/partial-reg-update-4.s
  test/tools/llvm-mca/X86/BtVer2/partial-reg-update-5.s
  test/tools/llvm-mca/X86/BtVer2/partial-reg-update-6.s
  tools/llvm-mca/Instruction.cpp
  tools/llvm-mca/Instruction.h
  tools/llvm-mca/RegisterFile.cpp
  tools/llvm-mca/RegisterFile.h

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D49196.155014.patch
Type: text/x-patch
Size: 34966 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180711/519c310c/attachment.bin>