[PATCH] D63628: AMD K10 (Barcelona) Initial Scheduler model

Mon Jun 24 05:33:28 PDT 2019

andreadb added a comment.

Out of curiosity, did you investigate why three benchmarks show a 6% slowdown?

According to Agner, the selection of floating point pipes done by the FPU is sub-optimal and can lead to bottlenecks that are difficult to predict.
There is another important source of bottlenecks to take into account: it looks like there is only one result bus per execution unit.
That forces the schedulers to artificially delay opcodes to avoid bus conflicts. This is especially true for code that mixes instructions with different latencies.
Agner describes this in the section "Mixing instructions with different latency".
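
To make the result-bus hazard concrete, here is a toy sketch. It is not taken from the patch or from Agner's manual, and it assumes a single fully pipelined unit that issues in order, at most one opcode per cycle, with one result bus. The function name `schedule` and the latency values are hypothetical, chosen only for illustration.

```python
def schedule(latencies):
    """Return the issue cycle of each opcode on one pipelined unit.

    Toy assumption: an opcode issued at cycle c with latency L needs the
    unit's single result bus at cycle c + L, and issue is strictly in order.
    """
    bus_reserved = set()  # cycles in which the result bus is already taken
    issue_cycles = []
    cycle = 0             # earliest cycle at which the next opcode may issue
    for latency in latencies:
        # Delay the issue until the write-back cycle is free on the bus.
        while cycle + latency in bus_reserved:
            cycle += 1
        bus_reserved.add(cycle + latency)
        issue_cycles.append(cycle)
        cycle += 1        # at most one issue per cycle
    return issue_cycles


uniform = schedule([4, 4, 4, 4])  # all opcodes share one latency
mixed = schedule([4, 3, 4, 3])    # latencies alternate between 4 and 3
print(uniform, mixed)
```

In this model the uniform sequence issues back to back, while the mixed one picks up a bubble each time a shorter-latency opcode would write back in the same cycle as its predecessor. Real hardware is more complicated, so this only illustrates the kind of stall in question.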

We don't model those hazards in the scheduling model.
I wonder whether the instruction sequence computed by the post-RA machine scheduler for those benchmarks ran into one of those bottlenecks.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63628/new/

https://reviews.llvm.org/D63628