[PATCH] D63628: AMD K10 (Barcelona) Initial Scheduler model

Mon Jun 24 15:49:24 PDT 2019

lebedev.ri marked an inline comment as done.
lebedev.ri added a comment.

Thanks for taking a look!

In D63628#1555404 <https://reviews.llvm.org/D63628#1555404>, @andreadb wrote:

> Out of curiosity, did you investigate why three benchmarks show a 6% slowdown?

Actually, no, I haven't looked into that yet.
It doesn't //appear// to be noise, but those 4 tests essentially cover
the same codepath, so the reason will be the same for all of them.

> According to Agner, the selection of floating point pipes done by the FPU is sub-optimal and can lead to bottlenecks that are difficult to predict.
>  ...
> 
> We don't model those hazards in the scheduling model.
>  I wonder if the sequence of instructions computed by the post-RA machine scheduler for those benchmarks ran into one of those bottlenecks.

It's certainly not an FPU cluster problem, since all those tests are integer (like I said, I didn't do a thorough benchmark...)
What's the suggestion there, rebenchmarking with `let PostRAScheduler = 0;`?

In D63628#1555515 <https://reviews.llvm.org/D63628#1555515>, @lebedev.ri wrote:

> In D63628#1555473 <https://reviews.llvm.org/D63628#1555473>, @RKSimon wrote:
>
> > I think I've commented on this before
>
>
> Oh? I don't remember any such feedback. Good to know, I guess.
>
> > - but why did you pre-commit the test/tools/llvm-mca/X86/Barcelona/* files since adding this model completely overwrites them?

Let me rephrase: had I been aware of any such feedback, I certainly would not have precommitted them,
but would have put up with those struggles instead. Just for the sake of completeness, where did I miss it?

================
Comment at: lib/Target/X86/X86ScheduleBarcelona.td:286
+
+defm : BnWriteRes<WriteZero, [/*No ExePorts*/], 0, [], 0>; // FIXME
+
----------------
andreadb wrote:
> FIXME?
This one is just a leftover.

================
Comment at: lib/Target/X86/X86ScheduleBarcelona.td:305
+// to '1' to tell the scheduler that the nop uses an ALU slot for a cycle.
+defm : BnWriteResInt<WriteNop, BnInt012, [BnALU012], 0, [1], 1>; // FIXME
+
----------------
andreadb wrote:
> Maybe clarify what there is to FIX.
> Did you verify that NOPs consume an ALU slot?
They certainly do, per Agner's instruction_tables.pdf.
A NOP certainly shows up as `1` in `retired_uops`.
There aren't any `IssueCounters` for the integer pipelines,
so there is no direct way to check integer unit consumption, I'm afraid :(
That leaves us with manually measuring the throughput
of some high-throughput instruction that can go to any int pipe,
in the presence of NOPs, I suppose (I actually suggested that as a way for
exegesis to indirectly measure unit consumption). I have not done that yet.
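
For illustration, a rough sketch of the arithmetic behind such an indirect measurement. This is an idealized model, not something taken from the patch: the function name, the pipe count of 3 (inferred only from the `BnALU012` name) and the instruction mix are assumptions, and it ignores decode/retire width and data dependencies, which would also cap throughput on real hardware.

```python
from fractions import Fraction


def cycles_per_iteration(n_alu_ops, n_nops, n_pipes=3, nop_uses_alu=True):
    """Idealized cycles per loop iteration, limited only by integer pipes.

    Hypothetical helper: every ALU op takes one pipe slot for one cycle,
    and a NOP takes a slot only if nop_uses_alu is True.
    """
    slots = n_alu_ops + (n_nops if nop_uses_alu else 0)
    return Fraction(slots, n_pipes)


# Assumed mix: 3 independent ALU ops plus 3 NOPs per iteration, 3 pipes.
with_slot = cycles_per_iteration(3, 3, nop_uses_alu=True)
without_slot = cycles_per_iteration(3, 3, nop_uses_alu=False)

print("NOP consumes an ALU slot:", with_slot, "cycles/iteration")
print("NOP consumes no ALU slot:", without_slot, "cycles/iteration")
```

If the measured cycles per iteration track the first figure rather than the second as NOPs are added, that would hint that NOPs occupy an integer pipe; with a front end no wider than the number of pipes, the two cases may be hard to tell apart.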

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63628/new/

https://reviews.llvm.org/D63628