[PATCH] D63628: AMD K10 (Barcelona) Initial Scheduler model
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 24 15:49:24 PDT 2019
lebedev.ri marked an inline comment as done.
lebedev.ri added a comment.
Thanks for taking a look!
In D63628#1555404 <https://reviews.llvm.org/D63628#1555404>, @andreadb wrote:
> Out of curiosity, did you investigate on why three benchmarks show a 6% slowdown?
Actually no, I haven't looked into that yet.
It doesn't //appear// to be noise, but those 4 tests are essentially covering
the same codepath, so the reason will be the same for all of them.
> According to Agner, the selection of floating point pipes done by the FPU is sub-optimal and can lead to bottlenecks that are difficult to predict.
> ...
>
> We don't model those hazards in the scheduling model.
> I wonder if the sequence of instructions computed by the post-RA machine scheduler for those benchmarks incurred in one of those bottlenecks.
It's certainly not an FPU cluster problem, since all those tests are integer-only (like I said, I didn't go through the benchmarks...)
What's the suggestion there, rebenchmarking with `let PostRAScheduler = 0;`?
In D63628#1555515 <https://reviews.llvm.org/D63628#1555515>, @lebedev.ri wrote:
> In D63628#1555473 <https://reviews.llvm.org/D63628#1555473>, @RKSimon wrote:
>
> > I think I've commented on this before
>
>
> Oh? I don't remember any such feedback.. Good to know i guess.
>
> > - but why did you pre-commit the test/tools/llvm-mca/X86/Barcelona/* files since adding this model completely overwrites them?
Let me rephrase: had I been aware of any such feedback, I certainly would not have precommitted them,
but would have kept putting up with those struggles. Just for the sake of completeness, where did I miss it?
================
Comment at: lib/Target/X86/X86ScheduleBarcelona.td:286
+
+defm : BnWriteRes<WriteZero, [/*No ExePorts*/], 0, [], 0>; // FIXME
+
----------------
andreadb wrote:
> FIXME?
This one is just a leftover.
================
Comment at: lib/Target/X86/X86ScheduleBarcelona.td:305
+// to '1' to tell the scheduler that the nop uses an ALU slot for a cycle.
+defm : BnWriteResInt<WriteNop, BnInt012, [BnALU012], 0, [1], 1>; // FIXME
+
----------------
andreadb wrote:
> Maybe clarify what there is to FIX.
> Did you verify that NOPs consume an ALU slot?
It certainly does, as per Agner's instruction_tables.pdf.
There is certainly a hit of `1` on `retired_uops`.
There aren't any `IssueCounters` for the integer pipelines,
so there is no direct way to check integer unit consumption, I'm afraid :(
That leaves us with manually measuring the throughput
of some high-throughput instruction that can go to any int pipe,
in the presence of NOPs, I suppose (I actually suggested that as a way for
llvm-exegesis to indirectly measure unit consumption). I have not done that yet.
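To make the idea concrete, here is a rough sketch of what such an indirect measurement would be compared against. It is only a back-of-the-envelope model, not something from the patch: the function name and parameters are made up for illustration, the three pipes are assumed from `BnALU012`, and decode/retire width is ignored entirely, which on real hardware may well hide the difference.

```python
def cycles_per_iteration(n_alu_ops, n_nops, n_pipes=3, nop_uses_alu=True):
    """Lower bound on cycles per loop iteration from integer ALU pressure alone.

    Hypothetical model: every ALU op occupies one pipe for one cycle, and a
    NOP does the same only if nop_uses_alu is True. Front-end and retire
    limits are deliberately not modelled.
    """
    if n_pipes <= 0:
        raise ValueError("n_pipes must be positive")
    alu_uops = n_alu_ops + (n_nops if nop_uses_alu else 0)
    return alu_uops / n_pipes


if __name__ == "__main__":
    # Mix a fixed number of ALU ops with a growing number of NOPs and print
    # what each hypothesis predicts; a measured curve can be compared to both.
    for nops in (0, 3, 6):
        with_slot = cycles_per_iteration(6, nops, nop_uses_alu=True)
        without_slot = cycles_per_iteration(6, nops, nop_uses_alu=False)
        print(f"nops={nops}: uses ALU -> {with_slot:.2f}, no ALU -> {without_slot:.2f}")
```

If measured cycles grow with the NOP count roughly like the first column, that would be consistent with the NOP taking an ALU slot; if they stay flat, it would suggest the opposite, subject to the caveats above.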
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D63628/new/
https://reviews.llvm.org/D63628