[PATCH] D52779: AMD BdVer2 (Piledriver) Initial Scheduler model

Tue Oct 2 23:57:06 PDT 2018

courbet added a comment.

In https://reviews.llvm.org/D52779#1252526, @lebedev.ri wrote:

> In https://reviews.llvm.org/D52779#1252491, @courbet wrote:
>
> > This is awesome !
>
>
> Thank you!
>
> In https://reviews.llvm.org/D52779#1252491, @courbet wrote:
>
> > In https://reviews.llvm.org/D52779#1252458, @lebedev.ri wrote:
> >
> > > - *Many* of the inconsistencies are noise
> > > - fp measurements are flaky
> > > - Non-fp measurements are somewhat flaky too
> >
> >
> > Part of the flakiness can be explained by zero idioms: since llvm-exegesis explores register allocation randomly, it will hit some zero idioms by chance (and you've hit it e.g. for SUB32rr). Analysis still does not handle variant classes (PR38884), and we need better highlighting of instances where mcinst predicates were true. I'm currently working on this.
> >
> > I'll have a look at the inconsistencies to see if I can see anything else that might be an analysis issue.
>
>
> Part of, sure.
>  I run some measurement 3x 10'000 repetitions and get 3 cycles latency all three times; then I repeat that and get 3 cycles one time and 4 cycles two times.

You'll be getting different register allocations each time. This will impact two things: some instructions might have special paths for some combinations of registers (we all know about xor eax, eax, but there are more surprising ones; see the original llvm-exegesis RFC <https://docs.google.com/document/d/1QidaJMJUyQdRrFKD66vE1_N55whe0coQ3h1GpFzz27M/edit#bookmark=kix.q6a0imw9qn1n> for an example).
gchatelet@ is working on autodetecting these by exploring the allocation space (both registers to operands and values to registers and immediates).
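To illustrate why random register allocation hits zero idioms "by chance", here is a minimal Python sketch. It assumes (hypothetically) that the allocator picks the two source operands of SUB32rr uniformly from the eight legacy 32-bit GPRs, with the destination tied to the first source, and that SUB/XOR with identical sources is treated as a zero idiom. The register list and the `is_zero_idiom` rule are illustrative assumptions, not llvm-exegesis code.

```python
from itertools import product

# Hypothetical allocation space: the 8 legacy 32-bit GPRs.
# The set llvm-exegesis actually draws from may differ.
GR32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

# Assumption: these opcodes with identical source registers are zero idioms.
ZERO_IDIOM_OPCODES = {"SUB32rr", "XOR32rr"}


def is_zero_idiom(opcode, src1, src2):
    """Return True if this operand assignment forms a zero idiom (assumed rule)."""
    return opcode in ZERO_IDIOM_OPCODES and src1 == src2


def zero_idiom_fraction(opcode, regs):
    """Fraction of all (src1, src2) assignments that land on a zero idiom."""
    pairs = list(product(regs, repeat=2))
    hits = sum(is_zero_idiom(opcode, a, b) for a, b in pairs)
    return hits / len(pairs)
```

Under these assumptions `zero_idiom_fraction("SUB32rr", GR32)` is 8/64 = 0.125, so roughly one random allocation in eight measures the dependency-breaking path rather than the real latency, while an opcode without such a special case (e.g. `"ADD32rr"` here) gives 0.0.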

> Doing some measurement 10x 10'000 times vs 1x 1'000'000 times sometimes produces different results (sometimes, very different), too.

Do you have the mnemonics for these? I would expect this to happen for instructions whose latency depends on the values in their registers: repeated execution keeps changing the value, and therefore the latency. This happens for e.g. SQRT or FMUL.
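As a toy model of how repeated execution changes the operand values (not a measurement), the Python sketch below feeds each product back into itself, as a self-dependent FMUL chain would. Starting from 0.5 the value passes through a subnormal and then becomes zero; starting from 1.5 it overflows to inf. Whether a given core takes a slower path for such operands is an assumption here; the helper names are hypothetical.

```python
import math
import sys


def classify(x):
    """Classify a double into the categories that may affect FP latency."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "inf"
    if x == 0.0:
        return "zero"
    if abs(x) < sys.float_info.min:  # below the smallest normal double
        return "subnormal"
    return "normal"


def trace_self_multiply(x0, n):
    """Apply x = x * x n times, recording the class of x after each step."""
    x = x0
    classes = []
    for _ in range(n):
        x = x * x
        classes.append(classify(x))
    return classes
```

For example, `trace_self_multiply(0.5, 12)` yields nine `"normal"` entries, then `"subnormal"` (2^-1024), then `"zero"` twice. Which operand class dominates a long run therefore depends on the starting value and on how many repetitions are executed.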

> For fp, as discussed previously elsewhere, it is caused by nan/inf/subnormals/etc.
>  But I get the same flakiness for non-floats, too, so they may be somewhat affected by the same problem.

Repository:
  rL LLVM

https://reviews.llvm.org/D52779