[PATCH] D74156: [llvm-exegesis] Exploring X86::OperandType::OPERAND_COND_CODE

Guillaume Chatelet via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 26 04:19:22 PST 2020


gchatelet added a comment.

In D74156#1873657 <https://reviews.llvm.org/D74156#1873657>, @lebedev.ri wrote:

> In D74156#1869502 <https://reviews.llvm.org/D74156#1869502>, @gchatelet wrote:
>
> > In D74156#1864292 <https://reviews.llvm.org/D74156#1864292>, @lebedev.ri wrote:
> >
> > > In D74156#1864096 <https://reviews.llvm.org/D74156#1864096>, @gchatelet wrote:
> > >
> > > > - repetition mode (to see the impact of the decoder)
> > >
> > >
> > > This //appears// to be currently handled via the `-repetition-mode` switch (D68125 <https://reviews.llvm.org/D68125>).
> >
> >
> > yes
> >
> > > Does this really have to be accounted for (as a dimension) in the greedy approach?
> >
> > @courbet and I believe that having two measures is better than one and can demonstrate the impact of decoding the instruction.
>
>
> Oh, now i see what you mean. At least for `CMOV`, i'm seeing wildly different results
>
> |           | Latency | RThroughput |
> | duplicate | 1       | 0.8         |
> | loop      | 2       | 0.6         |
>
> where latency=1 seems correct, and I'd expect the throughput to be close to 1/2 (since there are two execution units).
>
> So I would personally guess that `--repetition-mode=` shouldn't even be another measurement,
>  but instead, much like the whole snippet is already run a few times
>  and the result denoised (not averaged!), the same should be done here - run both modes, take the minimum
>  https://github.com/llvm/llvm-project/blob/c55cf4afa9161bb4413b7ca9933d553327f5f069/llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.cpp#L38-L39
>
> What are your thoughts on this?


Well, that depends on how you see `llvm-exegesis`; it was originally designed to understand the behavior of the CPU.
To me, `--repetition-mode` is interesting for exploring sensitivity to instruction decoding.
If you mix everything and take the minimum, you lose information about the root cause: you would deduce that the minimum latency is X, when that only holds if the instruction is already decoded and runs in a tight loop.
I think it's important to keep this information separate. You can always take the minimum between the two during analysis though.
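For illustration only, here is a minimal C++ sketch of that idea (hypothetical types, not the actual llvm-exegesis data structures): keep one measurement per repetition mode, e.g. one run with `-repetition-mode=duplicate` and one with `-repetition-mode=loop`, and compute the minimum only as a derived view at analysis time.

  #include <algorithm>
  #include <limits>
  #include <vector>

  // Hypothetical per-mode measurement record; llvm-exegesis stores its
  // results differently, this only illustrates the analysis-time minimum.
  enum class RepetitionMode { Duplicate, Loop };

  struct Measurement {
    RepetitionMode Mode;
    double Latency;
    double RThroughput;
  };

  // Collapse the per-mode latencies into a single "best case" number.
  // The raw per-mode data stays available for root-cause analysis.
  double minLatency(const std::vector<Measurement> &PerMode) {
    double Min = std::numeric_limits<double>::infinity();
    for (const Measurement &M : PerMode)
      Min = std::min(Min, M.Latency);
    return Min;
  }

The point is that nothing is lost: the per-mode numbers still show the decoding sensitivity, and the minimum is just one possible projection computed afterwards.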


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74156/new/

https://reviews.llvm.org/D74156




