[PATCH] D74156: [llvm-exegesis] Exploring X86::OperandType::OPERAND_COND_CODE

Wed Feb 26 05:22:10 PST 2020

gchatelet added a comment.

In D74156#1893192 <https://reviews.llvm.org/D74156#1893192>, @lebedev.ri wrote:

> In D74156#1893182 <https://reviews.llvm.org/D74156#1893182>, @gchatelet wrote:
>
> > In D74156#1893156 <https://reviews.llvm.org/D74156#1893156>, @lebedev.ri wrote:
> >
> > > Thank you for replying!
> >
> >
> > Sure, my apologies for the lag. I'm extra busy right now.
> >
> > > 
> > > 
> > > In D74156#1893104 <https://reviews.llvm.org/D74156#1893104>, @gchatelet wrote:
> > > 
> > >> In D74156#1873657 <https://reviews.llvm.org/D74156#1873657>, @lebedev.ri wrote:
> > >>
> > >> > In D74156#1869502 <https://reviews.llvm.org/D74156#1869502>, @gchatelet wrote:
> > >> >
> > >> > > In D74156#1864292 <https://reviews.llvm.org/D74156#1864292>, @lebedev.ri wrote:
> > >> > >
> > >> > > > In D74156#1864096 <https://reviews.llvm.org/D74156#1864096>, @gchatelet wrote:
> > >> > > >
> > >> > > > > - repetition mode (to see the impact of the decoder)
> > >> > > >
> > >> > > >
> > >> > > > This //appears// to be currently handled via `-repetition-mode` switch (D68125 <https://reviews.llvm.org/D68125>).
> > >> > >
> > >> > >
> > >> > > yes
> > >> > >
> > >> > > > Does this really have to be accounted for (a dimension in) in the greedy approach?
> > >> > >
> > >> > > @courbet and I believe that having two measures are better than one and can demonstrate the impact of decoding the instruction.
> > >> >
> > >> >
> > >> > Oh, now i see what you mean. At least for `CMOV`, i'm seeing wildly different results
> > >> >
> > >> > |           | Latency | RThroughput |
> > >> > | duplicate | 1       | 0.8         |
> > >> > | loop      | 2       | 0.6         |
> > >> > |
> > >> >
> > >> > where latency=1 seems correct, and i'd expect the througput to be close to 1/2 (since there are two execution units).
> > >> >
> > >> > So i would personally guess that `--repetition-mode=` shouldn't even be an another measurement,
> > >> >  but instead much like what is already done by running the whole snippet a few times
> > >> >  and denoising the result (not averaging!), the same should be done with this - run both, take minimum
> > >> >  https://github.com/llvm/llvm-project/blob/c55cf4afa9161bb4413b7ca9933d553327f5f069/llvm/tools/llvm-exegesis/lib/LatencyBenchmarkRunner.cpp#L38-L39
> > >> >
> > >> > What are you thoughts on this?
> > >>
> > >>
> > >> Well that depends on how you see `llvm-exegesis`. It was originally designed to understand the behavior of the CPU.
> > >>  To me `--repetition-mode` is interesting to explore sensitivity to instruction decoding.
> > >>  If you mix everything and take the minimum then you lose information on the root cause and then you deduce that the minimum latency is X but it only happens if the instruction is already decoded and run in a tight loop.
> > >>  I think it's important to keep this information separate.
> > > 
> > > 
> > > I do agree in principle.
> > > 
> > >> You can always take the minimum between the two during analysis though.
> > > 
> > > I think this has the same-ish basic issue as merging results from different `--mode=`s - what if there is more than one result?
> > >  Currently, we form `PerInstructionStats`, which accumulates min/max/averages, which is nice for when results end up being noisy.
> > >  How would we aggregate (via taking `min`) results from different `--repetition-mode=`s, without sacrificing that?
> > >  Pick all `--repetition-mode=duplicate` and all `--repetition-mode=loop` and zipper-merge each pair?
> >
> > Right now you can simply run the benchmark twice (with `loop` or `duplicate` repetition mode), concatenate the two outputs and run them through the analysis.
> >  We currently don't consider the flag as part of the instruction so it would be reflected into the `PerInstructionStats`, increasing the confidence interval if there's a discrepancy between the two.
>
>
> Right, indeed, which is why i'm asking this question in the first place :)
>  Because at least for me that is the opposite from the behavior i'd expect/want to have in
>  analysis mode - in that case one of repetition modes results in obviously-wrong(C) results,
>  so i'm asking if it would be reasonable to fixup that during analysis via such a zipper merge.

I fail to see what is the expected behavior from your point of view. What should the tool do according to you?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74156/new/

https://reviews.llvm.org/D74156