[PATCH] D60000: [llvm-exegesis] Post-processing for chained instrs in latency mode (PR41275)

Wed Apr 3 15:04:35 PDT 2019

lebedev.ri marked an inline comment as done.
lebedev.ri added a comment.

In D60000#1452962 <https://reviews.llvm.org/D60000#1452962>, @gchatelet wrote:

> >> To me, a better approach would be to read all the experiments, create the dependency graph between the 2-instruction snippets and solve a system of equations to recover the per-instruction latency, then use the analyzer on the result.
> > 
> > Can you explain that in a bit more detail? Something like
> > 
> >   lat(i_0) = m_0
> >   sum(lat(i_t)+lat(i_0)) = m_1
> >   lat(i_1) = m_2
> >   sum(lat(i_t)+lat(i_1)) = m_3
> >   ...
> >   lat(i_n) => ?
> >   sum(lat(i_t)+lat(i_n)) = m_n
> >   lat(i_t) => ?
> > 
> > 
> > Do you suggest taking the known `lat(i_0)..lat(i_n)` from measurements too?
>
> Yes.
> Measurements ought to be coherent for runs on the same CPU, so with enough data the resulting linear system will be overconstrained and solvable using ordinary least squares (https://en.wikipedia.org/wiki/Ordinary_least_squares).

Okay, sounds sane.
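
For concreteness, a minimal sketch of how I read that suggestion. It is Python purely for illustration and is not llvm-exegesis code. It assumes the measured latency of a chained snippet is simply the sum of the per-instruction latencies; the instruction names and the measured values are made up.

```python
def solve_least_squares(rows, measured):
    """Minimise ||A x - b||^2 by solving the normal equations (A^T A) x = A^T b."""
    n = len(rows[0])
    # Augmented n x (n + 1) matrix holding [A^T A | A^T b].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * m for r, m in zip(rows, measured))]
           for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("rank deficient system; more experiments are needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def recover_latencies(experiments):
    """experiments: list of (snippet, measured_latency); snippet is a tuple of names."""
    names = sorted({name for snippet, _ in experiments for name in snippet})
    index = {name: k for k, name in enumerate(names)}
    rows = []
    for snippet, _ in experiments:
        # One equation per experiment: sum of lat(instr) over the snippet = measurement.
        row = [0.0] * len(names)
        for name in snippet:
            row[index[name]] += 1.0
        rows.append(row)
    solution = solve_least_squares(rows, [m for _, m in experiments])
    return dict(zip(names, solution))


# Made-up, slightly noisy measurements: four equations, three unknowns.
experiments = [
    (("i_0",), 1.02),
    (("i_t", "i_0"), 2.97),
    (("i_1",), 3.01),
    (("i_t", "i_1"), 5.03),
]
latencies = recover_latencies(experiments)
```

Here `lat(i_t)` is never measured on its own; it falls out of the two chained experiments, and the extra equation is what lets the fit average out the noise. The normal equations are fine for a toy like this, but a real implementation would probably want a QR- or SVD-based solver.
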
Intermediate issue to solve: generating all these 2-instr chained configs must then also
generate configs to measure the params of that second instr (without going into an endless loop).
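
One way the endless loop could be avoided is a worklist with a visited set. The sketch below is an assumption about how that might look, not existing llvm-exegesis behaviour. `plan_configs` and the `pick_partner` callback are hypothetical names; the callback returns the instruction to chain with, or `None` when the instruction can be chained with itself.

```python
from collections import deque


def plan_configs(targets, pick_partner):
    """Return the snippets to measure so that every chaining partner is measured too."""
    planned = []
    seen = set()
    queue = deque(targets)
    while queue:
        instr = queue.popleft()
        if instr in seen:
            continue  # already scheduled; this is what prevents the endless loop
        seen.add(instr)
        partner = pick_partner(instr)
        if partner is None:
            planned.append((instr,))  # self-chained, measured on its own
        else:
            planned.append((instr, partner))
            queue.append(partner)  # the partner needs its own measurement
    return planned


# Hypothetical example: i_t needs i_0 to form a chain, i_0 chains with itself.
partners = {"i_t": "i_0"}
configs = plan_configs(["i_t"], partners.get)

# Two instructions that only chain with each other: the loop still terminates.
cyclic = {"a": "b", "b": "a"}
cyclic_configs = plan_configs(["a"], cyclic.get)
```

The visited set only guarantees termination. In the cyclic case both planned snippets describe the same equation, so the resulting system would still be underdetermined and another partner would have to be found.
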

>> How will that scheme account for domain transfer delays?
> 
> You could associate a supplementary variable with each pair of instructions, but this would need a lot of data to converge (way too many variables).
> A simpler approach is to make sure that we don't generate domain transfer delays when generating the snippet, rejecting pairs of instructions for which it would occur.
> Or annotate the results with information about domain transfer delays and deal with it in the post-processing (adding variables for pairs in {int,vector,fp,store}² when we know such a transfer exists); this way we can recover the domain transfer delays as well.

Hmm, I don't mean to mock, but it sounds kinda hand-wavy/arbitrary.
We don't want to use the latencies of instructions specified in the scheduler profile, but at the same
time we are ok with expecting that the sched profile explains all the domain transfer delays.
They should probably also be variables, not hardcoded. But I admit I have not thought that part through.
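
To illustrate what "also be variables" could mean, here is a rough sketch that only builds one equation row, with an extra column per ordered domain pair. Everything in it is an assumption: the domain list is taken from the quoted {int,vector,fp,store} set, the instruction names and their domains are made up, and the snippet is assumed to be loop-carried, so the last instruction feeds the first one.

```python
from itertools import product

DOMAINS = ("int", "vector", "fp", "store")


def make_columns(instructions):
    """Map each unknown (instruction latency or domain transfer delay) to a column."""
    cols = list(instructions)
    cols += [f"xfer({a}->{b})" for a, b in product(DOMAINS, repeat=2) if a != b]
    return {name: k for k, name in enumerate(cols)}


def build_row(snippet, domain_of, columns):
    """Coefficients of one equation: instruction latencies plus crossed-domain delays."""
    row = [0] * len(columns)
    for instr in snippet:
        row[columns[instr]] += 1
    # Assumed loop-carried chain: each instruction feeds the next, the last feeds the first.
    for producer, consumer in zip(snippet, snippet[1:] + snippet[:1]):
        a, b = domain_of[producer], domain_of[consumer]
        if a != b:
            row[columns[f"xfer({a}->{b})"]] += 1
    return row


# Hypothetical instructions and domains, for illustration only.
domain_of = {"i_vec": "vector", "i_int": "int"}
columns = make_columns(["i_vec", "i_int"])
row = build_row(("i_vec", "i_int"), domain_of, columns)
single = build_row(("i_int",), domain_of, columns)
```

Note that in a 2-instr loop both transfer directions always show up together, so this layout can only recover their sum unless longer chains are measured as well. That is the "way too many variables" concern in a nutshell.
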

In the same vein, it would be great if it could try to magically deduce the actual Units (*not* just the pressure distribution).

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60000/new/

https://reviews.llvm.org/D60000