[PATCH] D59539: [llvm-exegesis] Option to lobotomize dbscan (PR40880)

Tue Mar 26 02:52:44 PDT 2019

lebedev.ri added a comment.

In D59539#1442729 <https://reviews.llvm.org/D59539#1442729>, @courbet wrote:

> > To reword: because if i do simple clustering by opcode, i will then need to add yet another
> >  "stabilization" step - for each cluster, check that every measurement is neighbor of all
> >  the other points in that cluster, and if they are not, mark cluster as noise.
> >  (well, not every vs. every, just the lower/upper triangle excluding diagonal)
>
> OK I see, thanks. To sum up my understanding: There are some areas where two clusters that should be separate are so noisy that there is a dense region connecting the two clusters, so even taking a small epsilon will not separate them. You want to reject these merged clusters based on the variance of the points within the cluster.

There are two situations, as far as i can tell:
(Also, i'm only looking at the single-dimension case, i.e. latency, uops, or rthroughput on its own, not a combination of measurements.)

1. Suppose we have measurements 0.5, 1.0, 1.5. If they are all from the same opcode, they will currently be put into the same cluster. This is unwanted (at least for me).
2. If you have measurements 0.5 (opcode a) and 3.5 (opcode a), they will be put into different clusters, which, while correct, is also not quite wanted, because they are from the same opcode. That should be an "unstable" cluster. (It is unspecified why that happened: it could be noisy measurements, CPU pipeline quirks, a register fastpath, a dependence on the register values, etc.)

The second issue, taken on its own, i have already resolved with D58355 <https://reviews.llvm.org/D58355>, but the first issue remains.
So i'm trying to solve the first issue, without regressing the second issue.
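
To make the two situations concrete, here is a minimal single-dimension sketch. It is not the llvm-exegesis dbscan code: `chainClusters` is a hypothetical helper that merges sorted values whenever neighbouring values are within `Epsilon`, which is roughly what density-based clustering does in one dimension. The epsilon of 0.5 is picked only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, for illustration only: a value joins the current
// cluster when it is within Epsilon of the previous (sorted) value.
std::vector<std::vector<double>> chainClusters(std::vector<double> Values,
                                               double Epsilon) {
  assert(Epsilon >= 0.0 && "Epsilon must be non-negative");
  std::sort(Values.begin(), Values.end());
  std::vector<std::vector<double>> Clusters;
  for (double V : Values) {
    // Start a new cluster when the gap to the previous value is too large.
    if (Clusters.empty() || V - Clusters.back().back() > Epsilon)
      Clusters.emplace_back();
    Clusters.back().push_back(V);
  }
  return Clusters;
}
```

With `Epsilon = 0.5`, the values 0.5, 1.0, 1.5 end up in one cluster even though 0.5 and 1.5 are not neighbours of each other (situation 1), while 0.5 and 3.5 end up in two clusters regardless of opcode (situation 2).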

> One suggestion I have is to compute the variance within the cluster (this can be done incrementally when adding points to the cluster) and reject clusters where the variance is more than a certain threshold. What do you think ?
> 
>> I can do that instead, maybe that would even be better than this (no dependency on measurement ordering).
> 
> Yes, I would really like to avoid the dependence on the ordering.
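
For reference, the incremental variance suggested above could be maintained with Welford's online update. This is only a sketch under assumptions: a single measurement dimension, population (not sample) variance, and hypothetical names (`RunningVariance`, `isClusterTooSpread`, `MaxVariance`) that do not exist in llvm-exegesis.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical accumulator using Welford's online algorithm.
struct RunningVariance {
  std::size_t Count = 0;
  double Mean = 0.0;
  double M2 = 0.0; // Sum of squared deviations from the current mean.

  // Fold one more measurement into the running statistics.
  void add(double X) {
    ++Count;
    const double Delta = X - Mean;
    Mean += Delta / static_cast<double>(Count);
    M2 += Delta * (X - Mean);
  }

  // Population variance of everything added so far (0 for an empty set).
  double variance() const {
    return Count == 0 ? 0.0 : M2 / static_cast<double>(Count);
  }
};

// Hypothetical rejection rule: the cluster is rejected when its variance
// exceeds a caller-chosen threshold.
bool isClusterTooSpread(const RunningVariance &RV, double MaxVariance) {
  assert(MaxVariance >= 0.0 && "threshold must be non-negative");
  return RV.variance() > MaxVariance;
}
```

The result does not depend on the order in which points are added (up to floating-point rounding), which is the property asked for above.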

Okay, i will try the "cluster by opcode + stabilize" approach, thanks!
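
Roughly, what i have in mind looks like the sketch below. It is not the eventual patch: `Measurement`, `isStable` and `clusterByOpcodeAndStabilize` are made-up names, only one measurement dimension is considered, and "neighbor" is assumed to mean "absolute difference is at most `Epsilon`".

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical input record: one benchmark value for one opcode.
struct Measurement {
  std::string Opcode;
  double Value;
};

// Stabilization check: every point must be a neighbor of every other point.
// Only the upper triangle (J > I) is visited, so each pair is checked once.
bool isStable(const std::vector<double> &Values, double Epsilon) {
  assert(Epsilon >= 0.0 && "Epsilon must be non-negative");
  for (std::size_t I = 0; I < Values.size(); ++I)
    for (std::size_t J = I + 1; J < Values.size(); ++J)
      if (std::fabs(Values[I] - Values[J]) > Epsilon)
        return false;
  return true;
}

// Group measurements by opcode, then report for each opcode whether its
// group is stable (true) or should be treated as noise/unstable (false).
std::map<std::string, bool>
clusterByOpcodeAndStabilize(const std::vector<Measurement> &Points,
                            double Epsilon) {
  std::map<std::string, std::vector<double>> ByOpcode;
  for (const Measurement &M : Points)
    ByOpcode[M.Opcode].push_back(M.Value);

  std::map<std::string, bool> Stable;
  for (const auto &KV : ByOpcode)
    Stable[KV.first] = isStable(KV.second, Epsilon);
  return Stable;
}
```

Since the pairwise check is symmetric, the outcome does not depend on the order of the measurements.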

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59539/new/

https://reviews.llvm.org/D59539