<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [llvm-exegesis] If the same instruction lies in multiple clusters, it's noise"
   href="https://bugs.llvm.org/show_bug.cgi?id=40715">40715</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[llvm-exegesis] If the same instruction lies in multiple clusters, it's noise
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>tools
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>llvm-exegesis
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>lebedev.ri@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>clement.courbet@gmail.com, gchatelet@google.com, llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>If you measure e.g. the latency of some instruction N (>1) times and feed
the results into analysis mode, then whenever the measured latencies are
not identical, the same instruction ends up in several different clusters.

Such a report is not really actionable.

I think something like this could work:
1. Go through all the clusters.
   * For each instruction, record the IDs of the clusters in which it
     was encountered (in a map keyed by instruction ID, so effectively
     an array).
2. Go through the map.
   * If the instruction was encountered in fewer than 2 clusters,
     skip to the next instruction.
   * Otherwise, go through each cluster
     in which the instruction was encountered:
       * duplicate the cluster ("by splitting it into two halves"),
       * remove the instruction from the original cluster,
       * keep only that instruction in the new cluster,
       * mark the new cluster as noise.
This way we effectively hide such instructions from the report
(because currently no non-valid cluster is reported),
while not destroying the "good" part of the report.

This should also be the fastest way: only one linear scan through the clusters
and one linear scan over the instruction map.
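
A minimal sketch of the two passes above, in Python for brevity rather than
the tool's C++. The cluster representation (a dict with "instrs" and "noise")
and the function name are made up for illustration; they are not
llvm-exegesis APIs.

```python
from collections import defaultdict


def split_out_unstable(clusters):
    """Move instructions found in 2 or more clusters into new noise clusters.

    Hypothetical model: each cluster is a dict with "instrs" (one
    instruction name per measurement) and a boolean "noise" flag.
    """
    # Pass 1: one linear scan over the clusters, recording for every
    # instruction the IDs of the clusters in which it was encountered.
    seen_in = defaultdict(list)
    for cluster_id, cluster in enumerate(clusters):
        # dict.fromkeys de-duplicates while keeping a stable order.
        for instr in dict.fromkeys(cluster["instrs"]):
            seen_in[instr].append(cluster_id)

    # Work on copies so the caller's clusters are left untouched.
    result = [dict(c, instrs=list(c["instrs"])) for c in clusters]

    # Pass 2: one linear scan over the instruction map.
    for instr, cluster_ids in seen_in.items():
        if len(cluster_ids) == 1:
            continue  # encountered in a single cluster: nothing to do
        for cluster_id in cluster_ids:
            original = result[cluster_id]
            moved = [i for i in original["instrs"] if i == instr]
            # Remove the instruction from the original cluster ...
            original["instrs"] = [i for i in original["instrs"] if i != instr]
            # ... and keep only it in a new cluster marked as noise.
            result.append({"instrs": moved, "noise": True})
    return result
```

In this sketch an original cluster that held nothing but such an instruction
is left empty; whether to drop it or also mark it as noise is an open choice.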

Since analysis currently only works when all the measurements
are of the same type, this is fairly simple, for now.

Thoughts?</pre>
        </div>
      </p>


    </body>
</html>