<html>

    <head>

      <base href="http://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Information loss and performance issues in branch probability representation"

   href="http://llvm.org/bugs/show_bug.cgi?id=22718">22718</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Information loss and performance issues in branch probability representation

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Global Analyses

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>dnovillo@google.com

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>dnovillo@google.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>davidxl@google.com, llvmbugs@cs.uiuc.edu

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>BranchProbabilityInfo, MachineBranchProbabilityInfo mimic the raw profile data

representation and represents branch probability using non-normalized weights.

We have found some issues where this representation loses information.

Scaling and capping of branch weights may lead to distorted probability for

very hot branches:

 - The weight for each branch is capped at UINT32_MAX/number_successors. The

max weight limit imposed is too small for long-running applications -- hot

branches can easily hit the cap.

 - When any branch weight hits the limit, the implementation does not actually

do any scaling on weights of other branches. For hot branches, this can cause a

highly biased branch to become non-biased. Cascading bad effects will also

occur in the frequency propagation stage due to the capping.

This happens in BranchProbabilityInfo::calcMetadataWeights, as well as

MachineBranchProbabilityInfo::getSumForBlock(const MachineBasicBlock *MBB,

uint32_t &Scale) const

Additionally, the API for getting branch probabilities has some performance

issues:

For many clients where getting relative branch hotness is enough, the client

can simply use getEdgeWeight interface. However if the client needs to know the

actual branch probability, getEdgeProbability is needed. To compute edge

probability, the implementation needs to compute getSumForBlock which needs to

walk through all successors. This has O(N^2) behavior when we need to compute

branch prob for each out going cfg edge.

Instead of representing branch probabilities using non-normalized weights, we

could represent probabilities directly using a fixed point representation with

scaling (between 0 and 1). The benefits would be:

1) conceptually simple. This would allow us to not rely on weight.

2) clients using getEdgeWeight are not affected. The weight can be derived from

getEdgeProb

3) The implementation for getEdgeProb becomes faster.

Finally, we could change the DenseMap hash key for edge probabilities.

Currently, edge probabilities are stored in a map using Pair<BasicBlock*,

SuccessIndex> as the key. Retrieving branch probability information is slow, as

it needs to walk through successors, find the match, and then use the successor

index to do the lookup. A solution that used pair<BasicBlock*, BasicBlock*> as

the key would avoid that.</pre>

        </div>

      </p>

      <hr>

      <span>You are receiving this mail because:</span>

      <ul>

          <li>You are on the CC list for the bug.</li>

      </ul>

    </body>

</html>