[PATCH] Fix information loss in branch probability computation.
Duncan P. N. Exon Smith
dexonsmith at apple.com
Wed May 6 16:18:41 PDT 2015
> On 2015 May 6, at 19:00, Diego Novillo <dnovillo at google.com> wrote:
>
> This modifies the scaling to do it based on the sum of all branch weights. If
> the sum of weights exceeds 32 bits, compute an integer scale to bring all the
> weights within the range.
>
> Additionally, remove the use of scaled integers. Use 64 bit integers for
> all the calculations.
>
> Adjust tests and an assertion in switch lowering. The assertion was
> using the previous branch weight limits.
>
>
> http://reviews.llvm.org/D9442
>
> Files:
> lib/Analysis/BranchProbabilityInfo.cpp
> lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
> test/Analysis/BranchProbabilityInfo/pr22718.ll
> test/CodeGen/X86/MachineBranchProb.ll
>
> <D9442.25103.patch>
> Index: lib/Analysis/BranchProbabilityInfo.cpp
> ===================================================================
> --- lib/Analysis/BranchProbabilityInfo.cpp
> +++ lib/Analysis/BranchProbabilityInfo.cpp
> @@ -21,6 +21,7 @@
> #include "llvm/IR/LLVMContext.h"
> #include "llvm/IR/Metadata.h"
> #include "llvm/Support/Debug.h"
> +#include "llvm/Support/ScaledNumber.h"
> #include "llvm/Support/raw_ostream.h"
>
> using namespace llvm;
> @@ -115,11 +116,6 @@
> // Minimum weight of an edge. Please note, that weight is NEVER 0.
> static const uint32_t MIN_WEIGHT = 1;
>
> -static uint32_t getMaxWeightFor(BasicBlock *BB) {
> - return UINT32_MAX / BB->getTerminator()->getNumSuccessors();
> -}
> -
> -
> /// \brief Calculate edge weights for successors lead to unreachable.
> ///
> /// Predict that a successor which leads necessarily to an
> @@ -185,27 +181,45 @@
> if (!WeightsNode)
> return false;
>
> + // Check that the number of successors is manageable.
> + assert(TI->getNumSuccessors() < UINT32_MAX && "Too many successors");
> +
> // Ensure there are weights for all of the successors. Note that the first
> // operand to the metadata node is a name, not a weight.
> if (WeightsNode->getNumOperands() != TI->getNumSuccessors() + 1)
> return false;
>
> - // Build up the final weights that will be used in a temporary buffer, but
> - // don't add them until all weights are present. Each weight value is clamped
> - // to [1, getMaxWeightFor(BB)].
> - uint32_t WeightLimit = getMaxWeightFor(BB);
> + // Build up the final weights that will be used in a temporary buffer.
> + // Compute the sum of all weights to later decide whether they need to
> + // be scaled to fit in 32 bits.
> + uint64_t WeightSum = 0;
> SmallVector<uint32_t, 2> Weights;
> Weights.reserve(TI->getNumSuccessors());
> for (unsigned i = 1, e = WeightsNode->getNumOperands(); i != e; ++i) {
> ConstantInt *Weight =
> mdconst::dyn_extract<ConstantInt>(WeightsNode->getOperand(i));
> if (!Weight)
> return false;
> - Weights.push_back(Weight->getLimitedValue(WeightLimit));
> + assert(Weight->getValue().getActiveBits() <= 32 &&
> + "Too many bits for uint32_t");
> + Weights.push_back(Weight->getZExtValue());
> + WeightSum += Weights.back();
> }
> assert(Weights.size() == TI->getNumSuccessors() && "Checked above");
> - for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i)
> - setEdgeWeight(BB, i, Weights[i]);
> +
> + // If the sum of weights does not fit in 32 bits, scale every weight down
> + // accordingly.
> + uint64_t ScalingFactor =
> + (WeightSum > UINT32_MAX) ? WeightSum / UINT32_MAX + 1 : 1;
> +
> + WeightSum = 0;
> + for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i) {
> + uint32_t W = Weights[i] / ScalingFactor;
> + WeightSum += W;
> + setEdgeWeight(BB, i, W);
> + }
> + assert(WeightSum <= UINT32_MAX &&
> + "Expected weights to scale down to 32 bits");
>
> return true;
> }
> Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
> ===================================================================
> --- lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
> +++ lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
> @@ -8014,7 +8014,7 @@
> // here, use the same flooring mechanism previously used by BPI.
> Weight = std::max(
> 1u, BPI->getEdgeWeight(SI.getParent(), I.getSuccessorIndex()));
> - assert(Weight <= UINT32_MAX / SI.getNumSuccessors());
> + assert(Weight < UINT32_MAX);
The new assertion doesn't check the sum though, and the sum can now
overflow because of the floor.
When a block has lots of zero-weight successors (plus at least one
successor with a non-zero weight), the `std::max(1u, ...)` floor here
bumps every zero weight up to 1, so the resulting sum can exceed
`UINT32_MAX`. That sum could then overflow in the `visitSwitch()` code.
E.g., `UINT32_MAX - 1` successors, such that:
- `1` successor has a weight of `3`.
- `UINT32_MAX - 2` successors have a weight of `0` each.
After flooring, the sum is `3 + (UINT32_MAX - 2) = UINT32_MAX + 1`,
which overflows 32 bits.
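Here's a minimal standalone sketch of that arithmetic (the numbers come
from the example above; the accumulation itself is hypothetical, not
the actual switch-lowering code):

    #include <cstdint>
    #include <cstdio>

    int main() {
      // UINT32_MAX - 1 successors: one has weight 3, the rest weight 0.
      // The std::max(1u, ...) floor turns every zero weight into a 1,
      // so each zero-weight edge still contributes 1 to the sum.
      const uint64_t NumZeroWeightEdges = uint64_t(UINT32_MAX) - 2;
      const uint64_t FlooredSum = 3 + NumZeroWeightEdges; // UINT32_MAX + 1

      std::printf("floored sum = %llu, UINT32_MAX = %llu\n",
                  (unsigned long long)FlooredSum,
                  (unsigned long long)UINT32_MAX);
      // A 32-bit accumulator would wrap here: (uint32_t)FlooredSum == 0.
      return 0;
    }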
I don't think we can increase the `ScalingFactor` above (and loosen this
assertion) until the floor has been removed.
From your email in the other thread:
> > (Or we otherwise need to check that we have space to add 1 to all the
> > weights of 0.)
>
> I've gone this way in the update to the BPI scaling patch
> (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150504/275088.html).
I think if you want to go this way, you need to:
1. Check that the sum is still small enough after adding 1 to all the
zero weights.
2. If it overflows, rescale everything locally (see the sketch after
this list).
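A rough sketch of that two-step shape (the helper name and its
placement are hypothetical; this is not the actual BPI code):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Hypothetical helper: floor each weight to 1, then check that the
    // floored sum still fits in 32 bits, rescaling locally if not.
    static void floorAndRescale(std::vector<uint32_t> &Weights) {
      uint64_t Sum = 0;
      for (uint32_t &W : Weights) {
        W = std::max<uint32_t>(1, W); // step 1: bump zero weights to 1
        Sum += W;
      }
      if (Sum <= UINT32_MAX)
        return; // the floored sum fits in 32 bits; nothing to do

      // Step 2: the sum overflowed, so rescale everything locally,
      // re-flooring so no weight drops back to 0.
      uint64_t Scale = Sum / UINT32_MAX + 1;
      for (uint32_t &W : Weights)
        W = std::max<uint32_t>(1, uint32_t(W / Scale));
    }

    int main() {
      // Small illustration: the zero weights become 1, sum fits easily.
      std::vector<uint32_t> Weights{3, 0, 0, 0};
      floorAndRescale(Weights);
      return 0;
    }

Note that the re-floor in step 2 can push the sum back over the limit
in the degenerate case above (billions of floored-to-1 edges), which is
the same problem all over again. That's another reason the floor itself
is the real obstacle.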
It's probably simpler just to split the two changes in BPI into two
patches, or roll in the change to `visitSwitch()`.