[PATCH] Fix information loss in branch probability computation.
Duncan P. N. Exon Smith
dexonsmith at apple.com
Wed May 6 16:18:41 PDT 2015
> On 2015 May 6, at 19:00, Diego Novillo <dnovillo at google.com> wrote:
>
> This modifies the scaling to do it based on the sum of all branch weights. If
> the sum of weights exceeds 32 bits, compute an integer scale to bring all the
> weights within the range.
>
> Additionally, remove the use of scaled integers. Use 64 bit integers for
> all the calculations.
>
> Adjust tests and an assertion in switch lowering. The assertion was
> using the previous branch weight limits.
>
>
> http://reviews.llvm.org/D9442
>
> Files:
> lib/Analysis/BranchProbabilityInfo.cpp
> lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
> test/Analysis/BranchProbabilityInfo/pr22718.ll
> test/CodeGen/X86/MachineBranchProb.ll
>
> <D9442.25103.patch>
> Index: lib/Analysis/BranchProbabilityInfo.cpp
> ===================================================================
> --- lib/Analysis/BranchProbabilityInfo.cpp
> +++ lib/Analysis/BranchProbabilityInfo.cpp
> @@ -21,6 +21,7 @@
> #include "llvm/IR/LLVMContext.h"
> #include "llvm/IR/Metadata.h"
> #include "llvm/Support/Debug.h"
> +#include "llvm/Support/ScaledNumber.h"
> #include "llvm/Support/raw_ostream.h"
>
> using namespace llvm;
> @@ -115,11 +116,6 @@
> // Minimum weight of an edge. Please note, that weight is NEVER 0.
> static const uint32_t MIN_WEIGHT = 1;
>
> -static uint32_t getMaxWeightFor(BasicBlock *BB) {
> - return UINT32_MAX / BB->getTerminator()->getNumSuccessors();
> -}
> -
> -
> /// \brief Calculate edge weights for successors lead to unreachable.
> ///
> /// Predict that a successor which leads necessarily to an
> @@ -185,27 +181,45 @@
> if (!WeightsNode)
> return false;
>
> + // Check that the number of successors is manageable.
> + assert(TI->getNumSuccessors() < UINT32_MAX && "Too many successors");
> +
> // Ensure there are weights for all of the successors. Note that the first
> // operand to the metadata node is a name, not a weight.
> if (WeightsNode->getNumOperands() != TI->getNumSuccessors() + 1)
> return false;
>
> - // Build up the final weights that will be used in a temporary buffer, but
> - // don't add them until all weights are present. Each weight value is clamped
> - // to [1, getMaxWeightFor(BB)].
> - uint32_t WeightLimit = getMaxWeightFor(BB);
> + // Build up the final weights that will be used in a temporary buffer.
> + // Compute the sum of all weights to later decide whether they need to
> + // be scaled to fit in 32 bits.
> + uint64_t WeightSum = 0;
> SmallVector<uint32_t, 2> Weights;
> Weights.reserve(TI->getNumSuccessors());
> for (unsigned i = 1, e = WeightsNode->getNumOperands(); i != e; ++i) {
> ConstantInt *Weight =
> mdconst::dyn_extract<ConstantInt>(WeightsNode->getOperand(i));
> if (!Weight)
> return false;
> - Weights.push_back(Weight->getLimitedValue(WeightLimit));
> + assert(Weight->getValue().getActiveBits() <= 32 &&
> + "Too many bits for uint32_t");
> + Weights.push_back(Weight->getZExtValue());
> + WeightSum += Weights.back();
> }
> assert(Weights.size() == TI->getNumSuccessors() && "Checked above");
> - for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i)
> - setEdgeWeight(BB, i, Weights[i]);
> +
> + // If the sum of weights does not fit in 32 bits, scale every weight down
> + // accordingly.
> + uint64_t ScalingFactor =
> + (WeightSum > UINT32_MAX) ? WeightSum / UINT32_MAX + 1 : 1;
> +
> + WeightSum = 0;
> + for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i) {
> + uint32_t W = Weights[i] / ScalingFactor;
> + WeightSum += W;
> + setEdgeWeight(BB, i, W);
> + }
> + assert(WeightSum <= UINT32_MAX &&
> + "Expected weights to scale down to 32 bits");
>
> return true;
> }
> Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
> ===================================================================
> --- lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
> +++ lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
> @@ -8014,7 +8014,7 @@
> // here, use the same flooring mechanism previously used by BPI.
> Weight = std::max(
> 1u, BPI->getEdgeWeight(SI.getParent(), I.getSuccessorIndex()));
> - assert(Weight <= UINT32_MAX / SI.getNumSuccessors());
> + assert(Weight < UINT32_MAX);
The new assertion doesn't check the sum though, and the sum can now
overflow because of the floor.
When a block has lots of zero-weight successors (plus at least one
successor with a non-zero weight), the `std::max(1u, ...)` floor here
bumps every zero weight up to 1, so the resulting sum can exceed
`UINT32_MAX`. That sum could then overflow in the `visitSwitch()` code.
E.g., `UINT32_MAX - 1` successors, such that:
- `1` successor has a weight of `3`.
- `UINT32_MAX - 2` successors have a weight of `0` each.
After flooring, the sum is `3 + (UINT32_MAX - 2) = UINT32_MAX + 1`,
which overflows 32 bits.
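Here's a minimal standalone sketch of that arithmetic (the numbers come
from the example above; the accumulation itself is hypothetical, not
the actual switch-lowering code):

    #include <cstdint>
    #include <cstdio>

    int main() {
      // UINT32_MAX - 1 successors: one has weight 3, the rest weight 0.
      // The std::max(1u, ...) floor turns every zero weight into a 1,
      // so each zero-weight edge still contributes 1 to the sum.
      const uint64_t NumZeroWeightEdges = uint64_t(UINT32_MAX) - 2;
      const uint64_t FlooredSum = 3 + NumZeroWeightEdges; // UINT32_MAX + 1

      std::printf("floored sum = %llu, UINT32_MAX = %llu\n",
                  (unsigned long long)FlooredSum,
                  (unsigned long long)UINT32_MAX);
      // A 32-bit accumulator would wrap here: (uint32_t)FlooredSum == 0.
      return 0;
    }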
I don't think we can increase the `ScalingFactor` above (and loosen this
assertion) until the floor has been removed.
From your email in the other thread:
> > (Or we otherwise need to check that we have space to add 1 to all the
> > weights of 0.)
>
> I've gone this way in the update to the BPI scaling patch
> (http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150504/275088.html).
I think if you want to go this way, you need to:
1. Check that the sum is still small enough after adding 1 to all the
zero weights.
2. If it overflows, rescale everything locally (see the sketch after
this list).
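A rough sketch of that two-step shape (the helper name and its
placement are hypothetical; this is not the actual BPI code):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Hypothetical helper: floor each weight to 1, then check that the
    // floored sum still fits in 32 bits, rescaling locally if not.
    static void floorAndRescale(std::vector<uint32_t> &Weights) {
      uint64_t Sum = 0;
      for (uint32_t &W : Weights) {
        W = std::max<uint32_t>(1, W); // step 1: bump zero weights to 1
        Sum += W;
      }
      if (Sum <= UINT32_MAX)
        return; // the floored sum fits in 32 bits; nothing to do

      // Step 2: the sum overflowed, so rescale everything locally,
      // re-flooring so no weight drops back to 0.
      uint64_t Scale = Sum / UINT32_MAX + 1;
      for (uint32_t &W : Weights)
        W = std::max<uint32_t>(1, uint32_t(W / Scale));
    }

    int main() {
      // Small illustration: the zero weights become 1, sum fits easily.
      std::vector<uint32_t> Weights{3, 0, 0, 0};
      floorAndRescale(Weights);
      return 0;
    }

Note that the re-floor in step 2 can push the sum back over the limit
in the degenerate case above (billions of floored-to-1 edges), which is
the same problem all over again. That's another reason the floor itself
is the real obstacle.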
It's probably simpler just to split the two changes in BPI into two
patches, or roll in the change to `visitSwitch()`.