[PATCH] Allow 0-weight branches in BranchProbabilityInfo.
Duncan P. N. Exon Smith
dexonsmith at apple.com
Wed May 6 10:09:21 PDT 2015
> On 2015 May 6, at 12:42, Diego Novillo <dnovillo at google.com> wrote:
>
> On Wed, May 6, 2015 at 12:41 PM, Hans Wennborg <hans at chromium.org> wrote:
>> ================
>> Comment at: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:8024
>> @@ -8013,1 +8023,3 @@
>> + Weight = std::max(
>> + 1u, BPI->getEdgeWeight(SI.getParent(), I.getSuccessorIndex()));
>> assert(Weight <= UINT32_MAX / SI.getNumSuccessors());
>> ----------------
>> This looks good. I can address this TODO later.
>>
>> I figured this would be the only change necessary for the switch lowering code. Are the changes to getEdgeWeight() and lowerWorkItem() also necessary?
>
> I don't know. I took the machine gun approach and changed all the
> places that called into BPI. I can remove the ones that don't cause
> the assertion in the tests, if you want.
>
>
> Diego.
Same comment as Hans, but I don't think the change is the right one
for switch lowering:
> Index: lib/Analysis/BranchProbabilityInfo.cpp
> ===================================================================
> --- lib/Analysis/BranchProbabilityInfo.cpp
> +++ lib/Analysis/BranchProbabilityInfo.cpp
> @@ -201,8 +201,7 @@
> mdconst::dyn_extract<ConstantInt>(WeightsNode->getOperand(i));
> if (!Weight)
> return false;
> - Weights.push_back(
> - std::max<uint32_t>(1, Weight->getLimitedValue(WeightLimit)));
> + Weights.push_back(Weight->getLimitedValue(WeightLimit));
> }
> assert(Weights.size() == TI->getNumSuccessors() && "Checked above");
> for (unsigned i = 0, e = TI->getNumSuccessors(); i != e; ++i)
> @@ -553,7 +552,7 @@
> uint32_t PrevSum = Sum;
>
> Sum += Weight;
> - assert(Sum > PrevSum); (void) PrevSum;
> + assert(Sum >= PrevSum); (void) PrevSum;
> }
>
> return Sum;
> @@ -616,14 +615,17 @@
> uint32_t BranchProbabilityInfo::
> getEdgeWeight(const BasicBlock *Src, const BasicBlock *Dst) const {
> uint32_t Weight = 0;
> + bool FoundWeight = false;
> DenseMap<Edge, uint32_t>::const_iterator MapI;
> for (succ_const_iterator I = succ_begin(Src), E = succ_end(Src); I != E; ++I)
> if (*I == Dst) {
> MapI = Weights.find(std::make_pair(Src, I.getSuccessorIndex()));
> - if (MapI != Weights.end())
> + if (MapI != Weights.end()) {
> + FoundWeight = true;
> Weight += MapI->second;
> + }
> }
> - return (Weight == 0) ? DEFAULT_WEIGHT : Weight;
> + return (!FoundWeight) ? DEFAULT_WEIGHT : Weight;
> }
>
> /// Set the edge weight for a given edge specified by PredBlock and an index
> Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
> ===================================================================
> --- lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
> +++ lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
> @@ -1319,7 +1319,10 @@
> return 0;
> const BasicBlock *SrcBB = Src->getBasicBlock();
> const BasicBlock *DstBB = Dst->getBasicBlock();
> - return BPI->getEdgeWeight(SrcBB, DstBB);
> + // TODO - BPI used to guarantee non-zero weights, but this produces
> + // information loss (see PR 22718). Since we can't handle zero weights here,
> + // use the same flooring mechanism previously used by BPI.
Why not?
> + return std::max(1u, BPI->getEdgeWeight(SrcBB, DstBB));
> }
>
> void SelectionDAGBuilder::
> @@ -7774,7 +7777,12 @@
> addSuccessorWithWeight(
> SwitchMBB, DefaultMBB,
> // The default destination is the first successor in IR.
> - BPI ? BPI->getEdgeWeight(SwitchMBB->getBasicBlock(), (unsigned)0)
> + // TODO - BPI used to guarantee non-zero weights, but this produces
> + // information loss (see PR 22718). Since we can't handle zero
> + // weights here, use the same flooring mechanism previously used by
> + // BPI.
> + BPI ? std::max(1u, BPI->getEdgeWeight(SwitchMBB->getBasicBlock(),
> + (unsigned)0))
> : 0);
Same here: why not? Shouldn't we fix that first?
>
> // Insert the true branch.
> @@ -8009,7 +8017,11 @@
> const ConstantInt *CaseVal = I.getCaseValue();
> uint32_t Weight = 1;
> if (BPI) {
> - Weight = BPI->getEdgeWeight(SI.getParent(), I.getSuccessorIndex());
> + // TODO - BPI used to guarantee non-zero weights, but this produces
> + // information loss (see PR 22718). Since we can't handle zero weights
> + // here, use the same flooring mechanism previously used by BPI.
> + Weight = std::max(
> + 1u, BPI->getEdgeWeight(SI.getParent(), I.getSuccessorIndex()));
This looks like `visitSwitch()`. I don't see why we can't handle a
weight of 0 here.
It would be easy to change `SelectionDAGBuilder::splitWorkItem` to be
resilient to weights of 0 instead:
diff --git a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 5b6efe0..9b147cf 100644
--- a/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -7990,23 +7990,33 @@ void SelectionDAGBuilder::splitWorkItem(SwitchWorkList &WorkList,
// Mehlhorn "Nearly Optimal Binary Search Trees" (1975).
CaseClusterIt LastLeft = W.FirstCluster;
CaseClusterIt FirstRight = W.LastCluster;
- uint32_t LeftWeight = LastLeft->Weight;
- uint32_t RightWeight = FirstRight->Weight;
+
+ auto getWorkingWeight = [](const CaseCluster &Cluster) -> uint64_t {
+ // Zero-weight nodes would naturally cause skewed trees since they don't
+ // affect LeftWeight or RightWeight. Shift weights up 32 bits, and give
+ // zero-weights a value of 1.
+ uint64_t W = Cluster.Weight;
+ if (W)
+ return W << 32;
+ return 1;
+ };
+ uint64_t WorkingLeftWeight = getWorkingWeight(*LastLeft);
+ uint64_t WorkingRightWeight = getWorkingWeight(*FirstRight);
// Move LastLeft and FirstRight towards each other from opposite directions to
// find a partitioning of the clusters which balances the weight on both
// sides.
while (LastLeft + 1 < FirstRight) {
- // Zero-weight nodes would cause skewed trees since they don't affect
- // LeftWeight or RightWeight.
- assert(LastLeft->Weight != 0);
- assert(FirstRight->Weight != 0);
-
- if (LeftWeight < RightWeight)
- LeftWeight += (++LastLeft)->Weight;
+ if (WorkingLeftWeight < WorkingRightWeight)
+ WorkingLeftWeight += getWorkingWeight(*++LastLeft);
else
- RightWeight += (--FirstRight)->Weight;
+ WorkingRightWeight += getWorkingWeight(*--FirstRight);
}
+
+ // Scale back down to 32-bit numbers.
+ uint32_t LeftWeight = WorkingLeftWeight >> 32;
+ uint32_t RightWeight = WorkingRightWeight >> 32;
+
assert(LastLeft + 1 == FirstRight);
assert(LastLeft >= W.FirstCluster);
assert(FirstRight <= W.LastCluster);
> assert(Weight <= UINT32_MAX / SI.getNumSuccessors());
Hmm. I don't think BPI should guarantee this anymore. This is
unnecessary for keeping the sum under `UINT32_MAX`.
> }
> Clusters.push_back(CaseCluster::range(CaseVal, CaseVal, Succ, Weight));