[PATCH] D12603: Use fixed-point representation for BranchProbability

Cong Hou via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 8 21:32:42 PDT 2015


On Tue, Sep 8, 2015 at 2:42 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:

>
> > On 2015-Sep-08, at 10:53, Cong Hou <congh at google.com> wrote:
> >
> > On Tue, Sep 8, 2015 at 9:39 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> >
> > > On 2015-Sep-05, at 22:41, David Li <davidxl at google.com> wrote:
> > >
> > >
> > > ================
> > > Comment at: test/Analysis/BlockFrequencyInfo/basic.ll:71
> > > @@ -70,3 +70,3 @@
> > > ; The 'case_c' branch is predicted more likely via branch weight metadata.
> > > -; CHECK-NEXT: case_c: float = 0.8,
> > > +; CHECK-NEXT: case_c: float = 0.8
> > > case_c:
> > > ----------------
> > > where does the comma come from?
> >
> > The comma was there to confirm that we get *exactly* 0.8.  Why has
> > that changed?
> >
> > With fixed-point representation, we won't get exactly 0.8 here but
> > 0.8000000001. Should I restrict the output precision when printing this
> > value, or use a regex, or the exact 0.8000000001 in the test case?
>
> I don't like that we don't get exactly 0.8 here, and frankly I don't
> quite understand it.  It's certainly possible to get exactly 0.8:
> --
> $ bc
> scale=100
> 3435973836/4294967295
> .8000000000000000000000000000000000000000000000000000000000000000000\
> 000000000000000000000000000000000
> --
>
> What's going wrong?
>

Thank you very much for the review, Duncan!

David pointed out that there is a rounding issue when calculating the
numerator. Instead of

    uint64_t Prob64 = (n * static_cast<uint64_t>(D)) / d;

I should use

    uint64_t Prob64 = (n * static_cast<uint64_t>(D) + d / 2) / d;

to get the numerator. With this issue fixed, we still cannot get exactly
0.8. I think this is because the frequency is calculated from branch
probabilities, so the precision loss accumulates and is amplified (here
0.8 is not a branch probability but a block frequency represented by a
scaled number). Another possibility is that some other probability cannot
be expressed as precisely as 0.8; 0.7 is such an example:

>>> 3006477106/4294967295.
0.6999999998835846
>>> 3006477107/4294967295.
0.7000000001164153
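
For illustration, here is a minimal standalone sketch (my own example, not
part of the patch) contrasting the truncating and round-to-nearest
conversions for 7/10:

    #include <cstdint>
    #include <cstdio>

    // Truncating conversion: always rounds the numerator toward zero.
    uint32_t numeratorTruncated(uint32_t n, uint32_t d) {
      return static_cast<uint32_t>(n * static_cast<uint64_t>(UINT32_MAX) / d);
    }

    // Round-to-nearest conversion: add d / 2 before dividing.
    uint32_t numeratorRounded(uint32_t n, uint32_t d) {
      return static_cast<uint32_t>(
          (n * static_cast<uint64_t>(UINT32_MAX) + d / 2) / d);
    }

    int main() {
      // Neither candidate numerator is exact for 7/10, but rounding
      // picks the closer one.
      std::printf("truncated: %u\n", numeratorTruncated(7, 10)); // 3006477106
      std::printf("rounded:   %u\n", numeratorRounded(7, 10));   // 3006477107
      return 0;
    }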



> (If it really is unavoidable, I think restricting output precision makes
> the most sense, so we get nice predictable output.  But I want to be sure
> it's actually necessary, first.)
>

To get 0.8 I need to adjust the precision when outputting the scaled number
that represents the block frequency. The default precision is 10, and after
changing it to a smaller number we get exactly 0.8 here (otherwise we get
0.8000000001, which has exactly 10 digits after the decimal point).
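
As a small illustration of the precision effect (the values are
hypothetical, and the real code goes through ScaledNumber's printing, not
printf):

    #include <cstdio>

    int main() {
      // A block frequency that has accumulated a tiny fixed-point error.
      double F = 0.8000000001;
      std::printf("%.10g\n", F); // 0.8000000001 (10 significant digits)
      std::printf("%.6g\n", F);  // 0.8 (smaller precision hides the noise)
      return 0;
    }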


>
> I have a few more comments on the patch below.
>
> > Index: test/Analysis/BlockFrequencyInfo/bad_input.ll
> > ===================================================================
> > --- test/Analysis/BlockFrequencyInfo/bad_input.ll
> > +++ test/Analysis/BlockFrequencyInfo/bad_input.ll
> > @@ -10,7 +10,7 @@
> >    br label %for.body
> >
> >  ; Check that we get 1,4 instead of 0,3.
> > -; CHECK-NEXT: for.body: float = 4.0,
> > +; CHECK-NEXT: for.body: float = 3.99
> >  for.body:
> >    %i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
> >    call void @g(i32 %i)
> > @@ -43,7 +43,7 @@
> >
> >  ; Check that the exit weight is half of entry, since half is lost in the
> >  ; infinite loop above.
> > -; CHECK-NEXT: for.end: float = 0.5,
> > +; CHECK-NEXT: for.end: float = 0.49
>
> All these testcases being slightly off is suspicious to me.  32 bits
> ought to be enough to represent these closely.
>

With the rounding issue fixed we won't get such problems anymore (but I
still need to restrict the output precision of the scaled number).


>
> >  for.end:
> >    ret void
> >  }
> > Index: test/Transforms/SampleProfile/propagate.ll
> > ===================================================================
> > --- test/Transforms/SampleProfile/propagate.ll
> > +++ test/Transforms/SampleProfile/propagate.ll
> > @@ -97,8 +97,8 @@
> >    %div4 = sdiv i64 %10, 4, !dbg !22
> >    %cmp5 = icmp sgt i64 %9, %div4, !dbg !22
> >    br i1 %cmp5, label %if.then6, label %if.else7, !dbg !22
> > -; CHECK: edge if.end -> if.then6 probability is 3 / 6342 = 0.0473037%
> > -; CHECK: edge if.end -> if.else7 probability is 6339 / 6342 = 99.9527% [HOT edge]
> > +; CHECK: edge if.end -> if.then6 probability is 0%
>
> This is a fairly lossy change.  We've lost all the significant digits
> of 0.0473037%, and made it look like 1:0.  I hope that's unnecessary.
>

This is because previously I only output the rounded integer as the
percentage. In the updated patch I have changed it to print with %.2f.
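
A quick sketch of the difference, using the if.then6 edge from the test
above (the %.2f format is from the updated patch; the surrounding code is
just illustration):

    #include <cmath>
    #include <cstdint>
    #include <cstdio>

    int main() {
      uint32_t N = 3, D = 6342; // the if.then6 edge: 0.0473037%
      double Percent = static_cast<double>(N) / D * 100.0;
      // Old behavior: round to an integer percentage; small edges become 0%.
      std::printf("%d%%\n", static_cast<int>(std::round(Percent))); // 0%
      // New behavior: keep two fractional digits.
      std::printf("%.2f%%\n", Percent); // 0.05%
      return 0;
    }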


> > +; CHECK: edge if.end -> if.else7 probability is 100%
> >
> >  if.then6:                                         ; preds = %if.end
> >    %11 = load i32, i32* %y.addr, align 4, !dbg !24
> > Index: lib/Support/BranchProbability.cpp
> > ===================================================================
> > --- lib/Support/BranchProbability.cpp
> > +++ lib/Support/BranchProbability.cpp
> > @@ -15,16 +15,42 @@
> >  #include "llvm/Support/Debug.h"
> >  #include "llvm/Support/Format.h"
> >  #include "llvm/Support/raw_ostream.h"
> > +#include <cassert>
> >
> >  using namespace llvm;
> >
> >  raw_ostream &BranchProbability::print(raw_ostream &OS) const {
> > -  return OS << N << " / " << D << " = "
>
> This was nice because it gave us something exact for debugging.
>
> I suggest you maintain that, by printing `N` in hexadecimal with padding
> (so that things line up nicely):
> --
> 0x00000000
> 0x023ad039
> 0xffffffff
> --
> This is similar to how BlockMass is printed.
>

This is a nice suggestion. Done.
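
For reference, a minimal sketch of the padded-hex formatting (in the patch
this goes through raw_ostream and format() rather than printf):

    #include <cstdint>
    #include <cstdio>

    // Print the raw numerator zero-padded to 8 hex digits so values
    // line up in a column, similar to how BlockMass is printed.
    void printRawProbability(uint32_t N) {
      std::printf("0x%08x\n", N);
    }

    int main() {
      printRawProbability(0);          // 0x00000000
      printRawProbability(0x023ad039); // 0x023ad039
      printRawProbability(UINT32_MAX); // 0xffffffff
      return 0;
    }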


>
> > -            << format("%g%%", ((double)N / D) * 100.0);
> > +  return OS << format("%d%%",
> > +                      static_cast<int>(std::round(((double)N / D) * 100.0)));
>
> Why change the way these are rounded and printed?
>

This is changed to output as %.2f now (otherwise we may get strange results
like 9.1242e-10%).


>
> >  }
> >
> >  void BranchProbability::dump() const { print(dbgs()) << '\n'; }
> >
> > +BranchProbability::BranchProbability(uint32_t n, uint32_t d) {
> > +  assert(d > 0 && "Denominator cannot be 0!");
> > +  assert(n <= d && "Probability cannot be bigger than 1!");
> > +  if (d == D)
> > +    N = n;
> > +  else {
> > +    uint64_t Prob64 = n * static_cast<uint64_t>(D) / d;
> > +    N = static_cast<uint32_t>(Prob64);
>
> It doesn't look like this is rounding correctly.  Can you add some
> unit tests?
>

This is fixed as stated above. The patch currently causes many branch
probability unit tests to fail; I will fix them soon in a patch update.
Those unit tests should cover the correctness of this constructor.
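
A sketch of the kind of unit test that could cover the rounding behavior
(the test name is hypothetical; the expected numerators follow from the
examples above):

    #include "llvm/Support/BranchProbability.h"
    #include "gtest/gtest.h"

    using namespace llvm;

    TEST(BranchProbabilityTest, ConstructorRounding) {
      // 4/5 scales to UINT32_MAX evenly, so the result is exact.
      EXPECT_EQ(3435973836u, BranchProbability(4, 5).getNumerator());
      // 7/10 does not; round-to-nearest should give 3006477107 rather
      // than the truncated 3006477106.
      EXPECT_EQ(3006477107u, BranchProbability(7, 10).getNumerator());
    }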


>
> > +  }
> > +}
> > +
> > +BranchProbability &BranchProbability::operator+=(BranchProbability RHS) {
> > +  assert(N <= D - RHS.N &&
> > +         "The sum of branch probabilities should not exceed one!");
> > +  N += RHS.N;
> > +  return *this;
> > +}
> > +
> > +BranchProbability &BranchProbability::operator-=(BranchProbability RHS) {
> > +  assert(N >= RHS.N &&
> > +         "Can only subtract a smaller probability from a larger one!");
> > +  N -= RHS.N;
> > +  return *this;
> > +}
> > +
> >  static uint64_t scale(uint64_t Num, uint32_t N, uint32_t D) {
> >    assert(D && "divide by 0");
> >
> > Index: include/llvm/Support/BranchProbability.h
> > ===================================================================
> > --- include/llvm/Support/BranchProbability.h
> > +++ include/llvm/Support/BranchProbability.h
> > @@ -15,36 +15,44 @@
> >  #define LLVM_SUPPORT_BRANCHPROBABILITY_H
> >
> >  #include "llvm/Support/DataTypes.h"
> > -#include <cassert>
> >
> >  namespace llvm {
> >
> >  class raw_ostream;
> >
> > -// This class represents Branch Probability as a non-negative fraction.
> > +// This class represents Branch Probability as a non-negative fraction that is
> > +// no greater than 1. It uses a fixed-point-like implementation, in which the
> > +// denominator is always a constant value (here we use UINT32_MAX for maximum
> > +// precision).
> >  class BranchProbability {
> >    // Numerator
> >    uint32_t N;
> >
> > -  // Denominator
> > -  uint32_t D;
> > +  // Denominator, which is a constant value.
> > +  static const uint32_t D = UINT32_MAX;
> > +
> > +  // Construct a BranchProbability with only numerator assuming the denominator
> > +  // is UINT32_MAX. For internal use only.
> > +  explicit BranchProbability(uint32_t n) : N(n) {}
>
> It seems nice to expose this functionality publicly without having to
> pass in `UINT32_MAX` to the denominator.
> --
> static BranchProbability getRaw(uint32_t N)
> --
>
> Or maybe you can come up with a better name than `getRaw()`.
>
>
I have added this public static function. I could not think of a better
name, so let's use getRaw() for now.
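
In other words, something like this minimal mock (just to show the shape
of the API, not the actual header):

    #include <cstdint>

    class BranchProbability {
      uint32_t N;                           // Numerator.
      static const uint32_t D = UINT32_MAX; // Implicit denominator.
      explicit BranchProbability(uint32_t n) : N(n) {}

    public:
      // Public factory: wrap a raw 32-bit numerator over the implicit
      // denominator UINT32_MAX.
      static BranchProbability getRaw(uint32_t N) {
        return BranchProbability(N);
      }
      uint32_t getNumerator() const { return N; }
    };

    // Usage: construct directly from a raw numerator.
    // BranchProbability P = BranchProbability::getRaw(0x023ad039);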


> >
> >  public:
> > -  BranchProbability(uint32_t n, uint32_t d) : N(n), D(d) {
> > -    assert(d > 0 && "Denominator cannot be 0!");
> > -    assert(n <= d && "Probability cannot be bigger than 1!");
> > -  }
> > +  BranchProbability() : N(0) {}
> > +  BranchProbability(uint32_t n, uint32_t d);
>
> Can you fix the style here in a separate NFC commit?  The parameters
> should really be 'N' and 'D'.
>

OK. As N and D are already used as data member names, I will need longer
parameter names like Num and Denom.


>
> > +
> > +  bool isZero() const { return N == 0; }
> >
> > -  static BranchProbability getZero() { return BranchProbability(0, 1); }
> > -  static BranchProbability getOne() { return BranchProbability(1, 1); }
> > +  static BranchProbability getZero() { return BranchProbability(0); }
> > +  static BranchProbability getOne() { return BranchProbability(D); }
> > +
> > +  // Normalize given probabilities so that the sum of them becomes one.
> > +  template <class ProbabilityList>
> > +  static void normalizeProbabilities(ProbabilityList &Probs);
> >
> >    uint32_t getNumerator() const { return N; }
> > -  uint32_t getDenominator() const { return D; }
> > +  static uint32_t getDenominator() { return D; }
> >
> >    // Return (1 - Probability).
> > -  BranchProbability getCompl() const {
> > -    return BranchProbability(D - N, D);
> > -  }
> > +  BranchProbability getCompl() const { return BranchProbability(D - N); }
> >
> >    raw_ostream &print(raw_ostream &OS) const;
> >
> > @@ -66,24 +74,67 @@
> >    /// \return \c Num divided by \c this.
> >    uint64_t scaleByInverse(uint64_t Num) const;
> >
> > -  bool operator==(BranchProbability RHS) const {
> > -    return (uint64_t)N * RHS.D == (uint64_t)D * RHS.N;
> > +  BranchProbability &operator+=(BranchProbability RHS);
> > +  BranchProbability &operator-=(BranchProbability RHS);
> > +  BranchProbability &operator*=(BranchProbability RHS) {
> > +    N = static_cast<uint64_t>(N) * RHS.N / D;
> > +    return *this;
> >    }
> > -  bool operator!=(BranchProbability RHS) const {
> > -    return !(*this == RHS);
> > +
> > +  BranchProbability operator+(BranchProbability RHS) const {
> > +    BranchProbability Prob(*this);
> > +    return Prob += RHS;
> >    }
> > -  bool operator<(BranchProbability RHS) const {
> > -    return (uint64_t)N * RHS.D < (uint64_t)D * RHS.N;
> > +
> > +  BranchProbability operator-(BranchProbability RHS) const {
> > +    BranchProbability Prob(*this);
> > +    return Prob -= RHS;
> >    }
> > +
> > +  BranchProbability operator*(BranchProbability RHS) const {
> > +    BranchProbability Prob(*this);
> > +    return Prob *= RHS;
> > +  }
> > +
> > +  bool operator==(BranchProbability RHS) const { return N == RHS.N; }
> > +  bool operator!=(BranchProbability RHS) const { return !(*this == RHS); }
> > +  bool operator<(BranchProbability RHS) const { return N < RHS.N; }
> >    bool operator>(BranchProbability RHS) const { return RHS < *this; }
> >    bool operator<=(BranchProbability RHS) const { return !(RHS < *this); }
> >    bool operator>=(BranchProbability RHS) const { return !(*this < RHS); }
> > +
> > +  friend BranchProbability operator*(BranchProbability LHS, uint32_t RHS);
> > +  friend BranchProbability operator/(BranchProbability LHS, uint32_t RHS);
>
> Why do these need to be friends?  Can't they reach in and access the
> numerator via `getNumerator()`?
>

The friend declarations are removed now, and getRaw() is used to construct
the new object.


> >  };
> >
> > -inline raw_ostream &operator<<(raw_ostream &OS, const BranchProbability &Prob) {
> > +inline raw_ostream &operator<<(raw_ostream &OS, BranchProbability Prob) {
>
> Why change this?  If you really prefer passing by-value, please do it in
> a separate NFC commit so it doesn't add noise to this patch.
>
> >    return Prob.print(OS);
> >  }
> >
> > +inline BranchProbability operator*(BranchProbability LHS, uint32_t RHS) {
> > +  LHS.N *= RHS;
>
> Why do we need these?  They seem error-prone.  It's also new API we
> didn't need before, so I'm a little skeptical...
>

Those are needed by the follow-up patch that changes weights to
probabilities in MBB, but at the moment there is no use case at all, so I
will remove them now and add them back later.


>
> What's the story for overflow?  I imagine we'd want saturation
> behaviour, or, possibly more robust, an assertion.
>

Yes, I think saturation is the proper behavior here.
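
Something along these lines, sketched on raw numerators (in the class this
would wrap the result with getRaw(); not the final code):

    #include <algorithm>
    #include <cstdint>

    // Saturating scalar multiply: clamp the numerator at the denominator
    // (probability one) instead of wrapping on overflow.
    uint32_t mulSaturated(uint32_t N, uint32_t Scale) {
      uint64_t Prod = static_cast<uint64_t>(N) * Scale;
      return static_cast<uint32_t>(std::min<uint64_t>(Prod, UINT32_MAX));
    }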


>
> > +  return LHS;
> > +}
> > +
> > +inline BranchProbability operator/(BranchProbability LHS, uint32_t RHS) {
> > +  LHS.N /= RHS;
>
> Assertion about 0?  Rounding?  (Do we even need this?)
>

Yes, rounding is necessary here. As this function is not used yet, I have
removed it and will add it back later.
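
When it comes back, it could look roughly like this (a sketch on raw
numerators, under the same assumptions as above):

    #include <cassert>
    #include <cstdint>

    // Rounded scalar divide: assert on a zero divisor and round the
    // numerator to nearest instead of truncating toward zero.
    uint32_t divRounded(uint32_t N, uint32_t Divisor) {
      assert(Divisor != 0 && "Divisor cannot be 0!");
      return static_cast<uint32_t>(
          (static_cast<uint64_t>(N) + Divisor / 2) / Divisor);
    }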


>
> > +  return LHS;
> > +}
> > +
> > +inline BranchProbability operator*(uint32_t LHS, BranchProbability RHS) {
> > +  return RHS * LHS;
> > +}
> > +
> > +template <class ProbabilityList>
> > +void BranchProbability::normalizeProbabilities(ProbabilityList &Probs) {
> > +  uint64_t Sum = 0;
> > +  for (auto Prob : Probs)
> > +    Sum += Prob.N;
> > +  if (Sum > 0)
>
> Early return instead of branching?
>

OK.


>
> Actually, this looks like bad input if `Sum` is 0.  Should this just
> assert?
>

There are some cases in the follow-up patch (replacing weights with
probabilities) in which the sum can be zero. I will look into them further,
but for now let's add an assert.


>
> > +    for (auto &Prob : Probs)
> > +      Prob.N = Prob.N * uint64_t(D) / Sum;
>
> I don't think these are guaranteed to equal `UINT32_MAX`, since there
> can be rounding errors.  You need to dither somehow.  Can you add some
> unit tests and fix the bugs?
>

It seems it is not necessary to make the resulting probabilities sum to
exactly one, which is difficult. I have updated the comment on this
function to say that the sum is approximately one.

The rounding problem is fixed by changing the scaling to:

    Prob.N = (Prob.N * uint64_t(D) + Sum / 2) / Sum;
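
Putting the assert and the rounding together, the normalization would look
roughly like this (sketched over a plain vector of numerators rather than
the real ProbabilityList):

    #include <cassert>
    #include <cstdint>
    #include <vector>

    static const uint64_t D = UINT32_MAX; // Fixed denominator.

    // Scale each numerator so the probabilities sum to approximately one.
    // Per-element round-to-nearest means the sum can still miss
    // UINT32_MAX by a few units; we accept that.
    void normalizeNumerators(std::vector<uint32_t> &Numerators) {
      uint64_t Sum = 0;
      for (uint32_t N : Numerators)
        Sum += N;
      assert(Sum > 0 && "Sum of probabilities must be positive!");
      // Assumes N * D + Sum / 2 fits in 64 bits, which holds for
      // reasonably sized lists.
      for (uint32_t &N : Numerators)
        N = static_cast<uint32_t>((N * D + Sum / 2) / Sum);
    }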


>
> > +}
> > +
> >  }
> >
> >  #endif
> >
>
>