[PATCH] D12603: Use fixed-point representation for BranchProbability

Duncan P. N. Exon Smith via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 8 14:42:26 PDT 2015


> On 2015-Sep-08, at 10:53, Cong Hou <congh at google.com> wrote:
> 
> On Tue, Sep 8, 2015 at 9:39 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
> > On 2015-Sep-05, at 22:41, David Li <davidxl at google.com> wrote:
> >
> >
> > ================
> > Comment at: test/Analysis/BlockFrequencyInfo/basic.ll:71
> > @@ -70,3 +70,3 @@
> > ; The 'case_c' branch is predicted more likely via branch weight metadata.
> > -; CHECK-NEXT: case_c: float = 0.8,
> > +; CHECK-NEXT: case_c: float = 0.8
> > case_c:
> > ----------------
> > where does the comma come from?
> 
> The comma was there to confirm that we get *exactly* 0.8.  Why has
> that changed?
> 
> With the fixed-point representation, we won't get exactly 0.8 here but 0.8000000001. Should I restrict the output precision when printing this value, or use a regex or the exact 0.8000000001 in the test case?

I don't like that we don't get exactly 0.8 here, and frankly I don't
quite understand it.  It's certainly possible to get exactly 0.8:
--
$ bc
scale=100
3435973836/4294967295
.8000000000000000000000000000000000000000000000000000000000000000000\
000000000000000000000000000000000
--

What's going wrong?

(If it really is unavoidable, I think restricting output precision makes
the most sense, so we get nice predictable output.  But I want to be sure
it's actually necessary, first.)
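
For reference, the integer scaling can be exact here too.  A standalone
sketch (mine, not code from the patch):
--
#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t D = UINT32_MAX; // 4294967295
  uint64_t N = 4 * D / 5;        // 3435973836; divides evenly, no truncation
  std::printf("%.10f\n", static_cast<double>(N) / D); // 0.8000000000
}
--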

I have a few more comments on the patch below.

> Index: test/Analysis/BlockFrequencyInfo/bad_input.ll
> ===================================================================
> --- test/Analysis/BlockFrequencyInfo/bad_input.ll
> +++ test/Analysis/BlockFrequencyInfo/bad_input.ll
> @@ -10,7 +10,7 @@
>    br label %for.body
>  
>  ; Check that we get 1,4 instead of 0,3.
> -; CHECK-NEXT: for.body: float = 4.0,
> +; CHECK-NEXT: for.body: float = 3.99
>  for.body:
>    %i = phi i32 [ 0, %entry ], [ %inc, %for.body ]
>    call void @g(i32 %i)
> @@ -43,7 +43,7 @@
>  
>  ; Check that the exit weight is half of entry, since half is lost in the
>  ; infinite loop above.
> -; CHECK-NEXT: for.end: float = 0.5,
> +; CHECK-NEXT: for.end: float = 0.49

All these testcases being slightly off is suspicious to me.  32 bits
of fixed point gives a relative error on the order of 2^-32, so a
result drifting from 4.0 to 3.99 is off by roughly 0.25%; that smells
like accumulated truncation, not a representation limit.

>  for.end:
>    ret void
>  }
> Index: test/Transforms/SampleProfile/propagate.ll
> ===================================================================
> --- test/Transforms/SampleProfile/propagate.ll
> +++ test/Transforms/SampleProfile/propagate.ll
> @@ -97,8 +97,8 @@
>    %div4 = sdiv i64 %10, 4, !dbg !22
>    %cmp5 = icmp sgt i64 %9, %div4, !dbg !22
>    br i1 %cmp5, label %if.then6, label %if.else7, !dbg !22
> -; CHECK: edge if.end -> if.then6 probability is 3 / 6342 = 0.0473037%
> -; CHECK: edge if.end -> if.else7 probability is 6339 / 6342 = 99.9527% [HOT edge]
> +; CHECK: edge if.end -> if.then6 probability is 0%

This is a fairly lossy change.  We've lost all the significant digits
of 0.0473037%, and 0% vs. 100% makes it look like the edge is never
taken.  I hope that's unnecessary.

> +; CHECK: edge if.end -> if.else7 probability is 100%
>  
>  if.then6:                                         ; preds = %if.end
>    %11 = load i32, i32* %y.addr, align 4, !dbg !24
> Index: lib/Support/BranchProbability.cpp
> ===================================================================
> --- lib/Support/BranchProbability.cpp
> +++ lib/Support/BranchProbability.cpp
> @@ -15,16 +15,42 @@
>  #include "llvm/Support/Debug.h"
>  #include "llvm/Support/Format.h"
>  #include "llvm/Support/raw_ostream.h"
> +#include <cassert>
>  
>  using namespace llvm;
>  
>  raw_ostream &BranchProbability::print(raw_ostream &OS) const {
> -  return OS << N << " / " << D << " = "

This was nice because it gave us something exact for debugging.

I suggest you maintain that by printing `N` in hexadecimal with padding
(so that things line up nicely):
--
0x00000000
0x023ad039
0xffffffff
--
This is similar to how BlockMass is printed.
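
Roughly (untested sketch):
--
raw_ostream &BranchProbability::print(raw_ostream &OS) const {
  // Print the exact fixed-point numerator alongside a rounded percentage.
  return OS << format("0x%08x / 0x%08x = %.2f%%", N, D,
                      ((double)N / D) * 100.0);
}
--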

> -            << format("%g%%", ((double)N / D) * 100.0);
> +  return OS << format("%d%%",
> +                      static_cast<int>(std::round(((double)N / D) * 100.0)));

Why change the way these are rounded and printed?

>  }
>  
>  void BranchProbability::dump() const { print(dbgs()) << '\n'; }
>  
> +BranchProbability::BranchProbability(uint32_t n, uint32_t d) {
> +  assert(d > 0 && "Denominator cannot be 0!");
> +  assert(n <= d && "Probability cannot be bigger than 1!");
> +  if (d == D)
> +    N = n;
> +  else {
> +    uint64_t Prob64 = n * static_cast<uint64_t>(D) / d;
> +    N = static_cast<uint32_t>(Prob64);

It doesn't look like this is rounding correctly; it truncates toward
zero rather than rounding to nearest.  Can you add some unit tests?
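
Round-to-nearest would be something like (sketch only, untested):
--
// Add half the divisor before dividing so we round to nearest rather
// than truncate.  The intermediate still fits in 64 bits, since
// n * D <= UINT32_MAX^2 and d / 2 < 2^31.
uint64_t Prob64 = (n * static_cast<uint64_t>(D) + d / 2) / d;
N = static_cast<uint32_t>(Prob64);
--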

> +  }
> +}
> +
> +BranchProbability &BranchProbability::operator+=(BranchProbability RHS) {
> +  assert(N <= D - RHS.N &&
> +         "The sum of branch probabilities should not exceed one!");
> +  N += RHS.N;
> +  return *this;
> +}
> +
> +BranchProbability &BranchProbability::operator-=(BranchProbability RHS) {
> +  assert(N >= RHS.N &&
> +         "Can only subtract a smaller probability from a larger one!");
> +  N -= RHS.N;
> +  return *this;
> +}
> +
>  static uint64_t scale(uint64_t Num, uint32_t N, uint32_t D) {
>    assert(D && "divide by 0");
>  
> Index: include/llvm/Support/BranchProbability.h
> ===================================================================
> --- include/llvm/Support/BranchProbability.h
> +++ include/llvm/Support/BranchProbability.h
> @@ -15,36 +15,44 @@
>  #define LLVM_SUPPORT_BRANCHPROBABILITY_H
>  
>  #include "llvm/Support/DataTypes.h"
> -#include <cassert>
>  
>  namespace llvm {
>  
>  class raw_ostream;
>  
> -// This class represents Branch Probability as a non-negative fraction.
> +// This class represents Branch Probability as a non-negative fraction that is
> +// no greater than 1. It uses a fixed-point-like implementation, in which the
> +// denominator is always a constant value (here we use UINT32_MAX for maximum
> +// precision).
>  class BranchProbability {
>    // Numerator
>    uint32_t N;
>  
> -  // Denominator
> -  uint32_t D;
> +  // Denominator, which is a constant value.
> +  static const uint32_t D = UINT32_MAX;
> +
> +  // Construct a BranchProbability with only numerator assuming the denominator
> +  // is UINT32_MAX. For internal use only.
> +  explicit BranchProbability(uint32_t n) : N(n) {}

It seems nice to expose this functionality publicly without having to
pass `UINT32_MAX` in as the denominator.
--
static BranchProbability getRaw(uint32_t N)
--

Or maybe you can come up with a better name than `getRaw()`.
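
The implementation should be able to just forward to the private
constructor, I'd think:
--
static BranchProbability getRaw(uint32_t N) { return BranchProbability(N); }
--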

>  
>  public:
> -  BranchProbability(uint32_t n, uint32_t d) : N(n), D(d) {
> -    assert(d > 0 && "Denominator cannot be 0!");
> -    assert(n <= d && "Probability cannot be bigger than 1!");
> -  }
> +  BranchProbability() : N(0) {}
> +  BranchProbability(uint32_t n, uint32_t d);

Can you fix the style here in a separate NFC commit?  The parameters
should really be 'N' and 'D'.

> +
> +  bool isZero() const { return N == 0; }
>  
> -  static BranchProbability getZero() { return BranchProbability(0, 1); }
> -  static BranchProbability getOne() { return BranchProbability(1, 1); }
> +  static BranchProbability getZero() { return BranchProbability(0); }
> +  static BranchProbability getOne() { return BranchProbability(D); }
> +
> +  // Normalize given probabilities so that the sum of them becomes one.
> +  template <class ProbabilityList>
> +  static void normalizeProbabilities(ProbabilityList &Probs);
>  
>    uint32_t getNumerator() const { return N; }
> -  uint32_t getDenominator() const { return D; }
> +  static uint32_t getDenominator() { return D; }
>  
>    // Return (1 - Probability).
> -  BranchProbability getCompl() const {
> -    return BranchProbability(D - N, D);
> -  }
> +  BranchProbability getCompl() const { return BranchProbability(D - N); }
>  
>    raw_ostream &print(raw_ostream &OS) const;
>  
> @@ -66,24 +74,67 @@
>    /// \return \c Num divided by \c this.
>    uint64_t scaleByInverse(uint64_t Num) const;
>  
> -  bool operator==(BranchProbability RHS) const {
> -    return (uint64_t)N * RHS.D == (uint64_t)D * RHS.N;
> +  BranchProbability &operator+=(BranchProbability RHS);
> +  BranchProbability &operator-=(BranchProbability RHS);
> +  BranchProbability &operator*=(BranchProbability RHS) {
> +    N = static_cast<uint64_t>(N) * RHS.N / D;
> +    return *this;
>    }
> -  bool operator!=(BranchProbability RHS) const {
> -    return !(*this == RHS);
> +
> +  BranchProbability operator+(BranchProbability RHS) const {
> +    BranchProbability Prob(*this);
> +    return Prob += RHS;
>    }
> -  bool operator<(BranchProbability RHS) const {
> -    return (uint64_t)N * RHS.D < (uint64_t)D * RHS.N;
> +
> +  BranchProbability operator-(BranchProbability RHS) const {
> +    BranchProbability Prob(*this);
> +    return Prob -= RHS;
>    }
> +
> +  BranchProbability operator*(BranchProbability RHS) const {
> +    BranchProbability Prob(*this);
> +    return Prob *= RHS;
> +  }
> +
> +  bool operator==(BranchProbability RHS) const { return N == RHS.N; }
> +  bool operator!=(BranchProbability RHS) const { return !(*this == RHS); }
> +  bool operator<(BranchProbability RHS) const { return N < RHS.N; }
>    bool operator>(BranchProbability RHS) const { return RHS < *this; }
>    bool operator<=(BranchProbability RHS) const { return !(RHS < *this); }
>    bool operator>=(BranchProbability RHS) const { return !(*this < RHS); }
> +
> +  friend BranchProbability operator*(BranchProbability LHS, uint32_t RHS);
> +  friend BranchProbability operator/(BranchProbability LHS, uint32_t RHS);

Why do these need to be friends?  Can't they reach in and access the
numerator via `getNumerator()`?

>  };
>  
> -inline raw_ostream &operator<<(raw_ostream &OS, const BranchProbability &Prob) {
> +inline raw_ostream &operator<<(raw_ostream &OS, BranchProbability Prob) {

Why change this?  If you really prefer passing by-value, please do it in
a separate NFC commit so it doesn't add noise to this patch.

>    return Prob.print(OS);
>  }
>  
> +inline BranchProbability operator*(BranchProbability LHS, uint32_t RHS) {
> +  LHS.N *= RHS;

Why do we need these?  They seem error-prone.  It's also new API we
didn't need before, so I'm a little skeptical...

What's the story for overflow?  I imagine we'd want saturation
behaviour or, more robustly, an assertion.
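
If they stay, something like this (rough sketch, reusing the getRaw()
suggested above), which would also avoid the friendship:
--
inline BranchProbability operator*(BranchProbability LHS, uint32_t RHS) {
  // Widen to 64 bits so the overflow check itself can't overflow.
  uint64_t Wide = static_cast<uint64_t>(LHS.getNumerator()) * RHS;
  assert(Wide <= BranchProbability::getDenominator() &&
         "Scaled probability exceeds one; saturate instead?");
  return BranchProbability::getRaw(static_cast<uint32_t>(Wide));
}
--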

> +  return LHS;
> +}
> +
> +inline BranchProbability operator/(BranchProbability LHS, uint32_t RHS) {
> +  LHS.N /= RHS;

Assertion about 0?  Rounding?  (Do we even need this?)

> +  return LHS;
> +}
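
If we do keep the division, at a minimum (sketch):
--
inline BranchProbability operator/(BranchProbability LHS, uint32_t RHS) {
  assert(RHS != 0 && "Dividing a probability by zero!");
  // Note this still truncates; (N + RHS / 2) / RHS would round to nearest.
  return BranchProbability::getRaw(LHS.getNumerator() / RHS);
}
--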
> +
> +inline BranchProbability operator*(uint32_t LHS, BranchProbability RHS) {
> +  return RHS * LHS;
> +}
> +
> +template <class ProbabilityList>
> +void BranchProbability::normalizeProbabilities(ProbabilityList &Probs) {
> +  uint64_t Sum = 0;
> +  for (auto Prob : Probs)
> +    Sum += Prob.N;
> +  if (Sum > 0)

Early return instead of branching?

Actually, this looks like bad input if `Sum` is 0.  Should this just
assert?

> +    for (auto &Prob : Probs)
> +      Prob.N = Prob.N * uint64_t(D) / Sum;

I don't think the normalized numerators are guaranteed to sum to
`UINT32_MAX`, since there can be rounding errors.  You need to dither
somehow.  Can you add some unit tests and fix the bugs?
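
Roughly what I have in mind (untested sketch; it also covers the
assert-on-zero question above):
--
template <class ProbabilityList>
void BranchProbability::normalizeProbabilities(ProbabilityList &Probs) {
  uint64_t Sum = 0;
  for (auto Prob : Probs)
    Sum += Prob.N;
  assert(Sum > 0 && "Can't normalize probabilities that sum to zero!");

  uint32_t Assigned = 0;
  BranchProbability *Last = nullptr;
  for (auto &Prob : Probs) {
    Prob.N = Prob.N * uint64_t(D) / Sum;
    Assigned += Prob.N;
    Last = &Prob;
  }
  // Truncation leaves the total short of D; dump the slack into the last
  // element so the probabilities sum to exactly one.  Largest-remainder
  // apportionment would spread the error more evenly.
  Last->N += D - Assigned;
}
--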

> +}
> +
>  }
>  
>  #endif
> 


