[PATCH] D12603: Use fixed-point representation for BranchProbability

Cong Hou via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 21 11:07:28 PDT 2015


On Fri, Sep 11, 2015 at 5:34 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On 2015-Sep-10, at 14:15, Xinliang David Li <davidxl at google.com> wrote:
> >
> > Cong,  It is more common to use power of 2 as the scaling factor in
> > fixed point representation for computation efficiency.  For 32bit
> > width, 0x80000000 is common (noted as Q31).  Here are the benefits
> >
> > 1) it is much faster to do the scale operation with Q31 -- instead of
> > using the current slow version of scale, it can call a faster version
> > to be introduced.
> >
> > This will probably help resolve bug
> > https://llvm.org/bugs/show_bug.cgi?id=24620 where 20% of the time is
> > spent in the slow version of the scale method.
>
> Is there a way to speed up scaling with UINT32_MAX?  I suspect so, but
> haven't worked it out.  (E.g., you could scale to UINT64_C(1) << 32 and
> ignore the first dither; if there's no dithering, subtract 1 from the
> last.)
>

I found on X86 a division by  UINT32_MAX is translated into by both
LLVM/GCC:

movabs $0x8000000080000001,%rdx
mul    %rdx
shr    $0x1f,%rdx

I think this is to keep the most precision. This is much faster than
division but still has one more 64-bit move and one multiply instruction.
The method proposed by you is also ok in terms that we don't need that much
precision in branch probabilities, but this also means whether choosing
UINT32_MAX or 1<<31 is a minor issue (consider how we use branch
probabilities: if we only use it as a ratio to scale values for most
precision, a rational class may be needed in LLVM).


> > 2) it allows negative fractional number to be represented
>
> Not relevant for probabilities, though.
>
> > 3) it is more precise for lots of common cases. For instance,
> > representation of 0.5, 0.25, 0.75 (i.e. 1/2, 1/4, 3/4) etc are
> > actually lossless with Q31 while the current choice of using
> > uint32_max can not represent them precisely.
>
> It's less precise for lots of common cases, too.  2^32's only prime
> factor is 2, but 2^32-1 has 3, 5, 17, etc.
>
> This argument seems weak either way.
>

A benefit to precisely represent 50% is that this is a common case in
non-PGO mode for many branches, and hence this will have least impacts to
results of those cases. This is not a strong argument though.



>
> >
> > (if needed in the future, we can also extend this to be a template
> > class where number of bits in the scaling factor is a template
> > parameter to support values > 1
> >   template <int bits> class FixedPoint {
> >     static const int ScalingFactor = 1 << bits;
> > };
> > )
>
> Sure, applies to either.
>
> >
> > In the IR dump, I find it is easier to read
> > 1) percentage,
> > 2) M/N form when N is less than 10.
>
> I think we should always print (and check) for a percentage, but if you're
> debugging something, you need to see exactly what's in memory, and the
> hex is precise and easy to reason about.
>
> >
> > Current dump of 0x10ffff/0xffffffff form is hard to read -- can then
> > be simplified to make the denominator < 10?
> >
> >
> > Duncan, what is your opinion?
>
> If I'm going to be tracking down some bug by looking at DEBUG prints, I
> suspect I'll find 0xffffffff easier to reason about than 0x80000000.  So
> if we can keep UINT32_MAX, I'd prefer that.
>
> >
> > thanks,
> >
> > David
> >
> >
> >
> >
> >
> > On Thu, Sep 10, 2015 at 12:21 PM, Cong Hou <congh at google.com> wrote:
> >> congh updated this revision to Diff 34474.
> >> congh added a comment.
> >>
> >> Update the patch by fixing block frequency / branch probability unit
> tests failures.
> >>
> >> I am not sure if this is the right way to update the unit tests. The
> fixed-point representation has precision issues comparing to the previous
> rational representation and that is why we need to update the comparison
> right answers in unit tests.
> >>
> >>
> >> http://reviews.llvm.org/D12603
> >>
> >> Files:
> >>  include/llvm/Analysis/BlockFrequencyInfoImpl.h
> >>  include/llvm/Support/BranchProbability.h
> >>  lib/Support/BranchProbability.cpp
> >>  lib/Target/ARM/ARMBaseInstrInfo.cpp
> >>  test/Analysis/BlockFrequencyInfo/basic.ll
> >>  test/Analysis/BlockFrequencyInfo/loops_with_profile_info.ll
> >>  test/Analysis/BranchProbabilityInfo/basic.ll
> >>  test/Analysis/BranchProbabilityInfo/loop.ll
> >>  test/Analysis/BranchProbabilityInfo/noreturn.ll
> >>  test/Analysis/BranchProbabilityInfo/pr18705.ll
> >>  test/Analysis/BranchProbabilityInfo/pr22718.ll
> >>  test/CodeGen/AArch64/fast-isel-branch-cond-split.ll
> >>  test/CodeGen/ARM/2013-10-11-select-stalls.ll
> >>  test/CodeGen/ARM/ifcvt4.ll
> >>  test/CodeGen/ARM/sjlj-prepare-critical-edge.ll
> >>  test/CodeGen/ARM/tail-merge-branch-weight.ll
> >>  test/CodeGen/ARM/test-sharedidx.ll
> >>  test/CodeGen/Thumb2/thumb2-ifcvt1.ll
> >>  test/Transforms/SampleProfile/branch.ll
> >>  test/Transforms/SampleProfile/calls.ll
> >>  test/Transforms/SampleProfile/discriminator.ll
> >>  test/Transforms/SampleProfile/fnptr.ll
> >>  test/Transforms/SampleProfile/propagate.ll
> >>  unittests/Support/BlockFrequencyTest.cpp
> >>  unittests/Support/BranchProbabilityTest.cpp
> >>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150921/6d937642/attachment.html>


More information about the llvm-commits mailing list