<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Sep 11, 2015 at 5:34 PM, Duncan P. N. Exon Smith <span dir="ltr"><<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class=""><br>

> On 2015-Sep-10, at 14:15, Xinliang David Li <<a href="mailto:davidxl@google.com">davidxl@google.com</a>> wrote:<br>

><br>

> Cong,  It is more common to use power of 2 as the scaling factor in<br>

> fixed point representation for computation efficiency.  For 32bit<br>

> width, 0x80000000 is common (noted as Q31).  Here are the benefits<br>

><br>

> 1) it is much faster to do the scale operation with Q31 -- instead of<br>

> using the current slow version of scale, it can call a faster version<br>

> to be introduced.<br>

><br>

> This will probably help resolve bug<br>

> <a href="https://llvm.org/bugs/show_bug.cgi?id=24620" rel="noreferrer" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=24620</a> where 20% of the time is<br>

> spent in the slow version of the scale method.<br>

<br>

</span>Is there a way to speed up scaling with UINT32_MAX?  I suspect so, but<br>

haven't worked it out.  (E.g., you could scale to UINT64_C(1) << 32 and<br>

ignore the first dither; if there's no dithering, subtract 1 from the<br>

last.)<br></blockquote><div><br></div><div>I found on X86 a division by  UINT32_MAX is translated into by both LLVM/GCC: </div><div><br></div><div><div>movabs $0x8000000080000001,%rdx</div><div>mul    %rdx</div><div>shr    $0x1f,%rdx</div></div><div><br></div><div>I think this is to keep the most precision. This is much faster than division but still has one more 64-bit move and one multiply instruction. The method proposed by you is also ok in terms that we don't need that much precision in branch probabilities, but this also means whether choosing UINT32_MAX or 1<<31 is a minor issue (consider how we use branch probabilities: if we only use it as a ratio to scale values for most precision, a rational class may be needed in LLVM).</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<span class=""><br>

> 2) it allows negative fractional number to be represented<br>

<br>

</span>Not relevant for probabilities, though.<br>

<span class=""><br>

> 3) it is more precise for lots of common cases. For instance,<br>

> representation of 0.5, 0.25, 0.75 (i.e. 1/2, 1/4, 3/4) etc are<br>

> actually lossless with Q31 while the current choice of using<br>

> uint32_max can not represent them precisely.<br>

<br>

</span>It's less precise for lots of common cases, too.  2^32's only prime<br>

factor is 2, but 2^32-1 has 3, 5, 17, etc.<br>

<br>

This argument seems weak either way.<br></blockquote><div><br></div><div>A benefit to precisely represent 50% is that this is a common case in non-PGO mode for many branches, and hence this will have least impacts to results of those cases. This is not a strong argument though.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<span class=""><br>

><br>

> (if needed in the future, we can also extend this to be a template<br>

> class where number of bits in the scaling factor is a template<br>

> parameter to support values > 1<br>

>   template <int bits> class FixedPoint {<br>

>     static const int ScalingFactor = 1 << bits;<br>

> };<br>

> )<br>

<br>

</span>Sure, applies to either.<br>

<span class=""><br>

><br>

> In the IR dump, I find it is easier to read<br>

> 1) percentage,<br>

> 2) M/N form when N is less than 10.<br>

<br>

</span>I think we should always print (and check) for a percentage, but if you're<br>

debugging something, you need to see exactly what's in memory, and the<br>

hex is precise and easy to reason about.<br>

<span class=""><br>

><br>

> Current dump of 0x10ffff/0xffffffff form is hard to read -- can then<br>

> be simplified to make the denominator < 10?<br>

><br>

><br>

> Duncan, what is your opinion?<br>

<br>

</span>If I'm going to be tracking down some bug by looking at DEBUG prints, I<br>

suspect I'll find 0xffffffff easier to reason about than 0x80000000.  So<br>

if we can keep UINT32_MAX, I'd prefer that.<br>

<div class=""><div class="h5"><br>

><br>

> thanks,<br>

><br>

> David<br>

><br>

><br>

><br>

><br>

><br>

> On Thu, Sep 10, 2015 at 12:21 PM, Cong Hou <<a href="mailto:congh@google.com">congh@google.com</a>> wrote:<br>

>> congh updated this revision to Diff 34474.<br>

>> congh added a comment.<br>

>><br>

>> Update the patch by fixing block frequency / branch probability unit tests failures.<br>

>><br>

>> I am not sure if this is the right way to update the unit tests. The fixed-point representation has precision issues comparing to the previous rational representation and that is why we need to update the comparison right answers in unit tests.<br>

>><br>

>><br>

>> <a href="http://reviews.llvm.org/D12603" rel="noreferrer" target="_blank">http://reviews.llvm.org/D12603</a><br>

>><br>

>> Files:<br>

>>  include/llvm/Analysis/BlockFrequencyInfoImpl.h<br>

>>  include/llvm/Support/BranchProbability.h<br>

>>  lib/Support/BranchProbability.cpp<br>

>>  lib/Target/ARM/ARMBaseInstrInfo.cpp<br>

>>  test/Analysis/BlockFrequencyInfo/basic.ll<br>

>>  test/Analysis/BlockFrequencyInfo/loops_with_profile_info.ll<br>

>>  test/Analysis/BranchProbabilityInfo/basic.ll<br>

>>  test/Analysis/BranchProbabilityInfo/loop.ll<br>

>>  test/Analysis/BranchProbabilityInfo/noreturn.ll<br>

>>  test/Analysis/BranchProbabilityInfo/pr18705.ll<br>

>>  test/Analysis/BranchProbabilityInfo/pr22718.ll<br>

>>  test/CodeGen/AArch64/fast-isel-branch-cond-split.ll<br>

>>  test/CodeGen/ARM/2013-10-11-select-stalls.ll<br>

>>  test/CodeGen/ARM/ifcvt4.ll<br>

>>  test/CodeGen/ARM/sjlj-prepare-critical-edge.ll<br>

>>  test/CodeGen/ARM/tail-merge-branch-weight.ll<br>

>>  test/CodeGen/ARM/test-sharedidx.ll<br>

>>  test/CodeGen/Thumb2/thumb2-ifcvt1.ll<br>

>>  test/Transforms/SampleProfile/branch.ll<br>

>>  test/Transforms/SampleProfile/calls.ll<br>

>>  test/Transforms/SampleProfile/discriminator.ll<br>

>>  test/Transforms/SampleProfile/fnptr.ll<br>

>>  test/Transforms/SampleProfile/propagate.ll<br>

>>  unittests/Support/BlockFrequencyTest.cpp<br>

>>  unittests/Support/BranchProbabilityTest.cpp<br>

>><br>

<br>

</div></div></blockquote></div><br></div></div>