[LLVMdev] fix for loop scale limiting in BFI

Sun Mar 29 12:22:44 PDT 2015

Special casing infinite loops sounds reasonable for now. Maybe just stick
to the current 4096 scale limit for them. Infinite loops do exist (e.g. in
server code), but I don't yet have data to show how important it is to
handle them more precisely.

thanks,

David

On Sun, Mar 29, 2015 at 11:02 AM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On 2015 Mar 27, at 11:52, Diego Novillo <dnovillo at google.com> wrote:
> >
> > I've been trying to get rid of the loop scale limiting problem during
> > BFI. Initially, this was saturating frequencies to the max side of the
> > scale, so a double nested loop would get max frequencies in all the
> > blocks (e.g., llvm/test/CodeGen/X86/lsr-i386.ll). This made the inner
> > loop no hotter than the outer loop, so block placement would not
> > bother aligning them.
> >
> > In convertFloatingToInteger() we are scaling the BFI frequencies so
> > they fit an integer. The function tries to choose a scaling factor and
> > warns about being careful so RA doesn't get confused. It chooses a
> > scaling factor of 1 / Min, which almost always turns up to be 1. This
> > was causing me grief in the double nested loop case because the inner
> > loop had a freq of about 6e20 while the outer blocks had a frequency
> > of 2e19. With a scaling factor of 1, we were saturating everything to
> > UINT64_MAX.
> >
> > I changed it so it uses a scaling factor that puts the frequencies in
> > [1, UINT32_MAX], but only if the Max frequency is outside that range.
> > This is causing two failures in the testsuite, which seem to be caused
> > by RA spilling differently. I believe that in CodeGen/X86/lsr-i386.ll
> > we are hoisting into the wrong loop now, but I'm not sure.
> >
> > The other failure is in CodeGen/Thumb2/v8_IT_5.ll is different block
> > placement. Which is a bit odd. The frequencies given by my changes are
> > certainly different, but the body of the loop is given a
> > disproportionately larger frequency than the others (much like in the
> > original case).  Though, I think what's going on here is that my
> > changes are causing the smaller frequencies to be saturated down to 1:
> >
> > Original:
> > float-to-int: min = 0.0000004768367035, max = 2047.994141, factor =
> 16777232.0
> >
> > Printing analysis 'Block Frequency Analysis' for function 't':
> >  block-frequency-info: t
> >   - entry: float = 1.0, int = 16777232
> >   - if.then: float = 0.0000009536743164, int = 16
> >   - if.else: float = 0.9999990463, int = 16777216
> >   - if.then15: float = 0.0000009536734069, int = 16
> >   - if.else18: float = 0.9999980927, int = 16777200
> >   - if.then102: float = 0.0000009536734069, int = 16
> >   - cond.true10.i: float = 0.0000004768367035, int = 8
> >   - t.exit: float = 0.0000009536734069, int = 16
> >   - if.then115: float = 0.4999985695, int = 8388592
> >   - if.else145: float = 0.2499992847, int = 4194296
> >   - if.else163: float = 0.2499992847, int = 4194296
> >   - while.body172: float = 2047.994141, int = 34359672832
> >
> >
> >                                                           -
> > if.else173: float = 0.4999985695, int = 8388592
> >
> >
> >
> > My patch:
> > float-to-int: min = 0.0000004768367035, max = 9223345648592486401.0,
> > factor = 0.0000000004656626195
> >
> > block-frequency-info: t
> > - entry: float = 1.0, int = 1
> > - if.then: float = 0.0000009536743164, int = 1
> > - if.else: float = 0.9999990463, int = 1
> > - if.then15: float = 0.0000009536734069, int = 1
> > - if.else18: float = 0.9999980927, int = 1
> > - if.then102: float = 0.0000009536734069, int = 1
> > - cond.true10.i: float = 0.0000004768367035, int = 1
> > - t.exit: float = 0.0000009536734069, int = 1
> > - if.then115: float = 0.4999985695, int = 1
> > - if.else145: float = 0.2499992847, int = 1
> > - if.else163: float = 0.2499992847, int = 1
> > - while.body172: float = 9223345648592486401.0, int = 4294967295
> > - if.else173: float = 0.4999985695, int = 1
> >
> >
> >
> > The scaling factor is so minuscule that I end up squashing every "low"
> > frequency to 1. I think I need to smooth this better. In the meantime,
> > I wanted to pick your brain. Maybe I'm completely off-base in my
> > analysis.
> >
> >
> > Thanks.  Diego.
> > <0001-Remove-4-096-loop-scale-limitation.patch>
>
> Your change to `convertFloatingToInteger()` looks basically correct to
> me.  If we need to saturate at one end or the other, it makes more sense
> to be saturating at the bottom.
>
> I think you can pick a higher number than UINT32_MAX.  I'd start with
> something like 2^60 and lower it (or fix backend optimizations to
> handle bigger numbers) from there if you hit problems.
>
> I'm also not sure the logic leftover for shifting the scale up makes
> sense after your change.  Maybe something like this:
>
>     constexpr unsigned MaxBits = 60;
>     unsigned SpreadBits = (Max / Min).lg();
>     Scaled64 ScalingFactor;
>     if (SpreadBits <= MaxBits - 3)
>       ScalingFactor = Min.inverse() << 3; // Make the minimum 8.
>     else if (SpreadBits <= MaxBits)
>       ScalingFactor = Min.inverse(); // Make the minimum 1.
>     else
>       ScalingFactor = Scaled64(1, MaxBits) / Max; // Saturate at 1.
>
> Infinite loops are a corner case, so I think we should just handle them
> specially.  One way to do this is to change `computeLoopScale()` to
> give an arbitrary finite scale to infinite loops:
>
>     // Choose an arbitrary, finite loop scale for infinite loops.
>     constexpr LoopScale Infinite(64, 0);
>
>     // LoopScale == 1 / ExitMass
>     // ExitMass == HeadMass - BackedgeMass
>     BlockMass ExitMass = BlockMass::getFull() - Loop.BackedgeMass;
>
>     // Block scale stores the inverse of the scale.
>     Loop.Scale = ExitMass.isEmpty() ? Infinite
>                                     : ExitMass.toScaled().inverse();
>
> How much does the exact loop scale matter anyway for infinite loops?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150329/a86e657d/attachment.html>