[LLVMdev] fix for loop scale limiting in BFI

Duncan P. N. Exon Smith dexonsmith at apple.com
Sun Mar 29 11:02:12 PDT 2015


> On 2015 Mar 27, at 11:52, Diego Novillo <dnovillo at google.com> wrote:
> 
> I've been trying to get rid of the loop scale limiting problem during
> BFI. Initially, this was saturating frequencies to the max side of the
> scale, so a double nested loop would get max frequencies in all the
> blocks (e.g., llvm/test/CodeGen/X86/lsr-i386.ll). This made the inner
> loop no hotter than the outer loop, so block placement would not
> bother aligning them.
> 
> In convertFloatingToInteger() we are scaling the BFI frequencies so
> they fit an integer. The function tries to choose a scaling factor and
> warns about being careful so RA doesn't get confused. It chooses a
> scaling factor of 1 / Min, which almost always turns out to be 1. This
> was causing me grief in the double nested loop case because the inner
> loop had a frequency of about 6e20 while the outer blocks had a
> frequency of 2e19. With a scaling factor of 1, we were saturating
> everything to UINT64_MAX (which is only about 1.8e19).
> 
> I changed it so it uses a scaling factor that puts the frequencies in
> [1, UINT32_MAX], but only if the Max frequency is outside that range.
> This is causing two failures in the testsuite, which seem to be caused
> by RA spilling differently. I believe that in CodeGen/X86/lsr-i386.ll
> we are hoisting into the wrong loop now, but I'm not sure.
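> 
> To be concrete about the change itself, the new factor selection is
> roughly the following sketch (in ScaledNumber terms; the names here
> are illustrative rather than the exact patch):
> 
>     // Keep the identity factor unless the maximum frequency does not
>     // already fit in 32 bits.
>     Scaled64 ScalingFactor(1, 0);
>     if (Max > Scaled64(UINT32_MAX, 0))
>       // Map the largest frequency to UINT32_MAX; anything that scales
>       // below 1 gets rounded up to 1 during conversion.
>       ScalingFactor = Scaled64(UINT32_MAX, 0) / Max;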
> 
> The other failure, in CodeGen/Thumb2/v8_IT_5.ll, is different block
> placement, which is a bit odd. The frequencies given by my changes are
> certainly different, but the body of the loop is given a
> disproportionately larger frequency than the others (much like in the
> original case). I think, though, that what's going on here is that my
> changes are causing the smaller frequencies to be saturated down to 1:
> 
> Original:
> float-to-int: min = 0.0000004768367035, max = 2047.994141, factor = 16777232.0
> 
> Printing analysis 'Block Frequency Analysis' for function 't':
>  block-frequency-info: t
>   - entry: float = 1.0, int = 16777232
>   - if.then: float = 0.0000009536743164, int = 16
>   - if.else: float = 0.9999990463, int = 16777216
>   - if.then15: float = 0.0000009536734069, int = 16
>   - if.else18: float = 0.9999980927, int = 16777200
>   - if.then102: float = 0.0000009536734069, int = 16
>   - cond.true10.i: float = 0.0000004768367035, int = 8
>   - t.exit: float = 0.0000009536734069, int = 16
>   - if.then115: float = 0.4999985695, int = 8388592
>   - if.else145: float = 0.2499992847, int = 4194296
>   - if.else163: float = 0.2499992847, int = 4194296
>   - while.body172: float = 2047.994141, int = 34359672832
>   - if.else173: float = 0.4999985695, int = 8388592
> 
> My patch:
> float-to-int: min = 0.0000004768367035, max = 9223345648592486401.0,
> factor = 0.0000000004656626195
> 
> block-frequency-info: t
> - entry: float = 1.0, int = 1
> - if.then: float = 0.0000009536743164, int = 1
> - if.else: float = 0.9999990463, int = 1
> - if.then15: float = 0.0000009536734069, int = 1
> - if.else18: float = 0.9999980927, int = 1
> - if.then102: float = 0.0000009536734069, int = 1
> - cond.true10.i: float = 0.0000004768367035, int = 1
> - t.exit: float = 0.0000009536734069, int = 1
> - if.then115: float = 0.4999985695, int = 1
> - if.else145: float = 0.2499992847, int = 1
> - if.else163: float = 0.2499992847, int = 1
> - while.body172: float = 9223345648592486401.0, int = 4294967295
> - if.else173: float = 0.4999985695, int = 1
> 
> 
> 
> The scaling factor is so minuscule that I end up squashing every "low"
> frequency to 1. I think I need to smooth this better. In the meantime,
> I wanted to pick your brain. Maybe I'm completely off-base in my
> analysis.
> 
> 
> Thanks.  Diego.
> <0001-Remove-4-096-loop-scale-limitation.patch>

Your change to `convertFloatingToInteger()` looks basically correct to
me.  If we need to saturate at one end or the other, it makes more sense
to be saturating at the bottom.

I think you can pick a higher number than UINT32_MAX.  I'd start with
something like 2^60 and lower it (or fix backend optimizations to
handle bigger numbers) from there if you hit problems.

I'm also not sure the leftover logic for shifting the scale up makes
sense after your change.  Maybe something like this:

    constexpr unsigned MaxBits = 60;
    unsigned SpreadBits = (Max / Min).lg();
    Scaled64 ScalingFactor;
    if (SpreadBits <= MaxBits - 3)
      ScalingFactor = Min.inverse() << 3; // Make the minimum 8.
    else if (SpreadBits <= MaxBits)
      ScalingFactor = Min.inverse(); // Make the minimum 1.
    else
      ScalingFactor = Scaled64(1, MaxBits) / Max; // Saturate at 1.
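
That way the minimum keeps a few bits of headroom in the common case,
and the small frequencies only collapse to 1 when the Max/Min spread
itself exceeds 2^60.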

Infinite loops are a corner case, so I think we should just handle them
specially.  One way to do this is to change `computeLoopScale()` to
give an arbitrary finite scale to infinite loops:

    // Choose an arbitrary, finite loop scale for infinite loops.
    const Scaled64 Infinite(64, 0);

    // LoopScale == 1 / ExitMass
    // ExitMass == HeadMass - BackedgeMass
    BlockMass ExitMass = BlockMass::getFull() - Loop.BackedgeMass;

    // Block scale stores the inverse of the scale.
    Loop.Scale = ExitMass.isEmpty() ? Infinite
                                    : ExitMass.toScaled().inverse();
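
Loops with a nonzero exit mass are unaffected by this; only a loop
whose backedge mass equals the full head mass (i.e., one with no exit
edges at all) falls back to the arbitrary constant.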

How much does the exact loop scale matter anyway for infinite loops?


