<div dir="ltr">Special casing infinite loops sounds reasonable for now. Maybe just stick to the current 4096 scale limit for them. Infinite loops do exist (e.g. in server code), but I don't yet have data to show how important it is to handle them more precisely.<div><br></div><div>thanks,</div><div><br></div><div>David</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Mar 29, 2015 at 11:02 AM, Duncan P. N. Exon Smith <span dir="ltr"><<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><br>

> On 2015 Mar 27, at 11:52, Diego Novillo <<a href="mailto:dnovillo@google.com">dnovillo@google.com</a>> wrote:<br>

><br>

> I've been trying to get rid of the loop scale limiting problem during<br>

> BFI. Initially, this was saturating frequencies to the max side of the<br>

> scale, so a double nested loop would get max frequencies in all the<br>

> blocks (e.g., llvm/test/CodeGen/X86/lsr-i386.ll). This made the inner<br>

> loop no hotter than the outer loop, so block placement would not<br>

> bother aligning them.<br>

><br>

> In convertFloatingToInteger() we are scaling the BFI frequencies so<br>

> they fit an integer. The function tries to choose a scaling factor and<br>

> warns about being careful so RA doesn't get confused. It chooses a<br>

> scaling factor of 1 / Min, which almost always turns up to be 1. This<br>

> was causing me grief in the double nested loop case because the inner<br>

> loop had a freq of about 6e20 while the outer blocks had a frequency<br>

> of 2e19. With a scaling factor of 1, we were saturating everything to<br>

> UINT64_MAX.<br>

><br>

> I changed it so it uses a scaling factor that puts the frequencies in<br>

> [1, UINT32_MAX], but only if the Max frequency is outside that range.<br>

> This is causing two failures in the testsuite, which seem to be caused<br>

> by RA spilling differently. I believe that in CodeGen/X86/lsr-i386.ll<br>

> we are hoisting into the wrong loop now, but I'm not sure.<br>

><br>

> The other failure is in CodeGen/Thumb2/v8_IT_5.ll is different block<br>

> placement. Which is a bit odd. The frequencies given by my changes are<br>

> certainly different, but the body of the loop is given a<br>

> disproportionately larger frequency than the others (much like in the<br>

> original case).  Though, I think what's going on here is that my<br>

> changes are causing the smaller frequencies to be saturated down to 1:<br>

><br>

> Original:<br>

> float-to-int: min = 0.0000004768367035, max = 2047.994141, factor = 16777232.0<br>

><br>

> Printing analysis 'Block Frequency Analysis' for function 't':<br>

>  block-frequency-info: t<br>

>   - entry: float = 1.0, int = 16777232<br>

>   - if.then: float = 0.0000009536743164, int = 16<br>

>   - if.else: float = 0.9999990463, int = 16777216<br>

>   - if.then15: float = 0.0000009536734069, int = 16<br>

>   - if.else18: float = 0.9999980927, int = 16777200<br>

>   - if.then102: float = 0.0000009536734069, int = 16<br>

>   - cond.true10.i: float = 0.0000004768367035, int = 8<br>

>   - t.exit: float = 0.0000009536734069, int = 16<br>

>   - if.then115: float = 0.4999985695, int = 8388592<br>

>   - if.else145: float = 0.2499992847, int = 4194296<br>

>   - if.else163: float = 0.2499992847, int = 4194296<br>

>   - while.body172: float = 2047.994141, int = 34359672832<br>

><br>

><br>

>                                                           -<br>

> if.else173: float = 0.4999985695, int = 8388592<br>

><br>

><br>

><br>

> My patch:<br>

> float-to-int: min = 0.0000004768367035, max = 9223345648592486401.0,<br>

> factor = 0.0000000004656626195<br>

><br>

> block-frequency-info: t<br>

> - entry: float = 1.0, int = 1<br>

> - if.then: float = 0.0000009536743164, int = 1<br>

> - if.else: float = 0.9999990463, int = 1<br>

> - if.then15: float = 0.0000009536734069, int = 1<br>

> - if.else18: float = 0.9999980927, int = 1<br>

> - if.then102: float = 0.0000009536734069, int = 1<br>

> - cond.true10.i: float = 0.0000004768367035, int = 1<br>

> - t.exit: float = 0.0000009536734069, int = 1<br>

> - if.then115: float = 0.4999985695, int = 1<br>

> - if.else145: float = 0.2499992847, int = 1<br>

> - if.else163: float = 0.2499992847, int = 1<br>

> - while.body172: float = 9223345648592486401.0, int = 4294967295<br>

> - if.else173: float = 0.4999985695, int = 1<br>

><br>

><br>

><br>

> The scaling factor is so minuscule that I end up squashing every "low"<br>

> frequency to 1. I think I need to smooth this better. In the meantime,<br>

> I wanted to pick your brain. Maybe I'm completely off-base in my<br>

> analysis.<br>

><br>

><br>

> Thanks.  Diego.<br>

</div></div>> <0001-Remove-4-096-loop-scale-limitation.patch><br>

<br>

Your change to `convertFloatingToInteger()` looks basically correct to<br>

me.  If we need to saturate at one end or the other, it makes more sense<br>

to be saturating at the bottom.<br>

<br>

I think you can pick a higher number than UINT32_MAX.  I'd start with<br>

something like 2^60 and lower it (or fix backend optimizations to<br>

handle bigger numbers) from there if you hit problems.<br>

<br>

I'm also not sure the logic leftover for shifting the scale up makes<br>

sense after your change.  Maybe something like this:<br>

<br>

    constexpr unsigned MaxBits = 60;<br>

    unsigned SpreadBits = (Max / Min).lg();<br>

    Scaled64 ScalingFactor;<br>

    if (SpreadBits <= MaxBits - 3)<br>

      ScalingFactor = Min.inverse() << 3; // Make the minimum 8.<br>

    else if (SpreadBits <= MaxBits)<br>

      ScalingFactor = Min.inverse(); // Make the minimum 1.<br>

    else<br>

      ScalingFactor = Scaled64(1, MaxBits) / Max; // Saturate at 1.<br>

<br>

Infinite loops are a corner case, so I think we should just handle them<br>

specially.  One way to do this is to change `computeLoopScale()` to<br>

give an arbitrary finite scale to infinite loops:<br>

<br>

    // Choose an arbitrary, finite loop scale for infinite loops.<br>

    constexpr LoopScale Infinite(64, 0);<br>

<br>

    // LoopScale == 1 / ExitMass<br>

    // ExitMass == HeadMass - BackedgeMass<br>

    BlockMass ExitMass = BlockMass::getFull() - Loop.BackedgeMass;<br>

<br>

    // Block scale stores the inverse of the scale.<br>

    Loop.Scale = ExitMass.isEmpty() ? Infinite<br>

                                    : ExitMass.toScaled().inverse();<br>

<br>

How much does the exact loop scale matter anyway for infinite loops?</blockquote></div><br></div>