[PATCH] Remove 4,096 loop scale limitation.
Duncan P. N. Exon Smith
dexonsmith at apple.com
Tue Mar 31 06:58:03 PDT 2015
> On 2015 Mar 31, at 05:07, Diego Novillo <dnovillo at google.com> wrote:
>
> Hi dexonsmith,
>
> This is part 1 of fixes to address the problems described in
> https://llvm.org/bugs/show_bug.cgi?id=22719.
>
> The restriction to limit loop scales to 4,096 does not really prevent
> overflows anymore, as the underlying algorithm has changed and does
> not seem to suffer from this problem.
>
> Additionally, artificially restricting loop scales to such a low number
> skews frequency information, making loops of equal hotness appear to
> have very different hotness properties.
>
> The only loops that are artificially restricted to a scale of 4096 are
> infinite loops (those loops with an exit mass of 0). This prevents
> infinite loops from skewing the frequencies of other regions in the CFG.
>
> At the end of propagation, frequencies are scaled to values that take no
> more than 32 bits to represent. This is to prevent issues with
> transformations that cannot deal with large numbers.
>
> Tested on x86_64.
>
This LGTM with some minor comments below.
> http://reviews.llvm.org/D8718
>
> Files:
> include/llvm/Analysis/BlockFrequencyInfoImpl.h
> lib/Analysis/BlockFrequencyInfoImpl.cpp
> test/Analysis/BlockFrequencyInfo/bad_input.ll
> test/Analysis/BlockFrequencyInfo/loops_with_profile_info.ll
> Index: include/llvm/Analysis/BlockFrequencyInfoImpl.h
> ===================================================================
> --- include/llvm/Analysis/BlockFrequencyInfoImpl.h
> +++ include/llvm/Analysis/BlockFrequencyInfoImpl.h
> @@ -718,9 +718,6 @@
> ///
> /// It has some known flaws.
> ///
> -/// - Loop scale is limited to 4096 per loop (2^12) to avoid exhausting
> -/// BlockFrequency's 64-bit integer precision.
> -///
Nice :).
> /// - The model of irreducible control flow is a rough approximation.
> ///
> /// Modelling irreducible control flow exactly involves setting up and
> Index: lib/Analysis/BlockFrequencyInfoImpl.cpp
> ===================================================================
> --- lib/Analysis/BlockFrequencyInfoImpl.cpp
> +++ lib/Analysis/BlockFrequencyInfoImpl.cpp
> @@ -331,32 +331,35 @@
> return true;
> }
>
> -/// \brief Get the maximum allowed loop scale.
> -///
> -/// Gives the maximum number of estimated iterations allowed for a loop. Very
> -/// large numbers cause problems downstream (even within 64-bits).
> -static Scaled64 getMaxLoopScale() { return Scaled64(1, 12); }
> -
> /// \brief Compute the loop scale for a loop.
> void BlockFrequencyInfoImplBase::computeLoopScale(LoopData &Loop) {
> // Compute loop scale.
> DEBUG(dbgs() << "compute-loop-scale: " << getLoopName(Loop) << "\n");
>
> + // Infinite loops need special handling. If we give the back edge an infinite
> + // mass, they may saturate all the other scales in the function down to 1,
> + // making all the other region temperatures look exactly the same. Choose an
> + // arbitrary scale to avoid these issues.
> + //
> + // TODO: An alternate way would be to select a symbolic scale which is later
> + // replaced to be the maximum of all computed scales plus 1. This would
> + // appropriately describe the loop as having a large scale, without skewing
> + // the final frequency computation.
> + const Scaled64 InifiniteLoopScale(1, 12);
> +
> // LoopScale == 1 / ExitMass
> // ExitMass == HeadMass - BackedgeMass
> BlockMass ExitMass = BlockMass::getFull() - Loop.BackedgeMass;
>
> - // Block scale stores the inverse of the scale.
> - Loop.Scale = ExitMass.toScaled().inverse();
> + // Block scale stores the inverse of the scale. If this is an
Line seems short. Reflow?
> + // infinite loop, its exit mass will be zero. In this case, use an
> + // arbitrary scale for the loop scale.
> + Loop.Scale =
> + ExitMass.isEmpty() ? InifiniteLoopScale : ExitMass.toScaled().inverse();
>
> DEBUG(dbgs() << " - exit-mass = " << ExitMass << " (" << BlockMass::getFull()
> << " - " << Loop.BackedgeMass << ")\n"
> << " - scale = " << Loop.Scale << "\n");
> -
> - if (Loop.Scale > getMaxLoopScale()) {
> - Loop.Scale = getMaxLoopScale();
> - DEBUG(dbgs() << " - reduced-to-max-scale: " << getMaxLoopScale() << "\n");
> - }
> }
>
> /// \brief Package up a loop.
> @@ -427,12 +430,19 @@
> // differentiated. However, the register allocator currently deals poorly
> // with large numbers. Instead, push Min up a little from 1 to give some
> // room to differentiate small, unequal numbers.
Scaling based on `Min` is probably fine long-term (when it doesn't cause
saturation at the top). Maybe this comment should be reworked?
BTW, I haven't looked recently, but I suspect the problem downstream is
that some optimizations add frequencies together and don't deal well
with `BlockFrequency` saturating at UINT64_MAX (or possibly don't use
`BlockFrequency` for math and don't expect overflows). Doesn't seem
like it would be too hard to expose some API similar to
`Distribution::normalize()` and fix them.
> - //
> - // TODO: fix issues downstream so that ScalingFactor can be
> - // Scaled64(1,64)/Max.
> + const unsigned MaxBits = 32;
Given that we have 64-bits available (and in the absence of data), I
would have chosen a large range (such as 60) and reduced it as necessary
(and/or fixed optimizations that couldn't handle it). Is there a reason
you chose 32?
> + const unsigned SpreadBits = (Max / Min).lg();
> Scaled64 ScalingFactor = Min.inverse();
`Min.inverse()` uses long division; it kind of feels wrong to compute it
just to ignore the result (even though this wouldn't show up in a
profile). Up to you though. Maybe frugality makes the code too ugly.
> - if ((Max / Min).lg() < 60)
> + if (SpreadBits <= MaxBits - 3) {
> + // If the values are small enough, make the scaling factor at least 8 to
> + // allow distinguishing small values.
> ScalingFactor <<= 3;
> + } else if (SpreadBits > MaxBits) {
> + // If the values need more than MaxBits to be represented, saturate small
> + // frequency values down to 1 by using a scaling factor that benefits large
> + // frequency values.
> + ScalingFactor = Scaled64(1, MaxBits) / Max;
> + }
>
> // Translate the floats to integers.
> DEBUG(dbgs() << "float-to-int: min = " << Min << ", max = " << Max
More information about the llvm-commits
mailing list