[llvm-dev] High memory use and LVI/Correlated Value Propagation
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Thu Jan 14 12:05:54 PST 2016
----- Original Message -----
> From: "Daniel Berlin" <dberlin at dberlin.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Joerg Sonnenberger" <joerg at britannica.bec.de>, "llvm-dev"
> <llvm-dev at lists.llvm.org>
> Sent: Thursday, January 14, 2016 1:05:56 PM
> Subject: Re: [llvm-dev] High memory use and LVI/Correlated Value
> Propagation
> On Thu, Jan 14, 2016 at 10:32 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> > ----- Original Message -----
> > > From: "Daniel Berlin via llvm-dev" <llvm-dev at lists.llvm.org>
> > > To: "Joerg Sonnenberger" <joerg at britannica.bec.de>, "llvm-dev"
> > > <llvm-dev at lists.llvm.org>
> > > Sent: Thursday, January 14, 2016 10:38:10 AM
> > > Subject: Re: [llvm-dev] High memory use and LVI/Correlated Value
> > > Propagation
> > >
> > > On Thu, Jan 14, 2016 at 5:57 AM, Joerg Sonnenberger via llvm-dev
> > > <llvm-dev at lists.llvm.org> wrote:
> > > > On Wed, Jan 13, 2016 at 04:28:03PM -0800, Daniel Berlin wrote:
> > > > > On Wed, Jan 13, 2016 at 4:23 PM, Joerg Sonnenberger via llvm-dev
> > > > > <llvm-dev at lists.llvm.org> wrote:
> > > > > > On Wed, Jan 13, 2016 at 03:38:24PM -0800, Philip Reames wrote:
> > > > > > > I don't think that arbitrarily limiting the complexity of the
> > > > > > > search is the right approach. There are numerous ways the LVI
> > > > > > > infrastructure could be made more memory efficient. Fixing the
> > > > > > > existing code to be memory efficient is the right approach.
> > > > > > > Only once there's no more low-hanging fruit should we even
> > > > > > > consider clamping the search.
> > > > > >
> > > > > > Memory efficiency is only half of the problem. E.g. groonga's
> > > > > > expr.c needs 4m to build on my laptop, a 2.7GHz i7. That doesn't
> > > > > > sound reasonable for -O2. Unlike the GVN issue, the cases I have
> > > > > > run into do finish after a(n unreasonable) while, so at least it
> > > > > > is not trivially superlinear.
> > > > >
> > > > > Okay, so rather than artificially limit stuff, we should see if
> > > > > we can fix the efficiency of the algorithms.
> > > > >
> > > > > CVP is an O(N * lattice height) problem. It sounds like it is
> > > > > being more than that for you.
> > > >
> > > > I assume you mean something like #BB * #variables?
> > >
> > > No.
> > > ;)
> > >
> > > I mean the theoretical complexity of performing the optimization CVP
> > > performs (regardless of how it is currently implemented).
> > >
> > > What CVP does is doable in linear time in the number of variables *
> > > lattice height. It has some memory cost to do this.
> > >
> > > Assume for a second the lattice height is fixed (IE we do not have
> > > completely symbolic ranges).
> > >
> > > It is possible, in linear time, to build an SSA-like form for the
> > > ranges it is computing. It is then possible, in linear time, to
> > > propagate those ranges to all affected variables. This is O(N) + O(N).
> > >
> > > Doing so may change the ranges (they only get *better*, however), so
> > > you may need to do this until they fall all the way down the lattice.
> > > This means you may repeat this lattice height number of times.
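(For the archives: the scheme being described is the standard worklist
fixpoint. A minimal sketch, assuming a height-3 constant lattice and
made-up names; this is a toy illustration, not our LVI code:)

  #include <cstdint>
  #include <queue>
  #include <vector>

  enum class State { Underdefined, Constant, Overdefined };

  struct Value {
    State S = State::Underdefined;
    int64_t C = 0;            // meaningful only when S == Constant
    std::vector<int> Users;   // values whose state depends on this one
  };

  // Meet Src into Dest. Returns true iff Dest moved *down* the lattice;
  // values never move back up, which is what bounds the number of
  // revisits per value by the lattice height.
  static bool meet(Value &Dest, const Value &Src) {
    if (Src.S == State::Underdefined || Dest.S == State::Overdefined)
      return false;
    if (Dest.S == State::Underdefined) {
      Dest.S = Src.S;
      Dest.C = Src.C;
      return true;
    }
    if (Src.S == State::Constant && Src.C == Dest.C)
      return false;
    Dest.S = State::Overdefined;
    return true;
  }

  // Worklist loop, seeded with the initially-known values. A value is
  // re-queued only when it changes, and it can change at most
  // (height - 1) times, so total edge visits are O(#edges * height).
  static void solve(std::vector<Value> &Vals, std::queue<int> &Work) {
    while (!Work.empty()) {
      int I = Work.front();
      Work.pop();
      for (int U : Vals[I].Users)
        if (meet(Vals[U], Vals[I]))
          Work.push(U);
    }
  }

The key invariant is that meet() only moves values down the lattice;
that one-way motion is what makes the O(N * height) bound work.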
> > What is the height of the lattice here?
>
> The current LVI is a bit hard to analyze; I'm not sure it's really as
> fixed-height as one would think.
> > I understand that it goes from top -> defined -> overdefined, and
> > that's three, but defined is not itself one state. Because it is
> > tracking integer ranges, those ranges could undergo 2^#bits
> > refinements in the worst case (thus making the lattice height quite
> > large), no?
>
> If it tracks ranges on a per-bit basis, the max refinements for a
> given range is O(bits * 3), assuming things do not go the wrong way
> on the lattice.
> IE each bit is either underdefined, or it is 1, or it is 0, or it is
> overdefined.
> The lattice is
>
>       underdefined
>         /      \
>        0        1
>         \      /
>       overdefined
As a separate matter, we *should* also do this (i.e. have a fixed-point known-bits analysis), but we don't.
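If we did, the per-bit lattice above fits in two masks, in the spirit
of (though simpler than) what computeKnownBits gives us. A minimal
sketch, with hypothetical names and 64-bit values assumed:

  #include <cstdint>

  // Per bit: neither mask set = underdefined, exactly one set =
  // known 0 or known 1, both set = overdefined.
  struct BitLattice {
    uint64_t Zero = 0; // bit i set: a 0 has been seen at bit i
    uint64_t One = 0;  // bit i set: a 1 has been seen at bit i

    // Join with one concrete value that reaches this point. Bits only
    // ever gain mask membership, i.e. they only move down the lattice,
    // so each bit changes at most twice before the fixpoint.
    bool joinValue(uint64_t V) {
      uint64_t NZ = Zero | ~V, NO = One | V;
      bool Changed = (NZ != Zero) || (NO != One);
      Zero = NZ;
      One = NO;
      return Changed;
    }

    bool isKnownOne(unsigned Bit) const {
      return ((One >> Bit) & 1) && !((Zero >> Bit) & 1);
    }
    bool isKnownZero(unsigned Bit) const {
      return ((Zero >> Bit) & 1) && !((One >> Bit) & 1);
    }
  };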
> The only way to end up with 2**n refinements is if things can go the
> other direction on the lattice.
> Note: This is the standard fixpoint algorithm (and what LVI is
> emulating, AFAIK). There are range-constraint-based solvers that are
> significantly more complicated, but that's not what LVI does.
Right. LVI handles only single constant-bounded ranges for each variable independently.
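i.e., per variable it keeps what is conceptually one interval (a
much-simplified cousin of ConstantRange). A toy sketch of the two
operations that matter for the lattice-height argument; the names are
my own invention, not LVI's:

  #include <algorithm>
  #include <cstdint>

  struct Range {
    int64_t Lo, Hi; // inclusive; assumes Lo <= Hi (no empty/wrapped sets)

    // Join at a CFG merge: the single-interval restriction means we
    // keep the hull, so joining [0,1] with [100,101] yields [0,101],
    // losing everything in between. Bounds can only widen.
    Range join(const Range &R) const {
      return {std::min(Lo, R.Lo), std::max(Hi, R.Hi)};
    }

    // Refinement on the true edge of "x < C"; intersection can only
    // tighten. (A real implementation must handle the empty result
    // when C <= Lo.)
    Range assumeLessThan(int64_t C) const {
      return {Lo, std::min(Hi, C - 1)};
    }
  };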
> (See https://www.cs.berkeley.edu/~daw/papers/range-tacas04.pdf , and
> the paper I cite below, which does a variant of that on LLVM in
> pretty much linear time.)
Interesting; thanks for the references! For both, the analysis collapses SCCs in the control flow and, while the SCC analysis is quadratic, SCCs are generally small so they don't observe a problem in practice.
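The shape of that, as I read it: condense the constraint graph into
SCCs, then solve each SCC to a local fixpoint in topological order.
Tarjan's algorithm below is standard; the wrapper names are mine, and
this is only a sketch of the condensation step, not either paper's
solver:

  #include <algorithm>
  #include <functional>
  #include <vector>

  // Constraint graph: one node per variable, an edge per constraint.
  struct Graph {
    std::vector<std::vector<int>> Succ; // Succ[V] = successors of V
  };

  // Standard Tarjan SCC. Returns an SCC id for every node; ids come
  // out in reverse topological order of the condensation, so iterating
  // ids from high to low visits SCCs topologically.
  std::vector<int> tarjanSCC(const Graph &G) {
    int N = (int)G.Succ.size(), Counter = 0, NumSCC = 0;
    std::vector<int> Index(N, -1), Low(N, 0), SCC(N, -1), Stack;
    std::vector<bool> OnStack(N, false);
    // Recursive for brevity; a real pass would use an explicit stack.
    std::function<void(int)> DFS = [&](int V) {
      Index[V] = Low[V] = Counter++;
      Stack.push_back(V);
      OnStack[V] = true;
      for (int W : G.Succ[V]) {
        if (Index[W] == -1) {
          DFS(W);
          Low[V] = std::min(Low[V], Low[W]);
        } else if (OnStack[W]) {
          Low[V] = std::min(Low[V], Index[W]);
        }
      }
      if (Low[V] == Index[V]) { // V roots an SCC; pop its members.
        for (int W = -1; W != V;) {
          W = Stack.back();
          Stack.pop_back();
          OnStack[W] = false;
          SCC[W] = NumSCC;
        }
        ++NumSCC;
      }
    };
    for (int V = 0; V < N; ++V)
      if (Index[V] == -1)
        DFS(V);
    return SCC;
  }

Processing the condensation in topological order means the fixpoint
iteration only ever runs within one SCC at a time, which is why the
quadratic worst case doesn't bite when SCCs are small.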
-Hal
> Also, the worst case here is one where every variable is affected by
> a single range.
> (It's also possible to lazily construct the SSA form, and only update
> changed variables, in the same time bound. It's probably complicated
> enough to implement that it may not be worth it.)
> In practice, you can do even complicated ranges very fast.
> http://homepages.dcc.ufmg.br/~fernando/publications/papers/CGO13_raphael.pdf
> (The time is basically linear in practice, even for large programs;
> the code is on GitHub.)
> Ignore the part where they use it in the end for something else that
> we don't care about, etc.
> They are also solving for a much more complex set of ranges than we
> are talking about for CVP, and for all the variables, so they use a
> real range constraint solver in the end instead of a fixpoint with a
> fixed height lattice.
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory