[llvm-dev] High memory use and LVI/Correlated Value Propagation
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Thu Jan 14 12:05:54 PST 2016
----- Original Message -----
> From: "Daniel Berlin" <dberlin at dberlin.org>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Joerg Sonnenberger" <joerg at britannica.bec.de>, "llvm-dev"
> <llvm-dev at lists.llvm.org>
> Sent: Thursday, January 14, 2016 1:05:56 PM
> Subject: Re: [llvm-dev] High memory use and LVI/Correlated Value
> Propagation
> On Thu, Jan 14, 2016 at 10:32 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> > ----- Original Message -----
> > > From: "Daniel Berlin via llvm-dev" <llvm-dev at lists.llvm.org>
> > > To: "Joerg Sonnenberger" <joerg at britannica.bec.de>, "llvm-dev"
> > > <llvm-dev at lists.llvm.org>
> > > Sent: Thursday, January 14, 2016 10:38:10 AM
> > > Subject: Re: [llvm-dev] High memory use and LVI/Correlated Value
> > > Propagation
> > >
> > > On Thu, Jan 14, 2016 at 5:57 AM, Joerg Sonnenberger via llvm-dev
> > > <llvm-dev at lists.llvm.org> wrote:
> > > > On Wed, Jan 13, 2016 at 04:28:03PM -0800, Daniel Berlin wrote:
> > > > > On Wed, Jan 13, 2016 at 4:23 PM, Joerg Sonnenberger via llvm-dev
> > > > > <llvm-dev at lists.llvm.org> wrote:
> > > > > > On Wed, Jan 13, 2016 at 03:38:24PM -0800, Philip Reames wrote:
> > > > > > > I don't think that arbitrarily limiting the complexity of the
> > > > > > > search is the right approach. There are numerous ways the LVI
> > > > > > > infrastructure could be made more memory efficient. Fixing the
> > > > > > > existing code to be memory efficient is the right approach.
> > > > > > > Only once there's no more low-hanging fruit should we even
> > > > > > > consider clamping the search.
> > > > > >
> > > > > > Memory efficiency is only half of the problem. E.g. groonga's
> > > > > > expr.c needs 4m to build on my laptop, a 2.7GHz i7. That doesn't
> > > > > > sound reasonable for -O2. Unlike the GVN issue, the cases I have
> > > > > > run into do finish after a(n unreasonable) while, so at least it
> > > > > > is not trivially superlinear.
> > > > >
> > > > > Okay, so rather than artificially limit stuff, we should see if
> > > > > we can fix the efficiency of the algorithms.
> > > > >
> > > > > CVP is an O(N * lattice height) problem. It sounds like it is
> > > > > being more than that for you.
> > > >
> > > > I assume you mean something like #BB * #variables?
> > >
> > > No.
> > > ;)
> > >
> > > I mean the theoretical complexity of performing the optimization CVP
> > > performs (regardless of how it is currently implemented).
> > >
> > > What CVP does is doable in linear time in the number of variables *
> > > lattice height. It has some memory cost to do this.
> > >
> > > Assume for a second the lattice height is fixed (IE we do not have
> > > completely symbolic ranges).
> > >
> > > It is possible, in linear time, to build an SSA-like form for the
> > > ranges it is computing. It is then possible, in linear time, to
> > > propagate those ranges to all affected variables. This is O(N) + O(N).
> > >
> > > Doing so may change the ranges (they only get *better*, however), so
> > > you may need to do this until they fall all the way down the lattice.
> > > This means you may repeat this lattice height number of times.
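(For the archives: the scheme being described is the standard worklist
fixpoint. A minimal sketch, assuming a height-3 constant lattice and
made-up names; this is a toy illustration, not our LVI code:)

  #include <cstdint>
  #include <queue>
  #include <vector>

  enum class State { Underdefined, Constant, Overdefined };

  struct Value {
    State S = State::Underdefined;
    int64_t C = 0;            // meaningful only when S == Constant
    std::vector<int> Users;   // values whose state depends on this one
  };

  // Meet Src into Dest. Returns true iff Dest moved *down* the lattice;
  // values never move back up, which is what bounds the number of
  // revisits per value by the lattice height.
  static bool meet(Value &Dest, const Value &Src) {
    if (Src.S == State::Underdefined || Dest.S == State::Overdefined)
      return false;
    if (Dest.S == State::Underdefined) {
      Dest.S = Src.S;
      Dest.C = Src.C;
      return true;
    }
    if (Src.S == State::Constant && Src.C == Dest.C)
      return false;
    Dest.S = State::Overdefined;
    return true;
  }

  // Worklist loop, seeded with the initially-known values. A value is
  // re-queued only when it changes, and it can change at most
  // (height - 1) times, so total edge visits are O(#edges * height).
  static void solve(std::vector<Value> &Vals, std::queue<int> &Work) {
    while (!Work.empty()) {
      int I = Work.front();
      Work.pop();
      for (int U : Vals[I].Users)
        if (meet(Vals[U], Vals[I]))
          Work.push(U);
    }
  }

The key invariant is that meet() only moves values down the lattice;
that one-way motion is what makes the O(N * height) bound work.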
> > What is the height of the lattice here?
>
> The current LVI is a bit hard to analyze; I'm not sure it's really as
> fixed-height as one would think.
> > I understand that it goes from top -> defined -> overdefined, and
> > that's three, but defined is not itself one state. Because it is
> > tracking integer ranges, those ranges could undergo 2^#bits
> > refinements in the worst case (thus making the lattice height quite
> > large), no?
>
> If it tracks ranges on a per-bit basis, the max refinements for a
> given range is O(bits * 3), assuming things do not go the wrong way
> on the lattice.
> IE each bit is either underdefined, or it is 1, or it is 0, or it is
> overdefined.
> The lattice is
>
>       underdefined
>         /      \
>        0        1
>         \      /
>       overdefined
As a separate matter, we *should* also do this (i.e. have a fixed-point known-bits analysis), but we don't.
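If we did, the per-bit lattice above fits in two masks, in the spirit
of (though simpler than) what computeKnownBits gives us. A minimal
sketch, with hypothetical names and 64-bit values assumed:

  #include <cstdint>

  // Per bit: neither mask set = underdefined, exactly one set =
  // known 0 or known 1, both set = overdefined.
  struct BitLattice {
    uint64_t Zero = 0; // bit i set: a 0 has been seen at bit i
    uint64_t One = 0;  // bit i set: a 1 has been seen at bit i

    // Join with one concrete value that reaches this point. Bits only
    // ever gain mask membership, i.e. they only move down the lattice,
    // so each bit changes at most twice before the fixpoint.
    bool joinValue(uint64_t V) {
      uint64_t NZ = Zero | ~V, NO = One | V;
      bool Changed = (NZ != Zero) || (NO != One);
      Zero = NZ;
      One = NO;
      return Changed;
    }

    bool isKnownOne(unsigned Bit) const {
      return ((One >> Bit) & 1) && !((Zero >> Bit) & 1);
    }
    bool isKnownZero(unsigned Bit) const {
      return ((Zero >> Bit) & 1) && !((One >> Bit) & 1);
    }
  };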
> The only way to end up with 2**n refinements is if things can go the
> other direction on the lattice.
> Note: This is the standard fixpoint algorithm (and what LVI is
> emulating, AFAIK). There are range-constraint-based solvers that are
> significantly more complicated, but that's not what LVI does.
Right. LVI handles only single constant-bounded ranges for each variable independently.
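i.e., per variable it keeps what is conceptually one interval (a
much-simplified cousin of ConstantRange). A toy sketch of the two
operations that matter for the lattice-height argument; the names are
my own invention, not LVI's:

  #include <algorithm>
  #include <cstdint>

  struct Range {
    int64_t Lo, Hi; // inclusive; assumes Lo <= Hi (no empty/wrapped sets)

    // Join at a CFG merge: the single-interval restriction means we
    // keep the hull, so joining [0,1] with [100,101] yields [0,101],
    // losing everything in between. Bounds can only widen.
    Range join(const Range &R) const {
      return {std::min(Lo, R.Lo), std::max(Hi, R.Hi)};
    }

    // Refinement on the true edge of "x < C"; intersection can only
    // tighten. (A real implementation must handle the empty result
    // when C <= Lo.)
    Range assumeLessThan(int64_t C) const {
      return {Lo, std::min(Hi, C - 1)};
    }
  };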
> (See https://www.cs.berkeley.edu/~daw/papers/range-tacas04.pdf , and
> the paper I cite below, which does a variant of that on LLVM in
> pretty much linear time.)
Interesting; thanks for the references! For both, the analysis collapses SCCs in the control flow and, while the SCC analysis is quadratic, SCCs are generally small so they don't observe a problem in practice.
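The shape of that, as I read it: condense the constraint graph into
SCCs, then solve each SCC to a local fixpoint in topological order.
Tarjan's algorithm below is standard; the wrapper names are mine, and
this is only a sketch of the condensation step, not either paper's
solver:

  #include <algorithm>
  #include <functional>
  #include <vector>

  // Constraint graph: one node per variable, an edge per constraint.
  struct Graph {
    std::vector<std::vector<int>> Succ; // Succ[V] = successors of V
  };

  // Standard Tarjan SCC. Returns an SCC id for every node; ids come
  // out in reverse topological order of the condensation, so iterating
  // ids from high to low visits SCCs topologically.
  std::vector<int> tarjanSCC(const Graph &G) {
    int N = (int)G.Succ.size(), Counter = 0, NumSCC = 0;
    std::vector<int> Index(N, -1), Low(N, 0), SCC(N, -1), Stack;
    std::vector<bool> OnStack(N, false);
    // Recursive for brevity; a real pass would use an explicit stack.
    std::function<void(int)> DFS = [&](int V) {
      Index[V] = Low[V] = Counter++;
      Stack.push_back(V);
      OnStack[V] = true;
      for (int W : G.Succ[V]) {
        if (Index[W] == -1) {
          DFS(W);
          Low[V] = std::min(Low[V], Low[W]);
        } else if (OnStack[W]) {
          Low[V] = std::min(Low[V], Index[W]);
        }
      }
      if (Low[V] == Index[V]) { // V roots an SCC; pop its members.
        for (int W = -1; W != V;) {
          W = Stack.back();
          Stack.pop_back();
          OnStack[W] = false;
          SCC[W] = NumSCC;
        }
        ++NumSCC;
      }
    };
    for (int V = 0; V < N; ++V)
      if (Index[V] == -1)
        DFS(V);
    return SCC;
  }

Processing the condensation in topological order means the fixpoint
iteration only ever runs within one SCC at a time, which is why the
quadratic worst case doesn't bite when SCCs are small.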
-Hal
> Also, the worst case here is one where every variable is affected by
> a single range.
> (It's also possible to lazily construct the SSA form, and only update
> changed variables, in the same time bound. It's probably complicated
> enough to implement that it may not be worth it.)
> In practice, you can do even complicated ranges very fast.
> http://homepages.dcc.ufmg.br/~fernando/publications/papers/CGO13_raphael.pdf
> (The time is basically linear in practice, even for large programs;
> the code is on GitHub.)
> Ignore the part where they use it in the end for something else that
> we don't care about, etc.
> They are also solving for a much more complex set of ranges than we
> are talking about for CVP, and for all the variables, so they use a
> real range constraint solver in the end instead of a fixpoint with a
> fixed height lattice.
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory