[LLVMdev] RFC: A plan for stateful alias analysis in LLVM

Thu Jul 16 13:35:29 PDT 2015

----- Original Message -----
> From: "Pete Cooper" <peter_cooper at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Chandler Carruth" <chandlerc at google.com>, "Daniel Berlin" <dannyb at google.com>, "LLVM Developers Mailing List"
> <llvmdev at cs.uiuc.edu>
> Sent: Thursday, July 16, 2015 3:32:43 PM
> Subject: Re: [LLVMdev] RFC: A plan for stateful alias analysis in LLVM
> 
> 
> > On Jul 16, 2015, at 12:36 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > 
> > ----- Original Message -----
> >> From: "Chandler Carruth" <chandlerc at google.com>
> >> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>, "Hal
> >> Finkel" <hfinkel at anl.gov>, "Daniel Berlin"
> >> <dannyb at google.com>
> >> Sent: Monday, July 13, 2015 9:59:10 PM
> >> Subject: RFC: A plan for stateful alias analysis in LLVM
> >> 
> >> 
> >> 
> >> # Front matter # First, I want to emphasize that this is not a
> >> short-term plan. This is a long-term plan. When I was looking at
> >> refactoring the existing Alias Analysis infrastructure in LLVM in
> >> ways that would break stateful AA, Hal Finkel asked me to figure
> >> out
> >> how things would work with the new pass manager and make sure that
> >> was sane.
> >> 
> >> 
> >> This plan assumes the new pass manager. There is nothing deeply or
> >> intrinsically tied to it, but the pieces will fit together
> >> naturally
> >> there, and I don't see how to make them fit naturally in the
> >> existing pass manager. I've not invested a lot of time in figuring
> >> that out as it seems a sunk cost.
> >> 
> >> 
> >> # Core ideas of what stateful AA should look like: #
> >> 1) An alias analysis *is* an analysis pass, and if it needs to
> >> build
> >> up state, it must do so for a "unit" of IR at a time (using the
> >> new
> >> PM's terminology).
> >> 2) Per-IR-unit invalidation from the pass manager is the primary
> >> mechanism to clear out stale (and thus invalid) state from any
> >> alias
> >> analysis. That means that most passes will not preserve AA by
> >> default (unlike today).
> >> 3) To be reasonably scalable, most alias analyses which retain
> >> state
> >> will need to handle these invalidation events in a way that
> >> gracefully degrades their results rather than wiping out all
> >> state.
> >> 
> >> 
> >> I think everything in-tree but GlobalsModRef fits these
> >> requirements
> >> already. Notably, this will work well for CFL, Andersen, or
> >> generally any "intersection" based AA. A "union" based AA could
> >> work
> >> here, but would have to do considerable work to be able to
> >> tolerate
> >> updates that "invalidate" its state without simply forced
> >> recomputation. I think this is a reasonable tradeoff as it
> >> preserves
> >> our ability to experiment, while simplifying the update model.
> >> 
> >> 
> >> The rest of this email contains much more detailed musing on this.
> >> Feel free to stop reading when necessary.
> >> 
> >> 
> >> The last section touches on the core functionality that would be
> >> needed to support something like GlobalsModRef in its current
> >> form.
> >> I'm much less confident in this part, so it's not part of my core
> >> requirements above.
> >> 
> >> 
> >> # Details about analysis passes and pass manager invalidation #
> >> My thought is that an alias analysis pass in the new model would
> >> be
> >> an analysis pass. It would produce an analysis result that
> >> contained
> >> any state necessary to vend AA queries. This result would be
> >> cached
> >> just like all the other analysis results are in the pass manager.
> >> It
> >> would have the same invalidation rules too.
> >> 
> >> 
> >> A really important consequence of this is that analysis results in
> >> the new scheme get notified of invalidation events and can attempt
> >> to update their internal state to *avoid* becoming invalid. AAs
> >> can
> >> leverage this to do fast incremental updates to their internal
> >> structures if appropriate.
> >> 
> >> 
> >> An AA could even use ValueHandles now to automatically stay
> >> up-to-date with a conservatively correct data structure, and just
> >> drop all the invalidation requests. In fact, I suspect many
> >> stateful
> >> AAs will take exactly this approach.
> >> 
> >> 
> >> # How will users of AA information query it? #
> >> This will work much like the normal analysis queries. I plan to
> >> provide an alias analysis manager at each IR unit that will
> >> provide
> >> a single query interface for code to use. This layer can then be
> >> configured to query any AnalysisManager<T> for an IR unit T for
> >> the
> >> relevant actual AliasAnalysis implementations.
> >> 
> >> 
> >> The AA manager layer will be responsible for the delegation that
> >> is
> >> currently done through chaining of AA implementations. It will
> >> also
> >> be responsible for enforcing that queries at a particular layer
> >> don't violate IR unit abstraction boundaries. For example, you
> >> cannot query a FuntionAAManager for alias information of two
> >> Values
> >> from two different functions.
> > 
> > One thing I'd like to clarify is what is responsible for
> > 'requiring' the various alias-analysis pass, so that they'll be
> > recomputed if they're invalidated. So if I have an AliasManager,
> > and I've requested that it manage a BasicAA and an SCEVAA (for
> > example), and something happens such that SCEVAA is invalidated,
> > then before future AA queries are processed, the AliasManager is
> > responsible for triggering the re-computation of SCEVAA as part of
> > the chaining process (or before the first AA query). That's my
> > expectation anyway; does that match this model?
> That could get expensive if we keep recomputing AA we end up not
> needing.  We might need to try find a way to be lazier about it.

I don't disagree, but to be clear, I'm envisioning some version of SCEVAA that is otherwise efficient enough to be used within the standard pipeline.

 -Hal

> 
> Take TBAA for example, if it required any kind of computation, it
> would be better to delay it until we actually query AA on a
> load/store which has the metadata.  I’m pretty sure TBAA doesn’t
> have state to recompute, but imagine it did and hopefully what i
> said makes sense.
> 
> Cheers,
> Pete
> > 
> > Thanks again,
> > Hal
> > 
> >> 
> >> 
> >> # Why GlobalsModRef doesn't fit, and what we could do about it #
> >> There is a fundamental construct in GlobalsModRef that exists
> >> today,
> >> and seems somewhat useful, but which doesn't fit the above: using
> >> a
> >> global analysis of which memory locations are *not* escaped by any
> >> function, in order to conclude that within a given function,
> >> arguments, returns, and loads from globals cannot produce an
> >> aliasing memory location. Thus, if you always directly index or
> >> manipulate a global within a function, and it never escapes that
> >> function (including through integer hacks), you know that some
> >> other
> >> function can't produce an aliasing location by loading a pointer
> >> from a global or receiving a pointer in an argument.
> >> 
> >> 
> >> In *practice* this is actually totally reasonable. Passes don't
> >> run
> >> around escaping pointers just because they feel like it. However,
> >> we
> >> don't have a good way to really formally reason about such things.
> >> 
> >> 
> >> My suggested way to reason about it is to define a new constraint
> >> on
> >> transformation passes over an IR unit T (ie, Function, Module,
> >> etc.). This constraint is that a pass may not *observably* escape
> >> a
> >> pointer value P from the unit being transformed. Now, "observably"
> >> is the key word here. The goal is to still allow fundamental
> >> transformations like Reg2Mem and instrumentation passes. However,
> >> the guarantee must be that the passes do not introduce new ways to
> >> observe an escape of a pointer. Because any actual escapes cannot
> >> be
> >> observed, it is safe for them to happen and for some other
> >> function
> >> pass (for example) to transform based on the non-escaping property
> >> computed prior to the transformation. I *think* this is a
> >> reasonable
> >> model, but I've not really had enough time thinking about it to be
> >> 100% certain.
> >> 
> >> 
> >> With such a model, we could regain most of the power of
> >> GlobalsModRef
> >> w.r.t. non-escaping pointers and interprocedural aliasing
> >> inference.
> >> We would change it to be a two-pronged analysis. On one hand, a
> >> local analysis would have to prove that one location came from a
> >> module-level memory location (global) and the other location came
> >> from some dynamic input to the function (argument, return, or a
> >> load
> >> of memory that is def'ed outside the function), and then the other
> >> hand would provide the interprocedural information about what
> >> global
> >> memory locations were non-escaping.
> >> 
> >> 
> >> 
> >> 
> >> Ok, those are all the thoughts I have currently about AA, and
> >> where
> >> I'm headed. This should hopefully provide context for some of the
> >> upcoming patches.
> >> 
> >> 
> >> I'm also going to write a separate email about specific tactical
> >> problems I've found with GlobalsModRef today (nothing to do with
> >> the
> >> above theoretical problems) and how I plan to approach them.
> >> 
> >> 
> >> Thanks,
> >> -Chandler
> > 
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> > _______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory