[LLVMdev] RFC: A plan for stateful alias analysis in LLVM

Thu Jul 16 12:36:23 PDT 2015

----- Original Message -----
> From: "Chandler Carruth" <chandlerc at google.com>
> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>, "Hal Finkel" <hfinkel at anl.gov>, "Daniel Berlin"
> <dannyb at google.com>
> Sent: Monday, July 13, 2015 9:59:10 PM
> Subject: RFC: A plan for stateful alias analysis in LLVM
> 
> 
> 
> # Front matter # First, I want to emphasize that this is not a
> short-term plan. This is a long-term plan. When I was looking at
> refactoring the existing Alias Analysis infrastructure in LLVM in
> ways that would break stateful AA, Hal Finkel asked me to figure out
> how things would work with the new pass manager and make sure that
> was sane.
> 
> 
> This plan assumes the new pass manager. There is nothing deeply or
> intrinsically tied to it, but the pieces will fit together naturally
> there, and I don't see how to make them fit naturally in the
> existing pass manager. I've not invested a lot of time in figuring
> that out as it seems a sunk cost.
> 
> 
> # Core ideas of what stateful AA should look like: #
> 1) An alias analysis *is* an analysis pass, and if it needs to build
> up state, it must do so for a "unit" of IR at a time (using the new
> PM's terminology).
> 2) Per-IR-unit invalidation from the pass manager is the primary
> mechanism to clear out stale (and thus invalid) state from any alias
> analysis. That means that most passes will not preserve AA by
> default (unlike today).
> 3) To be reasonably scalable, most alias analyses which retain state
> will need to handle these invalidation events in a way that
> gracefully degrades their results rather than wiping out all state.
> 
> 
> I think everything in-tree but GlobalsModRef fits these requirements
> already. Notably, this will work well for CFL, Andersen, or
> generally any "intersection" based AA. A "union" based AA could work
> here, but would have to do considerable work to be able to tolerate
> updates that "invalidate" its state without simply forced
> recomputation. I think this is a reasonable tradeoff as it preserves
> our ability to experiment, while simplifying the update model.
> 
> 
> The rest of this email contains much more detailed musing on this.
> Feel free to stop reading when necessary.
> 
> 
> The last section touches on the core functionality that would be
> needed to support something like GlobalsModRef in its current form.
> I'm much less confident in this part, so it's not part of my core
> requirements above.
> 
> 
> # Details about analysis passes and pass manager invalidation #
> My thought is that an alias analysis pass in the new model would be
> an analysis pass. It would produce an analysis result that contained
> any state necessary to vend AA queries. This result would be cached
> just like all the other analysis results are in the pass manager. It
> would have the same invalidation rules too.
> 
> 
> A really important consequence of this is that analysis results in
> the new scheme get notified of invalidation events and can attempt
> to update their internal state to *avoid* becoming invalid. AAs can
> leverage this to do fast incremental updates to their internal
> structures if appropriate.
> 
> 
> An AA could even use ValueHandles now to automatically stay
> up-to-date with a conservatively correct data structure, and just
> drop all the invalidation requests. In fact, I suspect many stateful
> AAs will take exactly this approach.
> 
> 
> # How will users of AA information query it? #
> This will work much like the normal analysis queries. I plan to
> provide an alias analysis manager at each IR unit that will provide
> a single query interface for code to use. This layer can then be
> configured to query any AnalysisManager<T> for an IR unit T for the
> relevant actual AliasAnalysis implementations.
> 
> 
> The AA manager layer will be responsible for the delegation that is
> currently done through chaining of AA implementations. It will also
> be responsible for enforcing that queries at a particular layer
> don't violate IR unit abstraction boundaries. For example, you
> cannot query a FuntionAAManager for alias information of two Values
> from two different functions.

One thing I'd like to clarify is what is responsible for 'requiring' the various alias-analysis pass, so that they'll be recomputed if they're invalidated. So if I have an AliasManager, and I've requested that it manage a BasicAA and an SCEVAA (for example), and something happens such that SCEVAA is invalidated, then before future AA queries are processed, the AliasManager is responsible for triggering the re-computation of SCEVAA as part of the chaining process (or before the first AA query). That's my expectation anyway; does that match this model?

Thanks again,
Hal

> 
> 
> # Why GlobalsModRef doesn't fit, and what we could do about it #
> There is a fundamental construct in GlobalsModRef that exists today,
> and seems somewhat useful, but which doesn't fit the above: using a
> global analysis of which memory locations are *not* escaped by any
> function, in order to conclude that within a given function,
> arguments, returns, and loads from globals cannot produce an
> aliasing memory location. Thus, if you always directly index or
> manipulate a global within a function, and it never escapes that
> function (including through integer hacks), you know that some other
> function can't produce an aliasing location by loading a pointer
> from a global or receiving a pointer in an argument.
> 
> 
> In *practice* this is actually totally reasonable. Passes don't run
> around escaping pointers just because they feel like it. However, we
> don't have a good way to really formally reason about such things.
> 
> 
> My suggested way to reason about it is to define a new constraint on
> transformation passes over an IR unit T (ie, Function, Module,
> etc.). This constraint is that a pass may not *observably* escape a
> pointer value P from the unit being transformed. Now, "observably"
> is the key word here. The goal is to still allow fundamental
> transformations like Reg2Mem and instrumentation passes. However,
> the guarantee must be that the passes do not introduce new ways to
> observe an escape of a pointer. Because any actual escapes cannot be
> observed, it is safe for them to happen and for some other function
> pass (for example) to transform based on the non-escaping property
> computed prior to the transformation. I *think* this is a reasonable
> model, but I've not really had enough time thinking about it to be
> 100% certain.
> 
> 
> With such a model, we could regain most of the power of GlobalsModRef
> w.r.t. non-escaping pointers and interprocedural aliasing inference.
> We would change it to be a two-pronged analysis. On one hand, a
> local analysis would have to prove that one location came from a
> module-level memory location (global) and the other location came
> from some dynamic input to the function (argument, return, or a load
> of memory that is def'ed outside the function), and then the other
> hand would provide the interprocedural information about what global
> memory locations were non-escaping.
> 
> 
> 
> 
> Ok, those are all the thoughts I have currently about AA, and where
> I'm headed. This should hopefully provide context for some of the
> upcoming patches.
> 
> 
> I'm also going to write a separate email about specific tactical
> problems I've found with GlobalsModRef today (nothing to do with the
> above theoretical problems) and how I plan to approach them.
> 
> 
> Thanks,
> -Chandler

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory