[LLVMdev] RFC: A plan for stateful alias analysis in LLVM

Thu Jul 16 13:32:43 PDT 2015

> On Jul 16, 2015, at 12:36 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> ----- Original Message -----
>> From: "Chandler Carruth" <chandlerc at google.com>
>> To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>, "Hal Finkel" <hfinkel at anl.gov>, "Daniel Berlin"
>> <dannyb at google.com>
>> Sent: Monday, July 13, 2015 9:59:10 PM
>> Subject: RFC: A plan for stateful alias analysis in LLVM
>> 
>> 
>> 
>> # Front matter # First, I want to emphasize that this is not a
>> short-term plan. This is a long-term plan. When I was looking at
>> refactoring the existing Alias Analysis infrastructure in LLVM in
>> ways that would break stateful AA, Hal Finkel asked me to figure out
>> how things would work with the new pass manager and make sure that
>> was sane.
>> 
>> 
>> This plan assumes the new pass manager. There is nothing deeply or
>> intrinsically tied to it, but the pieces will fit together naturally
>> there, and I don't see how to make them fit naturally in the
>> existing pass manager. I've not invested a lot of time in figuring
>> that out as it seems a sunk cost.
>> 
>> 
>> # Core ideas of what stateful AA should look like: #
>> 1) An alias analysis *is* an analysis pass, and if it needs to build
>> up state, it must do so for a "unit" of IR at a time (using the new
>> PM's terminology).
>> 2) Per-IR-unit invalidation from the pass manager is the primary
>> mechanism to clear out stale (and thus invalid) state from any alias
>> analysis. That means that most passes will not preserve AA by
>> default (unlike today).
>> 3) To be reasonably scalable, most alias analyses which retain state
>> will need to handle these invalidation events in a way that
>> gracefully degrades their results rather than wiping out all state.
>> 
>> 
>> I think everything in-tree but GlobalsModRef fits these requirements
>> already. Notably, this will work well for CFL, Andersen, or
>> generally any "intersection" based AA. A "union" based AA could work
>> here, but would have to do considerable work to be able to tolerate
>> updates that "invalidate" its state without simply forced
>> recomputation. I think this is a reasonable tradeoff as it preserves
>> our ability to experiment, while simplifying the update model.
>> 
>> 
>> The rest of this email contains much more detailed musing on this.
>> Feel free to stop reading when necessary.
>> 
>> 
>> The last section touches on the core functionality that would be
>> needed to support something like GlobalsModRef in its current form.
>> I'm much less confident in this part, so it's not part of my core
>> requirements above.
>> 
>> 
>> # Details about analysis passes and pass manager invalidation #
>> My thought is that an alias analysis pass in the new model would be
>> an analysis pass. It would produce an analysis result that contained
>> any state necessary to vend AA queries. This result would be cached
>> just like all the other analysis results are in the pass manager. It
>> would have the same invalidation rules too.
>> 
>> 
>> A really important consequence of this is that analysis results in
>> the new scheme get notified of invalidation events and can attempt
>> to update their internal state to *avoid* becoming invalid. AAs can
>> leverage this to do fast incremental updates to their internal
>> structures if appropriate.
>> 
>> 
>> An AA could even use ValueHandles now to automatically stay
>> up-to-date with a conservatively correct data structure, and just
>> drop all the invalidation requests. In fact, I suspect many stateful
>> AAs will take exactly this approach.
>> 
>> 
>> # How will users of AA information query it? #
>> This will work much like the normal analysis queries. I plan to
>> provide an alias analysis manager at each IR unit that will provide
>> a single query interface for code to use. This layer can then be
>> configured to query any AnalysisManager<T> for an IR unit T for the
>> relevant actual AliasAnalysis implementations.
>> 
>> 
>> The AA manager layer will be responsible for the delegation that is
>> currently done through chaining of AA implementations. It will also
>> be responsible for enforcing that queries at a particular layer
>> don't violate IR unit abstraction boundaries. For example, you
>> cannot query a FuntionAAManager for alias information of two Values
>> from two different functions.
> 
> One thing I'd like to clarify is what is responsible for 'requiring' the various alias-analysis pass, so that they'll be recomputed if they're invalidated. So if I have an AliasManager, and I've requested that it manage a BasicAA and an SCEVAA (for example), and something happens such that SCEVAA is invalidated, then before future AA queries are processed, the AliasManager is responsible for triggering the re-computation of SCEVAA as part of the chaining process (or before the first AA query). That's my expectation anyway; does that match this model?
That could get expensive if we keep recomputing AA we end up not needing.  We might need to try find a way to be lazier about it.

Take TBAA for example, if it required any kind of computation, it would be better to delay it until we actually query AA on a load/store which has the metadata.  I’m pretty sure TBAA doesn’t have state to recompute, but imagine it did and hopefully what i said makes sense.

Cheers,
Pete
> 
> Thanks again,
> Hal
> 
>> 
>> 
>> # Why GlobalsModRef doesn't fit, and what we could do about it #
>> There is a fundamental construct in GlobalsModRef that exists today,
>> and seems somewhat useful, but which doesn't fit the above: using a
>> global analysis of which memory locations are *not* escaped by any
>> function, in order to conclude that within a given function,
>> arguments, returns, and loads from globals cannot produce an
>> aliasing memory location. Thus, if you always directly index or
>> manipulate a global within a function, and it never escapes that
>> function (including through integer hacks), you know that some other
>> function can't produce an aliasing location by loading a pointer
>> from a global or receiving a pointer in an argument.
>> 
>> 
>> In *practice* this is actually totally reasonable. Passes don't run
>> around escaping pointers just because they feel like it. However, we
>> don't have a good way to really formally reason about such things.
>> 
>> 
>> My suggested way to reason about it is to define a new constraint on
>> transformation passes over an IR unit T (ie, Function, Module,
>> etc.). This constraint is that a pass may not *observably* escape a
>> pointer value P from the unit being transformed. Now, "observably"
>> is the key word here. The goal is to still allow fundamental
>> transformations like Reg2Mem and instrumentation passes. However,
>> the guarantee must be that the passes do not introduce new ways to
>> observe an escape of a pointer. Because any actual escapes cannot be
>> observed, it is safe for them to happen and for some other function
>> pass (for example) to transform based on the non-escaping property
>> computed prior to the transformation. I *think* this is a reasonable
>> model, but I've not really had enough time thinking about it to be
>> 100% certain.
>> 
>> 
>> With such a model, we could regain most of the power of GlobalsModRef
>> w.r.t. non-escaping pointers and interprocedural aliasing inference.
>> We would change it to be a two-pronged analysis. On one hand, a
>> local analysis would have to prove that one location came from a
>> module-level memory location (global) and the other location came
>> from some dynamic input to the function (argument, return, or a load
>> of memory that is def'ed outside the function), and then the other
>> hand would provide the interprocedural information about what global
>> memory locations were non-escaping.
>> 
>> 
>> 
>> 
>> Ok, those are all the thoughts I have currently about AA, and where
>> I'm headed. This should hopefully provide context for some of the
>> upcoming patches.
>> 
>> 
>> I'm also going to write a separate email about specific tactical
>> problems I've found with GlobalsModRef today (nothing to do with the
>> above theoretical problems) and how I plan to approach them.
>> 
>> 
>> Thanks,
>> -Chandler
> 
> -- 
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev