[llvm-commits] [PATCH] Teach the inline cost analysis to forward stores of constants in the caller to loads in the callee.

Sat Dec 29 01:05:00 PST 2012

On Fri, Dec 28, 2012 at 11:55 PM, Chris Lattner <clattner at apple.com> wrote:

> On Dec 27, 2012, at 9:53 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
> >  Fix typos spotted by Joey Gouly.
> >
> >  Thanks for the review!
> >
> > http://llvm-reviews.chandlerc.com/D246
> >
> > CHANGE SINCE LAST DIFF
> >  http://llvm-reviews.chandlerc.com/D246?vs=598&id=600#toc
>
> Hey Chandler,
>
> Thanks for tackling this, getting this would be a really great improvement
> in the inliner!  This review is based on a quick look over the patch, I
> apologize if I misunderstand how it's working.
>
> My big concern about this is that you've written this in a very general
> way, using Alias Analysis and some amount of brute force to check the sets
> of values using the interval map.  I'm concerned about compile time impact
> and also the complexity/fragility of the code.
>

I'm concerned about the complexity too, but I suspect it is worth a certain
amount of complexity. I also have quite a few ideas that will simplify
things but that really should be follow-up work. For example, a lot of the
current complexity stems from me optimizing for a dense, small, and fast
implementation. I actually think one way to make this easier to understand
is to build a reasonable store-forwarding abstraction that can be used in
any pass that needs the information, and give it a nice interface. That
will isolate the logic away from the inliner and make everything a bit
easier to understand.

Anyways, that's essentially orthogonal to your ideas below. On the the
particulars...

> In practice, the inliner is going to be using BasicAA which is going to
> get "uniquely identified" objects and maybe TBAA information.

Yes, I've been essentially testing it with BasicAA, as it seems to do
everything I want (essentially, partitioning obviously non-aliased
pointers).

> Also, in practice, the most common cases will be "type safe" without
> terrible pointer casting and stuff (I say this without actually looking at
> the std::vector code, I hope its true!).
>

I'm actually not at all sure this is true. I see a large number of pointer
casting and other cruft that makes it quite hard to analyze these patterns
without some reasonably powerful machinery. More details on this below.

>
> Would it be enough to dramatically simplify this down to something like
> the following:
> 1. When looking at a call site, determine which arguments are known to be
> "uniquely identified" distinct objects.  These are your candidates for
> "store forwarding".  In practice, to handle the common "this" case, you
> should handle "uniquely identified objects" and at most one other
> non-uniquely identified object (commonly "this"), which still makes the
> aliasing trivial.
>

I actually started with this exact idea. However, Richard Smith had a
comment that convinced me go for something a touch more general to
specifically handle the cases of wrapper objects like rich iterators or
anything that looks like a pimpl. There we have a pointer to a distinct
object, and the first store we want to forward is a member pointer to some
other object. Then we want to forward stores of that object....

Essentially, if we're going to do this, I'd like to do in a generic way,
and limit the extent of the analysis rather than the scope. That way
changing from pattern A to B in your code doesn't surprisingly decrease the
optimizations, but adding another layer of indirection may. Hopefully
that's less suprising... it is to me.

> 2. Do a backwards scan from the callsite to collect stores, like you're
> doing in collectConstantOffsetStores.  It looks like
> collectConstantOffsetStores can be made to bail out earlier than your scan
> does right now (stopping when it finds a store to an unknown-alias of the
> aliased "this" pointer).
>

I'd like to improve the backward scan as well, but I'm already hitting
cases in std::vector where we don't scan far enough. =/ That said, I'd like
to think about this and other issues with the backward scan and see if I
can simplify it some, irrespective of the rest of this discussion. This was
really a rough cut first attempt.

3. I'd simplify your datastructure tracking stores (instead of using
> intervals) to only handle the "type safe" (+ memset as a followon patch, by
> scalarizing stores) cases, and use GEP paths (similar to what the argument
> promotion pass does).  What I mean is to just assume that the pointer type
> in the call is "correct" and look for stores to members of that LLVM IR
> type.  This means that the AvailableStores mapping is just
> "pair<Pointer+ConstantOffset> -> Constant*" since there can only be one
> (non-empty!) field in an IR type at a given offset, and using the argument
> pointer type will give you a unique and usually-good LLVM type for the
> constant to use.
>

I spent a lot of time thinking about this. I don't think we can use the IR
types at all, for the same reason SROA can't: the IR types of pointers are
lies. Consider the simple case of the small string optimization: there is a
pointer element and a character array occupying the same slot in the IR
type. Which do we use? Either will be wrong a decent percentage of the
time. We see this with every union, every character array with placement
new, every chunk of data allocated past the end of a class, etc.

I also don't think this will simplify very much. The interval map is a
pretty excellent way to track these, and it should be quite fast. Most of
the rough edges could be fixed by making a proper available stores
abstraction.

4. Determine which loads are candidates for forwarding in the callee, and
> compute which available stores get invalidated in the call.  When analyzing
> the callee body, you now know that the "unique" pointers coming in are
> non-aliasing with anything else, so the aliasing code logic is
> straight-forward.  To optimize for compile time, check to see if the callee
> is "readonly" which means it can't mutate any arguments (actually, is this
> true?  Can't a readonly function mutate something in the caller and then
> restore it before returning?).

I'm not yet sure what simplifications you're aiming for here. I'll have to
take a more detailed look at the code to see if there are general things
that could be done to simplify this.

One problem I ran into when testing this was with recursive functions. You
get into situations where the same alloca is in the caller and the callee.
I'm not even sure I've gotten that case correct yet. =/ In general you can
have the same base pointer (Value *) in both the caller and callee, and in
some cases, they don't alias (alloca), in some case the may alias (random
pointers returned from functions), and in some case they are the same
pointer (arguments passed along). Sadly, both the first and the last happen
quite frequently, and so are very worthwhile to handle with care.

I'm skeptical of using the attributes for the reason you highlight.

To actually compute the invalidations (and therefore the forwardable loads)
> I think you might have to use a depth first walk (separate from the breadth
> first search the instvisitor uses) to avoid losing information.
>

I'm interested in the right iteration order too... Note that we don't use
the instvisitor iteration order at all, we use our own CFG-driven walk, and
it's tricky because the CFG of the function is not actually the CFG we walk
-- a primary simplification we're performing is pruning dead CFG edges.

I'll look into simplifying this some, but again, I'm not certain how much
simplification will be practical and still catch the various patterns. Do
you think this is worth a certain amount of complexity? (Hopefully at least
factored into a less fragile set of tools and abstractions...)

Clearly, the compile time cost is a concern, but that also needs thorough
benchmarking. This was a very early stage patch. =]

Thanks for all the thoughts on this!
-Chandler
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121229/71797e01/attachment.html>