[llvm-commits] [PATCH] Teach the inline cost analysis to forward stores of constants in the caller to loads in the callee.
clattner at apple.com
Thu Jan 3 15:42:19 PST 2013
On Dec 29, 2012, at 1:05 AM, Chandler Carruth <chandlerc at google.com> wrote:
> In practice, the inliner is going to be using BasicAA which is going to get "uniquely identified" objects and maybe TBAA information.
> Yes, I've been essentially testing it with BasicAA, as it seems to do everything I want (essentially, partitioning obviously non-aliased pointers).
My point is that you can get the same result with a simpler implementation (not using AA at all), which might lead to follow-on simplifications.
> Would it be enough to dramatically simplify this down to something like the following:
> 1. When looking at a call site, determine which arguments are known to be "uniquely identified" distinct objects. These are your candidates for "store forwarding". In practice, to handle the common "this" case, you should handle "uniquely identified objects" and at most one other non-uniquely identified object (commonly "this"), which still makes the aliasing trivial.
> I actually started with this exact idea. However, Richard Smith had a comment that convinced me go for something a touch more general to specifically handle the cases of wrapper objects like rich iterators or anything that looks like a pimpl. There we have a pointer to a distinct object, and the first store we want to forward is a member pointer to some other object. Then we want to forward stores of that object….
I don't understand how the two are contradictory.
> Essentially, if we're going to do this, I'd like to do in a generic way, and limit the extent of the analysis rather than the scope. That way changing from pattern A to B in your code doesn't surprisingly decrease the optimizations, but adding another layer of indirection may. Hopefully that's less suprising... it is to me.
I don't see what basicaa has to do with this. If anything, doing your own custom "unique object" analysis is more powerful, because you can take advantage of already known values.
> 3. I'd simplify your datastructure tracking stores (instead of using intervals) to only handle the "type safe" (+ memset as a followon patch, by scalarizing stores) cases, and use GEP paths (similar to what the argument promotion pass does). What I mean is to just assume that the pointer type in the call is "correct" and look for stores to members of that LLVM IR type. This means that the AvailableStores mapping is just "pair<Pointer+ConstantOffset> -> Constant*" since there can only be one (non-empty!) field in an IR type at a given offset, and using the argument pointer type will give you a unique and usually-good LLVM type for the constant to use.
> I spent a lot of time thinking about this. I don't think we can use the IR types at all, for the same reason SROA can't: the IR types of pointers are lies. Consider the simple case of the small string optimization: there is a pointer element and a character array occupying the same slot in the IR type. Which do we use? Either will be wrong a decent percentage of the time. We see this with every union, every character array with placement new, every chunk of data allocated past the end of a class, etc.
The small string optimization example is compelling. Still, I cling to the hope that there is a way to simplify this. Even if the static type is not useful, in practice there is a useful stable dynamic type, and it is implied by the actual stores performed by the program.
> I also don't think this will simplify very much. The interval map is a pretty excellent way to track these, and it should be quite fast. Most of the rough edges could be fixed by making a proper available stores abstraction.
Ok, I have nothing against IntervalMap (particularly if it drops its fully-inclusive representation as Jakob suggested), so long as its the right answer. I just really hope there is a simpler approach that can work here.
> 4. Determine which loads are candidates for forwarding in the callee, and compute which available stores get invalidated in the call. When analyzing the callee body, you now know that the "unique" pointers coming in are non-aliasing with anything else, so the aliasing code logic is straight-forward. To optimize for compile time, check to see if the callee is "readonly" which means it can't mutate any arguments (actually, is this true? Can't a readonly function mutate something in the caller and then restore it before returning?).
> I'm not yet sure what simplifications you're aiming for here. I'll have to take a more detailed look at the code to see if there are general things that could be done to simplify this.
> One problem I ran into when testing this was with recursive functions. You get into situations where the same alloca is in the caller and the callee. I'm not even sure I've gotten that case correct yet. =/ In general you can have the same base pointer (Value *) in both the caller and callee, and in some cases, they don't alias (alloca), in some case the may alias (random pointers returned from functions), and in some case they are the same pointer (arguments passed along). Sadly, both the first and the last happen quite frequently, and so are very worthwhile to handle with care.
Yet another reason not to use AliasAnalysis :-). It definitely isn't set up to handle this.
> I'll look into simplifying this some, but again, I'm not certain how much simplification will be practical and still catch the various patterns. Do you think this is worth a certain amount of complexity? (Hopefully at least factored into a less fragile set of tools and abstractions…)
Yes, I agree with you completely that this is an important optimization and worth some complexity. I just want to minimize it wherever possible. One random thought: for classes with a vtable, is this powerful enough to propagate the "vtable store" in a caller into vtable loads in the callee, allowing the inline cost code to consider the cost of the devirtualized code? Do we get a strong bonus for this?
It would probably help to introduce some new classes (replacing the typedef of containers) and use member functions to clean up the code. When you get another version of the patch, I'll take another look.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-commits