<div style="font-family: arial, helvetica, sans-serif; font-size: 10pt"><div dir="ltr"><div class="gmail_default" style>On Fri, Dec 28, 2012 at 11:55 PM, Chris Lattner <span dir="ltr"><<a href="mailto:clattner@apple.com" target="_blank" class="cremed">clattner@apple.com</a>></span> wrote:<br>

</div><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Dec 27, 2012, at 9:53 AM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" class="cremed">chandlerc@gmail.com</a>> wrote:<br>


>  Fix typos spotted by Joey Gouly.<br>

><br>

>  Thanks for the review!<br>

><br>

> <a href="http://llvm-reviews.chandlerc.com/D246" target="_blank" class="cremed">http://llvm-reviews.chandlerc.com/D246</a><br>

><br>

> CHANGE SINCE LAST DIFF<br>

>  <a href="http://llvm-reviews.chandlerc.com/D246?vs=598&id=600#toc" target="_blank" class="cremed">http://llvm-reviews.chandlerc.com/D246?vs=598&id=600#toc</a><br>

<br>

</div>Hey Chandler,<br>

<br>

Thanks for tackling this, getting this would be a really great improvement in the inliner!  This review is based on a quick look over the patch, I apologize if I misunderstand how it's working.<br>

<br>

My big concern about this is that you've written this in a very general way, using Alias Analysis and some amount of brute force to check the sets of values using the interval map.  I'm concerned about compile time impact and also the complexity/fragility of the code.<br>

</blockquote><div><br></div><div style>I'm concerned about the complexity too, but I suspect it is worth a certain amount of complexity. I also have quite a few ideas that will simplify things but that really should be follow-up work. For example, a lot of the current complexity stems from me optimizing for a dense, small, and fast implementation. I actually think one way to make this easier to understand is to build a reasonable store-forwarding abstraction that can be used in any pass that needs the information, and give it a nice interface. That will isolate the logic away from the inliner and make everything a bit easier to understand.</div>

<div style><br></div><div style>Anyways, that's essentially orthogonal to your ideas below. On the the particulars...</div><div style> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

In practice, the inliner is going to be using BasicAA which is going to get "uniquely identified" objects and maybe TBAA information.</blockquote><div><br></div><div style>Yes, I've been essentially testing it with BasicAA, as it seems to do everything I want (essentially, partitioning obviously non-aliased pointers).</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Also, in practice, the most common cases will be "type safe" without terrible pointer casting and stuff (I say this without actually looking at the std::vector code, I hope its true!).<br>

</blockquote><div><br></div><div style>I'm actually not at all sure this is true. I see a large number of pointer casting and other cruft that makes it quite hard to analyze these patterns without some reasonably powerful machinery. More details on this below.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Would it be enough to dramatically simplify this down to something like the following:<br>

1. When looking at a call site, determine which arguments are known to be "uniquely identified" distinct objects.  These are your candidates for "store forwarding".  In practice, to handle the common "this" case, you should handle "uniquely identified objects" and at most one other non-uniquely identified object (commonly "this"), which still makes the aliasing trivial.<br>

</blockquote><div><br></div><div style>I actually started with this exact idea. However, Richard Smith had a comment that convinced me go for something a touch more general to specifically handle the cases of wrapper objects like rich iterators or anything that looks like a pimpl. There we have a pointer to a distinct object, and the first store we want to forward is a member pointer to some other object. Then we want to forward stores of that object....</div>

<div style><br></div><div style>Essentially, if we're going to do this, I'd like to do in a generic way, and limit the extent of the analysis rather than the scope. That way changing from pattern A to B in your code doesn't surprisingly decrease the optimizations, but adding another layer of indirection may. Hopefully that's less suprising... it is to me.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

2. Do a backwards scan from the callsite to collect stores, like you're doing in collectConstantOffsetStores.  It looks like collectConstantOffsetStores can be made to bail out earlier than your scan does right now (stopping when it finds a store to an unknown-alias of the aliased "this" pointer).<br>

</blockquote><div><br></div><div style>I'd like to improve the backward scan as well, but I'm already hitting cases in std::vector where we don't scan far enough. =/ That said, I'd like to think about this and other issues with the backward scan and see if I can simplify it some, irrespective of the rest of this discussion. This was really a rough cut first attempt.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">3. I'd simplify your datastructure tracking stores (instead of using intervals) to only handle the "type safe" (+ memset as a followon patch, by scalarizing stores) cases, and use GEP paths (similar to what the argument promotion pass does).  What I mean is to just assume that the pointer type in the call is "correct" and look for stores to members of that LLVM IR type.  This means that the AvailableStores mapping is just "pair<Pointer+ConstantOffset> -> Constant*" since there can only be one (non-empty!) field in an IR type at a given offset, and using the argument pointer type will give you a unique and usually-good LLVM type for the constant to use.<br>

</blockquote><div><br></div><div style>I spent a lot of time thinking about this. I don't think we can use the IR types at all, for the same reason SROA can't: the IR types of pointers are lies. Consider the simple case of the small string optimization: there is a pointer element and a character array occupying the same slot in the IR type. Which do we use? Either will be wrong a decent percentage of the time. We see this with every union, every character array with placement new, every chunk of data allocated past the end of a class, etc.</div>

<div style><br></div><div style>I also don't think this will simplify very much. The interval map is a pretty excellent way to track these, and it should be quite fast. Most of the rough edges could be fixed by making a proper available stores abstraction.</div>

<div style><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">4. Determine which loads are candidates for forwarding in the callee, and compute which available stores get invalidated in the call.  When analyzing the callee body, you now know that the "unique" pointers coming in are non-aliasing with anything else, so the aliasing code logic is straight-forward.  To optimize for compile time, check to see if the callee is "readonly" which means it can't mutate any arguments (actually, is this true?  Can't a readonly function mutate something in the caller and then restore it before returning?).</blockquote>

<div><br></div><div style>I'm not yet sure what simplifications you're aiming for here. I'll have to take a more detailed look at the code to see if there are general things that could be done to simplify this.</div>

<div style><br></div><div style>One problem I ran into when testing this was with recursive functions. You get into situations where the same alloca is in the caller and the callee. I'm not even sure I've gotten that case correct yet. =/ In general you can have the same base pointer (Value *) in both the caller and callee, and in some cases, they don't alias (alloca), in some case the may alias (random pointers returned from functions), and in some case they are the same pointer (arguments passed along). Sadly, both the first and the last happen quite frequently, and so are very worthwhile to handle with care.</div>

<div style><br></div><div style>I'm skeptical of using the attributes for the reason you highlight.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

To actually compute the invalidations (and therefore the forwardable loads) I think you might have to use a depth first walk (separate from the breadth first search the instvisitor uses) to avoid losing information.<br></blockquote>

<div style><br></div><div style>I'm interested in the right iteration order too... Note that we don't use the instvisitor iteration order at all, we use our own CFG-driven walk, and it's tricky because the CFG of the function is not actually the CFG we walk -- a primary simplification we're performing is pruning dead CFG edges.</div>

<div style><br></div><div style><br></div><div style>I'll look into simplifying this some, but again, I'm not certain how much simplification will be practical and still catch the various patterns. Do you think this is worth a certain amount of complexity? (Hopefully at least factored into a less fragile set of tools and abstractions...)</div>

<div><br></div><div style>Clearly, the compile time cost is a concern, but that also needs thorough benchmarking. This was a very early stage patch. =]</div><div style><br></div><div style><br></div><div style>Thanks for all the thoughts on this!</div>

<div style>-Chandler</div></div></div></div></div>