<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Tue, Aug 2, 2016 at 3:01 PM Xinliang David Li via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, Aug 2, 2016 at 2:32 PM, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Sorry, I missed these comments in my first read-through, David.</div><br><div class="gmail_quote"><span><div dir="ltr">On Mon, Aug 1, 2016 at 1:06 AM Xinliang David Li via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br></div></span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Sun, Jul 31, 2016 at 9:47 PM, Chandler Carruth via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br></span><span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Thoughts? The code changes are easy and mechanical. My plan would be:</div></div></blockquote><div><br></div></span></div></div></div><span><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>There is one caveat: stopping stack merging in the inliner will change the inliner cost analysis, which may affect inlining decisions and performance in unexpected ways. 
For instance, the allocatedSize estimate won't be as accurate due to double counting.</div></div></div></div></span></blockquote><div><br></div><div>There is the possibility of this, but I think it is relatively unlikely for a few reasons.</div><div><br></div><div>The first is that I don't think the alloca merging is helping that much here. Specifically, we only merge certain kinds of allocas and we only merge them in fairly limited cases. So if the thresholds were very sensitive to this, I would expect us to have seen these thresholds blocking things more often than we have.</div><div><br></div><div>Also, as you mention, there are two places we might hit this. One is due to the increased number of alloca instructions causing skew in the inline cost. I think we should "fix" this by no longer counting *static* alloca instructions for the purpose of inline cost estimation. I think this will be a net win. The alloca instructions are completely synthetic when static. All of them will be folded into a single stack adjustment that is even shared with register spills, etc. I think we should model them as free. (Clearly, *dynamic* alloca instructions are different here!) But I don't think this is likely enough to be urgent to fix. I think we could do both changes in parallel.</div></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>Yes, I agree with this. In fact, not only should we not model static allocas as a 'cost', but if we have not already modeled them as part of the call overhead (the stack pointer adjustment in the prologue), we should probably also model all static allocas in the callee collectively as potential savings after inlining. 
If you don't have time to get to it (at least the first part), Easwaran can probably help with this adjustment.</div></div></div></div></blockquote><div><br></div><div>Sure, if Easwaran can look at that independently, that'd be great.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_quote"><div><br></div><div>The other limit is the allocated size limit, and I think that is the more likely issue (it sounds like it was the one you were worried about as well). This one might be an issue, but I also think we can adjust it pretty freely. The goal of this limit, from my recollection of when it went in, is more to avoid runaway stack bloating, and the value of the limit is somewhat arbitrary. And it only impacts inlining into recursive functions. So I'm less worried about this in general. If it proves to be a problem in practice, we can do something about it, and if that happens it would probably be good to know because we shouldn't be relying on merging to model this anyway.</div><span><div><br></div></span></div></div></blockquote><div><br></div></div></div></div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>The current limit (for recursive callers) is indeed arbitrary, but I am more worried about a more general case that applies to any deep call chain: we may want to implement a heuristic to throttle inlining in general due to excessive stack frame growth. 
For this reason, the stack size tracking needs to closely match actual stack usage; otherwise we end up with either missed inlining opportunities or runtime errors (due to stack overflow).</div></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>One way to solve this without requiring merging is to track the current node's frame size as the caller's original stack size plus the max of its callees' frame sizes (not the sum).</div></div></div></div></blockquote><div><br></div><div>So there are two ways I can see the threshold being used.</div><div><br></div><div>One is to prevent completely nuts runaway stack growth. That use case can be served with a fairly arbitrary approximation of stack growth and an arbitrary threshold.</div><div><br></div><div>But the other use case is, as you say, to be reasonably careful about inlining "too much" and hurting stack size. There, I agree with you that we need a fairly precise way to model this and I don't think LLVM has one at the moment.</div><div><br></div><div>My belief is that at best we are currently addressing the first of these goals, and that the difference between merging and not merging could easily be accommodated by changes to the threshold.</div><div><br></div><div>I'm happy for folks to look at going after the second goal, but the merging thing isn't going to be nearly enough, so I don't think that should really hold up this move.</div></div></div>
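<div><br></div><div>A minimal sketch of the cost rule discussed in this thread, under stated assumptions: static allocas add nothing to the inline cost (their bytes are still tracked for the allocated-size limit), dynamic allocas keep an instruction cost, and, following David's suggestion, a callee with a static frame can optionally earn a bonus for the prologue stack adjustment that inlining removes. Everything here is hypothetical: ToyAlloca, ToyAllocaCost, costOfAllocas, kToyInstrCost and the modelPrologueSavings knob are illustrative names, not LLVM's InlineCost API, and the cost values are arbitrary.</div>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of an alloca in a callee. Hypothetical: this is not LLVM's
// AllocaInst; it only keeps the two properties the proposed rule needs.
struct ToyAlloca {
  bool isStatic;        // constant size, placed in the entry block
  std::uint64_t bytes;  // size in bytes; 0 for dynamic allocas (unknown)
};

// Hypothetical cost of one "real" instruction; not LLVM's actual constant.
constexpr int kToyInstrCost = 5;

struct ToyAllocaCost {
  int cost = 0;                   // contribution to the inline cost
  std::uint64_t staticBytes = 0;  // still tracked for the stack-size limit
};

// Static allocas are charged nothing: they fold into the single stack
// adjustment in the prologue. Dynamic allocas keep an instruction cost.
// If modelPrologueSavings is set, a callee with any static frame gets a
// bonus for the stack adjustment that inlining removes (hypothetical knob).
ToyAllocaCost costOfAllocas(const std::vector<ToyAlloca> &allocas,
                            bool modelPrologueSavings) {
  ToyAllocaCost result;
  for (const ToyAlloca &a : allocas) {
    if (a.isStatic) {
      result.staticBytes += a.bytes;  // free for cost, counted for size
    } else {
      assert(a.bytes == 0 && "dynamic sizes are unknown in this toy model");
      result.cost += kToyInstrCost;   // dynamic alloca: real runtime work
    }
  }
  if (modelPrologueSavings && result.staticBytes > 0)
    result.cost -= kToyInstrCost;     // prologue adjustment saved by inlining
  return result;
}
```

<div>Keeping staticBytes separate from cost is the point of the design: the instruction-count skew from unmerged static allocas disappears, while the allocated-size check (where any double counting would show up) still sees the bytes.</div>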
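<div><br></div><div>A minimal sketch of the frame-size estimate suggested in this thread: a function's worst-case stack usage is its own frame plus the maximum (not the sum) of its callees' estimates, because sequential calls reuse the same stack space. ToyFunction, ToyCallGraph and maxStackBytes are hypothetical names, not LLVM APIs; the sketch assumes an acyclic call graph with known static frame sizes and ignores dynamic allocas and codegen effects such as stack coloring and spill slots.</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy call graph node: the function's own static frame size
// and the functions it calls. Not an LLVM data structure.
struct ToyFunction {
  std::uint64_t ownFrameBytes;
  std::vector<std::string> callees;
};

using ToyCallGraph = std::map<std::string, ToyFunction>;

// Worst-case stack bytes reachable from `name`: its own frame plus the
// deepest callee chain. Callee frames are live one at a time, so this takes
// the max over callees rather than the sum. Results are memoized; recursion
// (a cycle in the graph) is outside the scope of this sketch.
std::uint64_t maxStackBytes(const ToyCallGraph &cg, const std::string &name,
                            std::map<std::string, std::uint64_t> &memo,
                            std::set<std::string> &inProgress) {
  if (auto it = memo.find(name); it != memo.end())
    return it->second;
  assert(inProgress.count(name) == 0 && "recursive call graph not supported");
  inProgress.insert(name);

  const ToyFunction &f = cg.at(name);
  std::uint64_t deepestCallee = 0;
  for (const std::string &callee : f.callees)
    deepestCallee =
        std::max(deepestCallee, maxStackBytes(cg, callee, memo, inProgress));

  inProgress.erase(name);
  std::uint64_t total = f.ownFrameBytes + deepestCallee;
  memo[name] = total;
  return total;
}
```

<div>For example, with main (64 bytes) calling a (128 bytes, which itself calls c at 32 bytes) and b (512 bytes), the estimate for main is 64 + max(128 + 32, 512) = 576 bytes, whereas summing the callees would give 736. An inlining heuristic could compare such an estimate before and after a candidate inline against a stack budget.</div>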