Hello,<div><br></div><div>This is a generic refactoring patch that shouldn't change any functionality. It just sinks the threshold into the inline cost analysis and cleans up the API of the InlineCost objects so that they can wrap up both the cost and the threshold.</div>
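<div><br></div><div>To make the "wrap up both the cost and the threshold" idea concrete, here is a minimal sketch. The class name InlineCostSketch and its methods are hypothetical, made up for illustration; this is not the actual InlineCost API from the patch.</div>

```cpp
#include <cassert>

// Hypothetical sketch, not the real InlineCost class: a result object that
// carries the computed cost together with the threshold it was measured
// against, so callers no longer pass the threshold around separately.
class InlineCostSketch {
  int Cost;
  int Threshold;

public:
  InlineCostSketch(int C, int T) : Cost(C), Threshold(T) {
    assert(T >= 0 && "sketch assumes a non-negative threshold");
  }

  int cost() const { return Cost; }
  int threshold() const { return Threshold; }

  // Remaining budget; negative when the cost is over the threshold.
  int headroom() const { return Threshold - Cost; }

  // In this sketch, inlining is allowed only while the cost stays strictly
  // below the threshold.
  bool shouldInline() const { return Cost < Threshold; }
};
```

<div>With both numbers in one object, a caller can ask a single question (shouldInline) instead of comparing a raw cost against a threshold it has to look up itself.</div>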

<div><br></div><div>Below I've got a bit of background about what I'm working on, in case folks are curious what's motivating this refactoring, and where I'm headed with the inline cost stuff. Sorry for the long ramble that follows...</div>

<div><div><br></div><div><br></div><div>Duncan and I had a long discussion about how to more accurately compute the inline cost for callsites. The particular problem I'm aiming at is functions that look something like:</div>

<div><br></div><div>void foo(int size) {</div><div><div>  if (size == 1) { /* something small */ return ...; }</div></div><div><div>  if (size == 2) { /* something small */ return ...; }</div></div><div><div>  if (size == 3) { /* something small */ return ...; }</div>

</div><div><div>  if (size == 4) { /* something small */ return ...; }</div></div><div>  /* something huge for the general case, involving loops, function calls, all kinds of madness */</div><div>  return ...;</div><div>}</div>

<div><br></div><div>Here, a few unfortunate things happen with the current inline cost system:</div><div><br></div><div>1) We compute a single 'weight' for the function when size is a constant, regardless of what the constant is.</div>

<div>2) We compute the weight by looking at each branch on a comparison of size with a constant, and subtracting the average of the two sides from the total function cost.</div><div>3) We compute this weight even if, for example, the very first basic block is too large to ever inline.</div>

<div>4) We use ad-hoc folding logic to determine exactly what happens with the constant because we don't have an *actual* constant.</div><div><br></div><div><br></div><div>I think we have a good idea of how to address these issues. It's a bit high risk, but fortunately all but the last steps seem strict improvements to the world anyways.</div>

<div><br></div><div>The general idea is to switch to a per-call-site analysis of the inline cost, computing significantly less per-function ahead of time. Under this model, we can walk the potentially-live basic blocks in CFG order, propagating the actual constant arguments of the particular callsite through the function. When a branch is proven through this propagation to not be taken, we won't even look at it to compute the cost. The result is that the computed cost at each callsite will reflect the exact maximum code paths left after inlining. Even if the function has *wildly* divergent costs on two different code paths, they will be properly accounted for. This will make us inline more often when the resulting code path is short, and less often when the resulting code path is very large.</div>
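<div><br></div><div>A rough sketch of that walk, on a toy CFG rather than real IR. Everything here (ToyBlock, costForCallSite, makeFoo, and the cost numbers) is hypothetical and only meant to show the shape of the idea: with a known constant argument each branch is folded and only the taken side is costed; without one, both sides are.</div>

```cpp
#include <cassert>
#include <optional>
#include <queue>
#include <vector>

// Hypothetical toy CFG, not LLVM IR: each block has a cost and ends either
// in a return or in a branch testing the single argument against a constant.
struct ToyBlock {
  int Cost;      // cost of the block's instructions
  bool IsReturn; // true: block ends in a return, successors are ignored
  int CompareTo; // branch condition is (arg == CompareTo)
  int TrueSucc;  // successor index when the condition holds
  int FalseSucc; // successor index otherwise
};

// Sum the cost of every block that may still be live at one call site.
// If Arg holds a constant, each branch is folded and only the taken
// successor is visited; otherwise both successors are visited.
int costForCallSite(const std::vector<ToyBlock> &Blocks,
                    std::optional<int> Arg) {
  assert(!Blocks.empty() && "function needs an entry block");
  std::vector<bool> Seen(Blocks.size(), false);
  std::queue<int> Worklist;
  Worklist.push(0);
  Seen[0] = true;
  int Cost = 0;

  auto Visit = [&](int Succ) {
    if (Succ >= 0 && !Seen[Succ]) {
      Seen[Succ] = true;
      Worklist.push(Succ);
    }
  };

  while (!Worklist.empty()) {
    const ToyBlock &BB = Blocks[Worklist.front()];
    Worklist.pop();
    Cost += BB.Cost;
    if (BB.IsReturn)
      continue;
    if (Arg) {
      // Constant known: the untaken side is never looked at.
      Visit(*Arg == BB.CompareTo ? BB.TrueSucc : BB.FalseSucc);
    } else {
      Visit(BB.TrueSucc);
      Visit(BB.FalseSucc);
    }
  }
  return Cost;
}

// A two-case version of the foo() shape from above, with made-up costs.
std::vector<ToyBlock> makeFoo() {
  return {
      {1, false, 1, 1, 2},   // 0: if (size == 1)
      {5, true, 0, -1, -1},  // 1: something small, return
      {1, false, 2, 3, 4},   // 2: if (size == 2)
      {5, true, 0, -1, -1},  // 3: something small, return
      {500, true, 0, -1, -1} // 4: something huge, return
  };
}
```

<div>In this toy, a callsite passing size == 1 only pays for the first test and the first small block, while a callsite passing some other constant pays for both tests plus the huge general case.</div>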

<div><br></div><div>Now the problem with this approach in general is that it is *expensive*. It scales very badly as described. We have some good ideas about how to carefully limit the cost though. The core of the cost limiting is to have the threshold available *while* computing the cost. The moment we cross the threshold, we can exit early without looking any further. This means we'll only ever walk a range of the function proportional to the range we're willing to inline and optimize at that callsite anyways. Next up, we can memoize the results of the analysis per-callsite so we don't ever re-compute the same cost metric twice. Finally, we can build up helper tables about the function ahead of time that essentially allow the analysis to work on a per-basic-block granularity, and per-*folded*-instruction.</div>
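<div><br></div><div>The first two cost-limiting ideas can be sketched as follows. This is a toy under stated assumptions: a function is reduced to a list of per-block costs visited in order, "crossing" the threshold is taken to mean strictly exceeding it, and the cache is keyed on a made-up integer callsite id plus the threshold. The names WalkResult, boundedCost and CostCache are hypothetical.</div>

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical toy model, not LLVM's representation.
struct WalkResult {
  int Cost;           // cost accumulated before the walk stopped
  int BlocksVisited;  // how many blocks were actually inspected
  bool OverThreshold; // true if the walk bailed out early
};

// Accumulate block costs in order, stopping as soon as the running total
// exceeds the threshold. Blocks after that point are never inspected.
WalkResult boundedCost(const std::vector<int> &BlockCosts, int Threshold) {
  assert(Threshold >= 0 && "sketch assumes a non-negative threshold");
  WalkResult R{0, 0, false};
  for (int C : BlockCosts) {
    R.Cost += C;
    ++R.BlocksVisited;
    if (R.Cost > Threshold) {
      R.OverThreshold = true;
      break;
    }
  }
  return R;
}

// Memoize results per (callsite id, threshold) so the same query is only
// computed once.
class CostCache {
  std::map<std::pair<int, int>, WalkResult> Cache;
  int Misses = 0;

public:
  const WalkResult &get(int CallSiteId, const std::vector<int> &BlockCosts,
                        int Threshold) {
    auto Key = std::make_pair(CallSiteId, Threshold);
    auto It = Cache.find(Key);
    if (It == Cache.end()) {
      ++Misses;
      It = Cache.emplace(Key, boundedCost(BlockCosts, Threshold)).first;
    }
    return It->second;
  }

  // Number of times the cost actually had to be computed.
  int misses() const { return Misses; }
};
```

<div>The point of the early exit is that the work done per callsite is bounded by the threshold rather than by the size of the callee; the cache then keeps repeated queries for the same callsite from redoing even that bounded walk.</div>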

<div><br></div><div><br></div><div>Anyways, I'm rather optimistic that we can make the per-callsite analysis sufficiently fast, and sufficiently well cached, that it will be tractable in terms of compile time, and the benefit to accuracy of cost estimation is *huge* for code patterns that tend to show up in hot parts of the code base, such as hybrid generic algorithms. It also has the potential to cut off some overly eager inlining due to misbehaved bonuses when in fact the giant slow path is the one which will be selected.</div>

</div><div><br></div><div>-Chandler</div>