[PATCHES] A module inliner pass with a greedy call site queue
yinma at codeaurora.org
Wed Aug 27 14:02:07 PDT 2014
I understand that iteratively applying the inliner and other opts until converging or hitting
a cutoff may get better results. But it is indeed very expensive. For an application such as
chromium, which you mentioned, it may take hours to finish one round of compilation.
We could have the complete iterative framework, but 1 or 2 rounds may be a good
heuristic as the default setup.
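The round-capped loop described above can be sketched as follows. This is a minimal model with illustrative names and a toy size-reduction function, not the actual LLVM pass manager code:

```cpp
// One round of "inline + simplify", modeled here as the IR size shrinking
// by a quarter until it drops below a fixed point. Purely illustrative.
int runInlineAndSimplify(int irSize) {
  return irSize > 100 ? irSize - (irSize / 4) : irSize;
}

// Iterate until converged or until maxRounds is hit. The proposed default
// of 1 or 2 rounds keeps compile time bounded for very large applications.
int optimize(int irSize, int maxRounds) {
  for (int round = 0; round < maxRounds; ++round) {
    int next = runInlineAndSimplify(irSize);
    if (next == irSize)
      break; // converged early, no point burning more rounds
    irSize = next;
  }
  return irSize;
}
```

The cutoff trades a little optimization quality for a hard bound on compile time, which is the point being argued for chromium-sized builds.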
I also know a little about chromium. In my opinion, having the iterative framework is not
enough to bring up the performance, because the critical path usually touches only a very
small portion of the whole code, and hot functions are spread across many files. To
ensure a certain cluster of functions gets inlined, we have to increase the threshold enough
that all of them get inlined. As a result, the total size becomes much larger. So profile
guided inlining should do a better job here.
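The contrast drawn above can be made concrete: rather than raising the global threshold (which bloats every call site), profile-guided inlining raises the budget only where the profile says a call site is hot. A minimal sketch, with hypothetical names and an illustrative bonus multiplier, not the actual LLVM heuristic:

```cpp
// Raise the inline threshold only at profile-hot call sites; cold sites
// keep the small base threshold, so overall size growth stays contained.
int effectiveThreshold(int baseThreshold, bool callSiteIsHot) {
  const int HotBonusPercent = 300; // illustrative multiplier for hot sites
  return callSiteIsHot ? baseThreshold * HotBonusPercent / 100
                       : baseThreshold;
}

bool shouldInline(int calleeCost, int baseThreshold, bool callSiteIsHot) {
  return calleeCost < effectiveThreshold(baseThreshold, callSiteIsHot);
}
```

With a base threshold of 50, a callee costing 120 would be inlined at a hot site (budget 150) but rejected at a cold one, which is exactly the selectivity a global threshold bump cannot provide.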
And for chromium, a top-down approach should do a better job than the current bottom-up
approach, especially with the -Os option. There are so many C++ function calls. Because
CallPenalty in the current SCC inliner is 25 and the threshold for -Os is very low, once a
function has one or two calls to other functions, inlining will stop. This leaves a lot
of functions uninlined. And many leaf functions are not hot functions. A top-down
approach can do a better job for chromium.
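The arithmetic behind the CallPenalty point can be sketched directly. The CallPenalty value of 25 is the one cited above; the cost formula and the -Os threshold value here are simplifications for illustration, not the real LLVM InlineCost analysis:

```cpp
// Each call inside a candidate callee adds CallPenalty to its inline cost.
const int CallPenalty = 25; // as cited for the current SCC inliner

int calleeCost(int baseCost, int numCallsInCallee) {
  return baseCost + CallPenalty * numCallsInCallee;
}

// With a low -Os budget (50 here, purely illustrative), a callee with even
// a small base cost and two calls already blows past the threshold.
bool inlinedAtOs(int baseCost, int numCallsInCallee) {
  const int OsThreshold = 50; // illustrative "very low" -Os threshold
  return calleeCost(baseCost, numCallsInCallee) <= OsThreshold;
}
```

So a trivial callee with base cost 10 is inlined when it contains one call (cost 35) but rejected with two (cost 60), which is the "inlining stops after one or two calls" behavior described above.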
From: Chandler Carruth [mailto:chandlerc at google.com]
Sent: Wednesday, August 27, 2014 12:33 PM
To: Yin Ma
Cc: Xinliang David Li; Hal Finkel; Jiangning Liu; LLVM Commits
Subject: Re: [PATCHES] A module inliner pass with a greedy call site queue
On Wed, Aug 27, 2014 at 11:52 AM, Yin Ma <yinma at codeaurora.org> wrote:
For an iterative framework for the inliner and other optimizations: for the inliner itself, I believe two
passes should be good enough, one at the very beginning of the pass queue and one at the
end. In general, the size of the IR before optimization is roughly linear in its size after optimization.
This is definitely not the case.
Things like SROA, SSA formation, and DCE (after constants are propagated thanks to SROA, GVN, etc.) will often dramatically change the code size. Perhaps that is the case for very small, isolated benchmarks that have already been heavily optimized by hand, but across almost all the code and benchmarks I care about in the real world (sadly, SPEC is not the real world, or even close), the size of the IR after optimization has much higher variance -- sometimes it is *much* smaller.
increases the IR size, such as unrolling. For callees whose IR increased, actually, most of the time we
do not inline them again in the iterative framework. So if we only add another pass to catch
all callees whose IR decreased, that should be enough in my opinion.
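The "one more pass" idea above amounts to a filter: after the main pipeline runs, revisit only the callees whose IR shrank, since callees that grew would not be inlined again anyway. A hypothetical sketch of that selection step, with illustrative types:

```cpp
#include <vector>

// Before/after IR sizes recorded for each callee across the pipeline run.
struct Callee {
  int sizeBefore;
  int sizeAfter;
};

// Pick the indices of callees worth a second inlining look: only those
// whose IR got smaller after optimization.
std::vector<int> pickShrunkCallees(const std::vector<Callee> &callees) {
  std::vector<int> indices;
  for (int i = 0; i < (int)callees.size(); ++i)
    if (callees[i].sizeAfter < callees[i].sizeBefore)
      indices.push_back(i);
  return indices;
}
```

This keeps the second pass cheap by construction, at the cost of missing the interaction Chandler raises next: inlining can itself change how much a callee shrinks.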
You're not addressing the case when inlining itself *changes* how much the optimizer decreases the IR... When that happens, you really need to iterate with the inliner to get the best results.