[PATCHES] A module inliner pass with a greedy call site queue
yinma at codeaurora.org
Wed Aug 27 14:02:07 PDT 2014
I understand that iteratively applying the inliner and other opts until converging or hitting
a cutoff may get better results. But it is indeed very expensive. For an application such as
chromium, which you mentioned, it may take hours to finish one round of compilation.
We could have the complete iterative framework, but 1 or 2 rounds may be a good
heuristic as the default setup.
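The round-capped loop described above can be sketched as follows. This is a minimal model with illustrative names and a toy size-reduction function, not the actual LLVM pass manager code:

```cpp
// One round of "inline + simplify", modeled here as the IR size shrinking
// by a quarter until it drops below a fixed point. Purely illustrative.
int runInlineAndSimplify(int irSize) {
  return irSize > 100 ? irSize - (irSize / 4) : irSize;
}

// Iterate until converged or until maxRounds is hit. The proposed default
// of 1 or 2 rounds keeps compile time bounded for very large applications.
int optimize(int irSize, int maxRounds) {
  for (int round = 0; round < maxRounds; ++round) {
    int next = runInlineAndSimplify(irSize);
    if (next == irSize)
      break; // converged early, no point burning more rounds
    irSize = next;
  }
  return irSize;
}
```

The cutoff trades a little optimization quality for a hard bound on compile time, which is the point being argued for chromium-sized builds.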
I also know a little about chromium. In my opinion, having the iterative framework is not
enough to bring up the performance, because the critical path usually touches only a very
small portion of the whole code, and hot functions are spread across many files. To
ensure a certain cluster of functions gets inlined, we have to increase the threshold enough
that all of them get inlined. As a result, the total size becomes much larger. So profile
guided inlining should do a better job here.
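The contrast drawn above can be made concrete: rather than raising the global threshold (which bloats every call site), profile-guided inlining raises the budget only where the profile says a call site is hot. A minimal sketch, with hypothetical names and an illustrative bonus multiplier, not the actual LLVM heuristic:

```cpp
// Raise the inline threshold only at profile-hot call sites; cold sites
// keep the small base threshold, so overall size growth stays contained.
int effectiveThreshold(int baseThreshold, bool callSiteIsHot) {
  const int HotBonusPercent = 300; // illustrative multiplier for hot sites
  return callSiteIsHot ? baseThreshold * HotBonusPercent / 100
                       : baseThreshold;
}

bool shouldInline(int calleeCost, int baseThreshold, bool callSiteIsHot) {
  return calleeCost < effectiveThreshold(baseThreshold, callSiteIsHot);
}
```

With a base threshold of 50, a callee costing 120 would be inlined at a hot site (budget 150) but rejected at a cold one, which is exactly the selectivity a global threshold bump cannot provide.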
And for chromium, a top-down approach should do a better job than the current bottom-up
approach, especially with the -Os option. There are so many C++ function calls. Because
CallPenalty in the current SCC inliner is 25 and the threshold for -Os is very low, once a
function has one or two calls to other functions, inlining will stop. This leaves a lot
of functions uninlined. And many leaf functions are not hot functions. A top-down
approach can do a better job for chromium.
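The arithmetic behind the CallPenalty point can be sketched directly. The CallPenalty value of 25 is the one cited above; the cost formula and the -Os threshold value here are simplifications for illustration, not the real LLVM InlineCost analysis:

```cpp
// Each call inside a candidate callee adds CallPenalty to its inline cost.
const int CallPenalty = 25; // as cited for the current SCC inliner

int calleeCost(int baseCost, int numCallsInCallee) {
  return baseCost + CallPenalty * numCallsInCallee;
}

// With a low -Os budget (50 here, purely illustrative), a callee with even
// a small base cost and two calls already blows past the threshold.
bool inlinedAtOs(int baseCost, int numCallsInCallee) {
  const int OsThreshold = 50; // illustrative "very low" -Os threshold
  return calleeCost(baseCost, numCallsInCallee) <= OsThreshold;
}
```

So a trivial callee with base cost 10 is inlined when it contains one call (cost 35) but rejected with two (cost 60), which is the "inlining stops after one or two calls" behavior described above.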
From: Chandler Carruth [mailto:chandlerc at google.com]
Sent: Wednesday, August 27, 2014 12:33 PM
To: Yin Ma
Cc: Xinliang David Li; Hal Finkel; Jiangning Liu; LLVM Commits
Subject: Re: [PATCHES] A module inliner pass with a greedy call site queue
On Wed, Aug 27, 2014 at 11:52 AM, Yin Ma <yinma at codeaurora.org> wrote:
For an iterative framework for the inliner and other optimizations: for the inliner itself, I believe two
passes should be good enough, one at the very beginning of the pass queue and one at the
end. In general, the size of the IR before optimization is roughly linear in its size after optimization.
This is definitely not the case.
Things like SROA, SSA formation, and DCE (after constants are propagated thanks to SROA, GVN, etc.) will often dramatically change the code size. Perhaps that is the case for very small, isolated benchmarks that have already been heavily optimized by hand, but across almost all the code and benchmarks I care about in the real world (sadly, SPEC is not the real world, or even close), the size of the IR after optimization has much higher variance -- sometimes it is *much* smaller.
increases the IR size, such as unrolling. For callees whose IR increased, actually, most of the time we
do not inline them again in the iterative framework. So if we only add another pass to catch
all callees whose IR decreased, that should be enough in my opinion.
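The "one more pass" idea above amounts to a filter: after the main pipeline runs, revisit only the callees whose IR shrank, since callees that grew would not be inlined again anyway. A hypothetical sketch of that selection step, with illustrative types:

```cpp
#include <vector>

// Before/after IR sizes recorded for each callee across the pipeline run.
struct Callee {
  int sizeBefore;
  int sizeAfter;
};

// Pick the indices of callees worth a second inlining look: only those
// whose IR got smaller after optimization.
std::vector<int> pickShrunkCallees(const std::vector<Callee> &callees) {
  std::vector<int> indices;
  for (int i = 0; i < (int)callees.size(); ++i)
    if (callees[i].sizeAfter < callees[i].sizeBefore)
      indices.push_back(i);
  return indices;
}
```

This keeps the second pass cheap by construction, at the cost of missing the interaction Chandler raises next: inlining can itself change how much a callee shrinks.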
You're not addressing the case when inlining itself *changes* how much the optimizer decreases the IR... When that happens, you really need to iterate with the inliner to get the best results.