[PATCHES] A module inliner pass with a greedy call site queue

Yin Ma yinma at codeaurora.org
Wed Aug 27 14:29:40 PDT 2014


Hi Chandler,

 

The last sentence was a typo. We did observe that using a kind of top-down approach in the greedy inliner resulted in fewer symbols remaining, smaller overall code size, and similar or better performance compared with the SCC inliner for Chromium with -Os.

 

We don't want to increase the threshold to improve performance either; that is why I tried to use a priority queue to address hot functions before cold functions.

 

Inlining is a heuristic. I understand the current inliner works very well in many cases. Maybe a more flexible inlining framework is indeed what we need?

 

Yin 

 

From: Chandler Carruth [mailto:chandlerc at google.com] 
Sent: Wednesday, August 27, 2014 2:08 PM
To: Yin Ma
Cc: Xinliang David Li; Hal Finkel; Jiangning Liu; LLVM Commits
Subject: Re: [PATCHES] A module inliner pass with a greedy call site queue

 

 

On Wed, Aug 27, 2014 at 2:02 PM, Yin Ma <yinma at codeaurora.org> wrote:

I understand that iteratively applying the inliner and other optimizations until converging or hitting a cutoff may get better results. But indeed, it is very expensive.

 

The current LLVM inliner *is* such an iterative framework...

 

For an application such as Chromium, which you mentioned, it may take hours to finish one round of compilation. We can have the complete iterative framework, but 1 or 2 rounds may be a good heuristic as the default setup.

 

In the past when working on LLVM's existing iterative inlining framework we needed many more rounds. The current system is designed to be iterative while remaining efficient.

 

 

I also know a little about Chromium. In my opinion, having the iterative framework is not enough to bring up the performance, because the critical path usually touches only a very small portion of the whole code, and the hot functions are spread across many files. In order to ensure that a certain cluster of functions is inlined, we have to increase the threshold to make sure all of them get inlined. As a result, the total size becomes much larger. So profile-guided inlining should do a better job here.

 

I'm not sure what you're trying to say here, but note that Chromium is only one C++ project and I'm not sure it is even representative (it may be, I just don't know). I usually look at a pretty wide range of C++ code bases and patterns when considering something like the inliner, ranging from the size of LLVM up to large C++ servers that contain many times the amount of code Chromium does.

 

It is certainly not my experience that we have to increase thresholds to make sure all functions get inlined for performance. It is more often that *not* inlining cold functions is the key to performance, but even that is a gross oversimplification.

 

 

And for Chromium, a top-down approach should do a better job than the current bottom-up approach, especially with the -Os option. There are so many C++ function calls. Because CallPenalty in the current SCC inliner is 25 and the threshold for -Os is very low, once a function has one or two calls to other functions, the inlining will stop. This leaves a lot of functions not inlined. And a lot of leaf functions are not hot functions. A bottom-up approach can do a better job for Chromium.


This paragraph seems to contradict its first sentence. Certainly, the reason why I am a strong proponent of bottom-up inlining is due to the nature of call graphs formed by C++ code and its ability to optimize those graphs in an efficient and effective manner.
