[PATCH] Inliner Enhancement
Xinliang David Li
xinliangli at gmail.com
Thu Mar 19 20:25:16 PDT 2015
On Thu, Mar 19, 2015 at 7:34 PM, Jiangning Liu <liujiangning1 at gmail.com> wrote:
>>> Compared with an even more complicated algorithm, yes, it is simple, but
>>> compared with the original solution, it can avoid code bloat in abnormal
>>> programs, such as those that call the same function many times.
>> so this heuristic is to reduce cold bloat?
> If you read my initial post again, you would have noticed it is from
> (2.b), which is for code size.
Yes -- I had a typo: cold --> code.
>> Adding simple heuristics is easy; the problem with them (without a clear
>> benefit/cost analysis) is that you can easily find a counterexample (low
>> SNR). Such heuristics are common in compilers that do benchmark hacks.
> I don't think so. From an engineering point of view, complicated code is
> hard to maintain and develop further. Also, even if more complicated
> logic can obtain some more benefit for some specific benchmark, I still
> don't think it is good practice, because it would likely leave a
> lot of corner cases or holes that expose counterexamples to other engineers.
This is just a matter of engineering and implementation.
> Yes, my heuristic rules are simple, so why can't we just take them if they
> are low-hanging fruit? I have specific cases showing that a similar threshold
> increase can have a significantly different performance impact on
> different benchmarks. In one case, if you raise the threshold above A,
> you get a ~5% performance improvement, but in another case, if you
> raise it further, up to B, you get a 3~5% regression. The
> regression is simply due to the increase in register spill code. At
> inlining time, it's hard to give an accurate register cost model, even if you
> say you can capture it for some cases. The same argument applies here: it
> would be easy to find a counterexample for a more complicated algorithm. For
> the example I gave (sorry, I can't share the details with you because of
> SPEC license issues), first, it's hard to guess which callee is hot, and
> second, even if you know both are hot, it would still be hard to decide that
> your threshold should be between A and B. Therefore, the question now is: do
> you want to capture this case or not?
As I have said, no inline heuristic can be noise free, but more
analysis can greatly improve the signal-to-noise ratio. In theory, a
better analysis should detect a set of inline candidates that is a superset
of the 'good' subset of the inline candidates selected by a simple
heuristic, without including the bad candidates, so an incremental inliner
improvement with simple (and good) inline heuristics sounds good to me.
Regarding your patch, I think the simple loop heuristic may be a reasonable
incremental improvement, but you really need to wait for Chandler's pass
manager change instead of doing all the ordering tricks. On the other hand,
I don't see that the 'same callee' heuristic is generally valid. How much
does it contribute to code size? Is it just skewed by one benchmark?
>>> I'm not sure what those more sophisticated heuristic rules you are
>>> talking about are, but if you tend to tune them for specific benchmarks, I
>>> would say it definitely will not make sense. I would appreciate it if you
>>> could share your results with the community. Then we can see how we
>>> can reasonably move on.
>> The plan is to tune the inliner (both non-PGO and PGO) for real-world
>> programs and cross-validate with SPEC benchmarks. This work will
>> definitely be shared with the community.
> Given that it is still a plan for you, how can you justify it without
> data? If you still think your idea can be better, I think it can be the 2nd
> step following this simple heuristic.
Don't get me wrong. I think incremental improvement is good -- it is just
that there are issues with this patch (see my reply above).