[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Sat Jul 22 15:23:23 PDT 2017

Hi River,

Very impressive! -- thanks for working on this.

A few questions, if you don't mind.

First, on results (from goo.gl/5k6wsP). Some of them are quite surprising.
In theory, "top improvements" should be quite similar in all three
approaches ("Early&Late Outlining", "Late Outlining" and "Machine
Outliner"), with E&LO capturing most of the cases. Yet, they are very
different:

Test Suite, top improvements:
E&LO:

   - enc-3des: 66.31%
   - StatementReordering-dbl: 51.45%
   - Symbolics-dbl: 51.42%
   - Recurrences-dbl: 51.38%
   - Packing-dbl: 51.33%

LO:

   - enc-3des: 50.7%
   - ecbdes: 46.27%
   - security-rjindael: 45.13%
   - ControlFlow-flt: 25.79%
   - ControlFlow-dbl: 25.74%

MO:

   - ecbdes: 28.22%
   - Expansion-flt: 22.56%
   - Recurrences-flt: 22.19%
   - StatementReordering-flt: 22.15%
   - Searching-flt: 21.96%

SPEC, top improvements:
E&LO:

   - bzip2: 9.15%
   - gcc: 4.03%
   - sphinx3: 3.8%
   - H264ref: 3.24%
   - Perlbench: 3%

LO:

   - bzip2: 7.27%
   - sphinx3: 3.65%
   - Namd: 3.08%
   - Gcc: 3.06%
   - H264ref: 3.05%

MO:

   - Namd: 7.8%
   - bzip2: 7.27%
   - libquantum: 2.99%
   - h264ref: 2%

Do you understand why they differ so much?

I'm especially interested in cases where MO managed to find redundancies
while E&LO and LO didn't. For example, 2.99% on libquantum (or is it simply
below the "top 5 results" for E&LO and LO?) -- did you investigate this?

Also, it would be nice to specify the full list of options used for SPEC (I
assume SPEC CPU2006?), similar to how results are reported on spec.org.

And a few questions on the RFC:

On Fri, Jul 21, 2017 at 12:47 AM, River Riddle via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> * Debug Info:
>
> Debug information is preserved for the calls to functions which have been
> outlined but all debug info from the original outlined portions is removed,
> making them harder to debug.
>

Just to check that I understand correctly: you remove *all* debug info in
outlined functions, essentially making them undebuggable -- correct? Did
you consider copying debug info from one of the outlined fragments instead
-- at least line numbers?

> The execution time results are to be expected given that the outliner,
> without profile data, will extract from whatever region it deems
> profitable. Extracting from the hot path can lead to a noticeable
> performance regression on any platform, which can be somewhat avoided by
> providing profile data during outlining.
>

Some of the regressions are quite severe. It would be interesting to
implement what you stated above and measure both the code size reductions
and the performance degradations again.

> * LTO:
>
>    - LTO doesn’t have a code size pipeline, but %reductions over LTO are
> comparable to non LTO.
>

LTO is known to affect code size significantly (for example, by removing
redundant functions), so I'm frankly quite surprised that the results are
the same...

Yours,
Andrey
===
Compiler Architect
NXP