[llvm-dev] RFC: Inlining report

Xinliang David Li via llvm-dev llvm-dev at lists.llvm.org
Fri Oct 23 15:19:08 PDT 2015


On Thu, Oct 22, 2015 at 12:32 PM, Hal Finkel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
> ------------------------------
>
> *From: *"Robert Cox via llvm-dev" <llvm-dev at lists.llvm.org>
> *To: *llvm-dev at lists.llvm.org
> *Sent: *Thursday, October 22, 2015 1:25:05 PM
> *Subject: *[llvm-dev] RFC: Inlining report
>
> *RFC: Inlining Report *
>
>
>
> *Motivation *
>
>
>
> Making good inlining choices while optimizing an application is often key
> to achieving optimal performance.  While the compiler’s default inlining
> heuristics sometimes provide great out-of-box results, optimal performance
> is sometimes achieved only after varying the settings of certain compiler
> options related to inlining or adding “always_inline” or “noinline”
> attributes to certain functions.
>
>
>
> Before we can determine how we need to change the compiler’s inlining
> choices to get better performance for an application, we need to have a
> clear picture of the compiler’s inlining choices and what motivated them.
> Many compilers, such as LLVM and GCC, provide *informational notes *when a
> function is inlined, but these notes provide only a “blow by blow”
> description of what the compiler did, rather than a high-level illustration
> of the result.  This high-level picture can be provided by an *inlining
> report. *
>
>
>
> Over the years, I’ve worked with several compilers that provide inlining
> reports, and I can attest that the customers using those compilers have
> found them to be an invaluable tool in investigating and improving their
> applications’ performance.  In addition, the inlining report can be used
> by compiler developers to visualize and improve the compiler’s default
> heuristics and option values.
>
> I agree, these can be extremely useful. Generically speaking, I would very
> much like to see Clang/LLVM grow the ability to provide optimization
> reports (including those where source lines are annotated with information
> on what was vectorized, eliminated, etc.).
>
> A few comments:
>
> 1. Inlining is iterative. Thus, I assume that your report might include
> information from multiple inlining passes. Is that correct?
>
> 2. Inlining costs are target specific (because they use TTI costs), so it
> would be useful for the report to include the target architecture (as well
> as information on the LLVM version, name of the input file, etc.)
>
> 3. And this is the big one: Where should the infrastructure for this live?
>
> One of my goals when defining the 'informational note' infrastructure in
> LLVM, was to construct it such that the information was not just
> presentable to humans, but also so that it could be programmatically
> consumed. This is why we designed it with a class hierarchy: so that the
> "messages" could be more than just messages. The rationale was that there
> is information, necessary for presenting useful feedback to humans, that
> only the frontend has. For C++ codes, for example, you need to do symbol
> demangling. The frontend is probably the best place to do that. The
> frontend also knows the proper place to write output files. In addition,
> specifically for inlining information, the frontend knows where functions
> are defined without the need for debug information.
>

It would be nice to be able to produce the report without debug
information, but I am not sure how important that requirement is: an
optimized build is usually done with some level of debug info. Debug info
is also enabled with the -Rpass option, so the inline report (or, more
generally, optimization report) option can do the same here.

thanks,

David


>
> My preference, therefore, is to make sure that the inliner generates
> sufficiently-detailed messages using a proper class hierarchy and
> sufficient information. Then, in Clang, we can collect those messages,
> demangle function names and add source-location information, and produce a
> report.
>
> Thoughts?
>
> Thanks again,
> Hal
>
>
>
> For these reasons, I’d like to contribute code to LLVM to generate an
> inlining report as part of the inliner.
>
>
>
> *Description *
>
>
>
> The inlining report I am proposing contains the following information:
>
>
>
> (1)    The values of the principal threshold options which affect how
> much inlining is done under various circumstances
>
> (2)    Whether each function is compiled or has been eliminated by dead
> static function elimination.
>
> (3)    For each function, the call sites that were and were not inlined.
> Since inlining a call site can expose other call sites for inlining, the
> inlining report also reports on whether these exposed call sites have been
> inlined or not.  This information is presented in a hierarchical manner.
>
> (4)    For each call site, we include the principal reason the call site
> was or was not inlined, together with any cost vs. threshold computation
> that was done.
>
>
>
> *High Level Design *
>
>
>
> The inline report is created if the option -inline-report=X is passed on
> the command line with a positive integer value of X.  If X is 0, or this
> option is not specified, the Inliner does not create or perform any
> operations on the inline report, and there is no compile time overhead.
>
>
>
> Three main classes are used to implement the inline report:
>
>
>
> *class InlineReportCallSite *
>
>
>
> This class contains the inlining report information specific to a
> particular CallSite CS, including:
>
> (1)    A bool indicating whether the CallSite was inlined
>
> (2)    An inlining reason indicating why the CallSite was or was not
> inlined
>
> (3)    The inlining cost, outer inlining cost, and threshold values used
> in calculating the profitability of inlining
>
> (4)    A vector of InlineReportCallSite*, each of which points to an
> InlineReportCallSite for a CallSite exposed when CS was inlined.
>
>
>
> *class InlineReportFunction *
>
>
>
> This class contains the inlining report information specific to a
> particular Function F in the call graph, including:
>
> (1)    A bool indicating whether the function has been dead static
> eliminated.
>
> (2)    A vector of InlineReportCallSite*, each of which points to an
> InlineReportCallSite for a CallSite that appeared in F before any inlining
> was applied.
>
>
>
> *class InlineReport *
>
>
>
> The main class which summarizes the high level information in the inline
> report, including:
>
> (1)    The values of the inlining threshold options
>
> (2)    The “level” of the inlining report, which is a bit vector of
> feature options.  For example, whether to print external functions and
> intrinsics, whether to print the inlining reasons, etc.
>
> (3)    A map MF from each Function* to InlineReportFunction*
>
> (4)    A map MCS from each CallSite* to InlineReportCallSite*
>
>
>
> In addition, the class InlineCost (from InlineCost.h) is augmented to
> include the primary reason a call site was inlined.
>
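
For readers who want a concrete picture of the three classes described above, here is a minimal standalone sketch. It is not the code from the RFC: Function and CallSite are hypothetical stand-ins for the LLVM types, the ReportLevel bit names are invented for illustration, and ownership via std::unique_ptr is an assumption (the RFC only says the maps hold pointers).

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for llvm::Function and llvm::CallSite; only their
// addresses are used as map keys in this sketch.
struct Function { std::string Name; };
struct CallSite { Function *Callee; };

// Bits of the report "level". The names are illustrative, not from the RFC.
enum ReportLevel : uint32_t {
  RL_Basic = 1u << 0,      // emit the report at all
  RL_Reasons = 1u << 1,    // print the inlining reasons
  RL_Externs = 1u << 2,    // print external functions
  RL_Intrinsics = 1u << 3, // print intrinsics
};

struct InlineReportCallSite {
  bool IsInlined = false;
  std::string Reason; // why the call site was or was not inlined
  int Cost = 0;
  int OuterCost = 0;
  int Threshold = 0;
  // Call sites exposed when this call site was inlined (non-owning).
  std::vector<InlineReportCallSite *> Children;
};

struct InlineReportFunction {
  bool IsDead = false; // removed by dead static function elimination
  // Call sites that appeared in the function before any inlining (non-owning).
  std::vector<InlineReportCallSite *> CallSites;
};

struct InlineReport {
  uint32_t Level = 0;                    // 0 means "no report"
  std::map<std::string, int> Thresholds; // option name -> value
  std::map<const Function *, std::unique_ptr<InlineReportFunction>> MF;
  std::map<const CallSite *, std::unique_ptr<InlineReportCallSite>> MCS;

  bool isEnabled() const { return Level != 0; }
  bool wants(ReportLevel Bit) const { return (Level & Bit) != 0; }

  // Return the record for F, creating it on first use.
  InlineReportFunction &getOrCreate(const Function *F) {
    auto &Slot = MF[F];
    if (!Slot)
      Slot = std::make_unique<InlineReportFunction>();
    return *Slot;
  }

  // Return the record for CS, creating it on first use.
  InlineReportCallSite &getOrCreate(const CallSite *CS) {
    auto &Slot = MCS[CS];
    if (!Slot)
      Slot = std::make_unique<InlineReportCallSite>();
    return *Slot;
  }
};
```

Keeping the owning pointers in the two maps and only raw pointers in the vectors mirrors the description above, where the vectors point at records that are looked up through MF and MCS.
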
>
>
> The class Inliner has been augmented with an InlineReport, which is
> created when an Inliner is constructed. The InlineReport is updated using
> calls to the member functions of these three classes in Inliner::runOnSCC()
> and the functions called by it.
>
>
>
> Before any inlining is done in a particular call to runOnSCC(), the map MF
> is updated so that each Function (caller or callee) that will be examined
> for inlining has a corresponding InlineReportFunction in the map.  (The map
> MCS is also updated in a similar way, but only when a Function is actually
> inlined.)
>
>
>
> The Inliner determines if a CallSite should be inlined by first calling
> Inliner::shouldInline().  This calls getInlineCost(), which returns an
> InlineCost that now includes the reason the call site should or should
> not be inlined.  This reason, as well as the cost and threshold from the
> InlineCost, is stored in the InlineReportCallSite for the CallSite.
>
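
To illustrate the idea of a cost result that carries its reason, here is a small standalone sketch. The InlineReason enumerators, the InlineDecision struct and both functions are hypothetical names, not the RFC's or LLVM's. The "<=" test follows the notation in the example report further down; the comparison LLVM actually performs may differ, and the "(cost>threshold)" form for rejected call sites is an assumption, since the RFC shows no such line.

```cpp
#include <string>

// Illustrative reason codes; the RFC does not list the real enumerators.
enum class InlineReason { Profitable, TooCostly, AlwaysInline, NeverInline };

// A decision together with the data needed to explain it in a report.
struct InlineDecision {
  bool ShouldInline;
  InlineReason Reason;
  int Cost;
  int Threshold;
  bool UsedCostModel; // false when an attribute decided the outcome
};

// Decide whether to inline, recording the principal reason alongside the
// cost and threshold that were compared.
InlineDecision decide(int Cost, int Threshold, bool Always, bool Never) {
  if (Never)
    return {false, InlineReason::NeverInline, Cost, Threshold, false};
  if (Always)
    return {true, InlineReason::AlwaysInline, Cost, Threshold, false};
  if (Cost <= Threshold)
    return {true, InlineReason::Profitable, Cost, Threshold, true};
  return {false, InlineReason::TooCostly, Cost, Threshold, true};
}

// Render the comparison as it might appear in a report line, e.g.
// "(15<=225)". Returns an empty string when no comparison was made.
std::string formatComparison(const InlineDecision &D) {
  if (!D.UsedCostModel)
    return "";
  return "(" + std::to_string(D.Cost) + (D.ShouldInline ? "<=" : ">") +
         std::to_string(D.Threshold) + ")";
}
```
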
>
>
> Then the Inliner calls the static function InlineCallIfPossible(). If the
> inlining was not performed, the reason for not inlining is recorded in the
> InlineReportCallSite corresponding to the CallSite.  If the inlining was
> performed, the corresponding InlineReportCallSite is marked as inlined, and
> it is populated with the InlineReportCallSites corresponding to the newly
> exposed CallSites that were created during the inlining.
>
>
>
> The InlineReport is printed during the call to Inliner::doFinalization().
>
>
>
> Since the compiler can run any number of optimizations between two
> successive calls to runOnSCC(), the Instructions corresponding to CallSites
> can be deleted by the optimizations.  Callbacks are used to mark the
> corresponding InlineReportCallSites as deleted when this happens.
>
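
The deletion-callback idea can be sketched in isolation as follows. This is only an illustration under assumptions: CallSiteTracker and CallSiteRecord are hypothetical names, and instructions are represented by opaque addresses. The RFC does not say which mechanism it uses; in LLVM itself, a value handle such as CallbackVH (llvm/IR/ValueHandle.h) is one way to be notified when a Value is deleted.

```cpp
#include <functional>
#include <unordered_map>

// Per-call-site state kept by the report (illustrative subset).
struct CallSiteRecord {
  bool IsInlined = false;
  bool IsDeleted = false;
};

// Hypothetical registry keyed by the address of the call instruction.
class CallSiteTracker {
  std::unordered_map<const void *, CallSiteRecord> Records;

public:
  // Start tracking Inst (or return the existing record).
  CallSiteRecord &track(const void *Inst) { return Records[Inst]; }

  // Build a callback that an optimization can invoke when it erases Inst.
  // The record is kept, but marked deleted, so the report can still
  // mention the call site. The tracker must outlive the callback.
  std::function<void()> makeDeletionCallback(const void *Inst) {
    return [this, Inst] {
      auto It = Records.find(Inst);
      if (It != Records.end())
        It->second.IsDeleted = true;
    };
  }

  // Return the record for Inst, or nullptr if it was never tracked.
  const CallSiteRecord *lookup(const void *Inst) const {
    auto It = Records.find(Inst);
    return It == Records.end() ? nullptr : &It->second;
  }
};
```

Marking the record rather than erasing it avoids dangling pointers from parent records that still list the call site as a child.
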
>
>
> *Example *
>
>
>
> Here is an abbreviated inlining report generated by my locally modified
> copy of the LLVM sources.  I generated this by compiling the file bzip2.c
> from the SPEC CPU2006 benchmark 401.bzip2.  (For the sake of brevity, I
> didn’t include all of the report.  Omitted parts are indicated by …. in
> the report.)
>
>
>
> ---- Begin Inlining Report ----
>
> Option Values:
>   inline-threshold: 225
>   inlinehint-threshold: 325
>   inlinecold-threshold: 225
>   inlineoptsize-threshold: 15
>
> COMPILE FUNC: fopen_output_safely
>    -> EXTERN: open
>    -> EXTERN: fdopen
>    -> EXTERN: close
>
> DEAD STATIC FUNC: setExit
>
> DEAD STATIC FUNC: copyFileName
>
> DEAD STATIC FUNC: showFileNames
>
> DEAD STATIC FUNC: stat
>
> ….
>
> COMPILE FUNC: cleanUpAndFail
>    -> llvm.lifetime.start
>       [[Callee is intrinsic]]
>    -> INLINE: stat (35<=487)
>       <<Callee is single basic block>>
>       -> EXTERN: __xstat
>    -> EXTERN: fprintf
>    -> EXTERN: fclose
>    -> EXTERN: remove
>    -> EXTERN: fprintf
>    -> EXTERN: fprintf
>    -> EXTERN: fprintf
>    -> EXTERN: fprintf
>    -> EXTERN: fprintf
>    -> EXTERN: fprintf
>    -> INLINE: setExit (15<=225)
>       <<Inlining is profitable>>
>    -> EXTERN: exit
>
> ….
>
> COMPILE FUNC: outOfMemory
>    -> EXTERN: fprintf
>    -> INLINE: showFileNames (70<=225)
>       <<Inlining is profitable>>
>       -> EXTERN: fprintf
>    -> cleanUpAndFail
>       [[Callee is noreturn]]
>
> ….
>
> COMPILE FUNC: snocString
>    -> INLINE: mkCell (-14920<=225)
>       <<Callee has single callsite and local linkage>>
>       -> INLINE: myMalloc (70<=225)
>          <<Inlining is profitable>>
>          -> EXTERN: malloc
>          -> outOfMemory
>             [[Callee is noreturn]]
>    -> EXTERN: strlen
>    -> INLINE: myMalloc (-14925<=225)
>       <<Callee has single callsite and local linkage>>
>       -> EXTERN: malloc
>       -> outOfMemory
>          [[Callee is noreturn]]
>    -> EXTERN: strcpy
>    -> snocString
>       [[Callee is never inline]]
>
> …..
>
> ---- End Inlining Report ------
>
>
>
> Here is an explanation of some of the features:
>
>
>
> (1)    Option values
>
>
>
> Option Values:
>   inline-threshold: 225
>   inlinehint-threshold: 325
>   inlinecold-threshold: 225
>   inlineoptsize-threshold: 15
>
>
>
> The report begins with a list of the option values most relevant to
> inlining.
>
>
>
> (2)    Compiled and dead functions
>
>
>
> COMPILE FUNC: fopen_output_safely
>    -> EXTERN: open
>    -> EXTERN: fdopen
>    -> EXTERN: close
>
> DEAD STATIC FUNC: setExit
>
>
>
> Functions in the file are identified as either being compiled or
> eliminated by dead static function elimination.
>
>
>
> (3)    External function calls
>
>
>
> COMPILE FUNC: fopen_output_safely
>    -> EXTERN: open
>    -> EXTERN: fdopen
>    -> EXTERN: close
>
>
>
> Calls to externally defined functions are indicated by the word EXTERN.
> These lines can optionally be omitted.
>
>
>
> (4)    Inlining and nesting
>
>
>
> COMPILE FUNC: snocString
>    -> INLINE: mkCell (-14920<=225)
>       <<Callee has single callsite and local linkage>>
>       -> INLINE: myMalloc (70<=225)
>          <<Inlining is profitable>>
>          -> EXTERN: malloc
>
>
>
> Inlined functions are marked INLINE. The inlining of a function within
> other inlined functions is shown clearly in the report using indentation.
>
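
The nesting shown above falls out naturally from a recursive walk over the call-site tree. Below is a minimal sketch of such a printer. ReportNode, printNode and printFunction are hypothetical names, and the three-space indentation step is inferred from the example report rather than stated in the RFC.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// One call site in the report tree (illustrative subset of fields).
struct ReportNode {
  std::string Callee;
  bool IsInlined = false;
  bool IsExtern = false;
  int Cost = 0;
  int Threshold = 0;
  std::string Reason;               // empty if no reason is recorded
  std::vector<ReportNode> Children; // call sites exposed by inlining
};

// Print one call site and, recursively, the call sites it exposed.
// Each nesting level is indented by three more spaces.
void printNode(std::ostream &OS, const ReportNode &N, unsigned Depth) {
  const std::string Indent(3 * Depth, ' ');
  OS << Indent << "-> ";
  if (N.IsExtern) {
    OS << "EXTERN: " << N.Callee << "\n";
  } else if (N.IsInlined) {
    OS << "INLINE: " << N.Callee << " (" << N.Cost << "<=" << N.Threshold
       << ")\n";
    if (!N.Reason.empty())
      OS << Indent << "   <<" << N.Reason << ">>\n";
  } else {
    OS << N.Callee << "\n";
    if (!N.Reason.empty())
      OS << Indent << "   [[" << N.Reason << "]]\n";
  }
  for (const ReportNode &Child : N.Children)
    printNode(OS, Child, Depth + 1);
}

// Print the entry for one compiled function and its top-level call sites.
void printFunction(std::ostream &OS, const std::string &Name,
                   const std::vector<ReportNode> &CallSites) {
  OS << "COMPILE FUNC: " << Name << "\n";
  for (const ReportNode &N : CallSites)
    printNode(OS, N, 1);
}
```

Feeding it a tree for snocString with mkCell, myMalloc and malloc nested three deep reproduces the layout of the excerpt above.
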
>
>
> (5)    Reasons functions were and were not inlined
>
>
>
> COMPILE FUNC: cleanUpAndFail
>    -> llvm.lifetime.start
>       [[Callee is intrinsic]]
>    -> INLINE: stat (35<=487)
>       <<Callee is single basic block>>
>       -> EXTERN: __xstat
>    -> EXTERN: fprintf
>    -> EXTERN: fclose
>    -> EXTERN: remove
>    -> EXTERN: fprintf
>    -> EXTERN: fprintf
>    -> EXTERN: fprintf
>    -> EXTERN: fprintf
>    -> EXTERN: fprintf
>    -> EXTERN: fprintf
>    -> INLINE: setExit (15<=225)
>       <<Inlining is profitable>>
>    -> EXTERN: exit
>
> ….
>
> COMPILE FUNC: outOfMemory
>    -> EXTERN: fprintf
>    -> INLINE: showFileNames (70<=225)
>       <<Inlining is profitable>>
>       -> EXTERN: fprintf
>    -> cleanUpAndFail
>       [[Callee is noreturn]]
>
>
>
> The principal reason a function was or was not inlined can optionally be
> displayed in the report.  The reason a function was inlined is indicated in
> double angle brackets << >>.  The reason a function was not inlined is
> indicated in double square brackets [[ ]].  When a comparison of the cost
> and threshold was used to determine if the function should be inlined, the
> comparison is given.  (Since intrinsics are never inlined, information
> about them can be suppressed in the report.)  The reasons for inlining or
> not inlining can optionally be displayed on the same line as the function
> considered for inlining, for easy analysis using grep, awk, etc.
>
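
As an example of the kind of post-processing this line-oriented format allows, here is a small sketch that extracts the callee, cost and threshold from an INLINE line. The struct and function names are hypothetical, and the pattern is derived only from the sample lines in this message, so it may not cover every form the real report can emit.

```cpp
#include <regex>
#include <string>

// Fields recovered from a line such as "   -> INLINE: stat (35<=487)".
struct InlinedLine {
  std::string Callee;
  int Cost = 0;
  int Threshold = 0;
};

// Parse one report line. Returns true and fills Out if the line describes
// an inlined call site with a cost/threshold comparison, false otherwise.
bool parseInlinedLine(const std::string &Line, InlinedLine &Out) {
  // Costs can be negative (e.g. -14920), so allow an optional minus sign.
  static const std::regex Pattern(
      R"re(->\s+INLINE:\s+(\S+)\s+\((-?\d+)<=(-?\d+)\))re");
  std::smatch M;
  if (!std::regex_search(Line, M, Pattern))
    return false;
  Out.Callee = M[1].str();
  Out.Cost = std::stoi(M[2].str());
  Out.Threshold = std::stoi(M[3].str());
  return true;
}
```

A tool built on this could, for instance, list the inlined call sites whose cost is closest to the threshold, which are the ones most likely to flip when a threshold option is changed.
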
>
>
> (6)    Line and column info
>
>
>
> COMPILE FUNC: outOfMemory
>    -> EXTERN: fprintf   bzip2.c(1016,4)
>    -> showFileNames   bzip2.c(1019,4) [[Callee is never inline]]
>    -> cleanUpAndFail   bzip2.c(1020,4) [[Callee is never inline]]
>
>
>
> Optionally, file, line, and column info can be provided for call sites if
> source position information is present (using -g or -gline-tables-only).
>
>
>
> I would appreciate any comments you have on whether you support the
> inclusion of an inline report in LLVM, the form and features I have
> outlined above, and your thoughts on the high level design.
>
>
>
> Thank you in advance for your comments,
>
>
>
> Robert Cox
>
> robert.cox at intel.com
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>