[llvm-dev] [ThinLTO] RFC: ThinLTO distributed backend interface
Teresa Johnson via llvm-dev
llvm-dev at lists.llvm.org
Thu Apr 14 07:08:00 PDT 2016
Hi all,
Below is a proposal for refining the way we communicate between the ThinLTO
link step (the combined indexing step) and the distributed backend processes
that do the actual importing and other summary-based optimizations.
Mehdi, let me know if this addresses your concerns.
Peter, PTAL from the standpoint of any summary extensions needed for CFI
and make sure they can fit into this model.
Thanks,
Teresa
Background
----------------
Recent patch D18945/r266125 ([ThinLTO] Only compute imports for current
module in FunctionImport pass) triggered a discussion (mostly over IRC) on
how best to determine import/export decisions in distributed back end
compiles.
Import and export decisions are made by traversing the combined index. The
actual importing happens in the FunctionImporter class, which is passed the
set of values to import. The importer class is either invoked directly on
each backend compile, which happens in the threads launched in the libLTO
path, or via the FunctionImportPass.
The pass is currently used by the opt tool, by the gold-plugin when it
launches ThinLTO threads for single machine parallelism, and via clang when
invoked with a bitcode input file and the -fthinlto-index= option. The
latter was added in r254927 to enable launching a ThinLTO backend compile
in a separate distributed build process.
Before r266125, the FunctionImportPass was walking the entire index, but
ignoring the import results for all but the current module, and not using
the exports list. The reason to do the full index walk is that eventually
we can minimize the required static promotions in the current module (based
on whether its defined values are imported elsewhere). However, this was
costing a lot of compile time in each backend thread. On the other hand,
Mehdi would like to use the pass for testing via the opt tool, and planned
to eventually add the support for using the computed export lists to guide
promotion. Therefore, the other invocations (in the gold-plugin and from
clang for the distributed back ends) will need to either invoke the
FunctionImporter directly (as in libLTO), passing in the import/export
information, or use a new pass interface that is given the information
needed to determine the import/export lists.
Eventually the import/export decisions should be made a single time in the
thin link step (as is currently done for libLTO which doesn’t use the
pass), along with any other global summary-based decisions. The advantage
is that each backend isn’t doing redundant computation, and I believe it is
safer to make global decisions affecting correctness (e.g. promotion) a
single time. For the gold-plugin launched threads, it should be
straightforward to use the libLTO approach of computing these decisions and
passing the relevant information to each backend thread via a direct
invocation of the FunctionImporter, instead of using the FunctionImportPass.
However, for distributed build backends, if the decisions are to be made a
single time in the thin link step, summary based decisions need to be
serialized out in order to be used by the FunctionImporter in each backend
process (which could be invoked directly from clang, or via a possibly
modified FunctionImportPass interface). My original plan was to mark
linkage changes determined globally (such as promotion decisions) in the
combined index itself for consumption in each back end. But an advantage of
serializing out just the necessary info for each module is that the entire
combined index wouldn’t need to be staged to each distributed build node.
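
To make the "decide once in the thin link" idea concrete, here is a toy
sketch in Python. It is not LLVM code: the function name, the dict-based
call graph, and the module names are all hypothetical, and it imports every
cross-module callee, whereas the real heuristic is more selective. It only
shows that a single walk can produce both the import list and the export
list for every module.

```python
from collections import defaultdict


def compute_import_export(calls, defined_in):
    """Walk the call graph once and derive per-module import/export sets.

    calls:      dict mapping a caller symbol to the symbols it calls
    defined_in: dict mapping a symbol to the module that defines it
    (Both are stand-ins for the combined index; hypothetical shapes.)
    """
    imports = defaultdict(set)  # module -> symbols it imports
    exports = defaultdict(set)  # module -> symbols other modules import
    for caller, callees in calls.items():
        importer = defined_in[caller]
        for callee in callees:
            exporter = defined_in.get(callee)
            # Skip unknown symbols and calls within the same module.
            if exporter is None or exporter == importer:
                continue
            imports[importer].add(callee)
            exports[exporter].add(callee)
    return dict(imports), dict(exports)
```

Each backend would then be handed only its own entries, rather than
repeating the walk.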
Individual Module Index Files
---------------------------------------
Rather than define a new format for serializing out the globally determined
information from the thin link step, we can continue to use the combined
index file format. However, we can create an individual “combined” index
file for each module. This better enables passing along any summary
information useful for backend compilations beyond just import and export
lists, which can include other linkage optimizations, and information for
transformations such as CFI. It also lets us reuse much of the
existing combined index bitcode interfaces and data structures.
An overview of what is included in an individual “primary” module’s index
file:
1) Module symbol table only includes modules imported into the primary
module.
2) Summary section only includes summaries for value definitions that
should be imported, as well as for definitions in the primary module.
3) Any desired linkage changes for both the primary module and imported
defs are recorded in the summary entry linkage fields.
Note that 1 and 2 ensure that nothing can be imported beyond those values
marked promoted during the global thin link (important since that possibly
requires promotion in the exporting module). Any value that is imported as
a declaration (because it did not have a summary entry as per 2 above), and
that has local linkage, should automatically be promoted when importing
(its primary module’s index would include a summary with the promoted
linkage recorded).
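
Points 1 to 3 above can be sketched as a filter over the combined index.
This is a toy Python model under my own assumptions: the Summary record,
the build_module_index name, and the dict layout are hypothetical and do
not correspond to the actual bitcode format or LLVM data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    name: str     # symbol name
    module: str   # module that defines the symbol
    linkage: str  # linkage chosen by the thin link (point 3)


def build_module_index(primary, combined, imports):
    """Slice the combined index down to what `primary` needs.

    combined: dict mapping symbol name -> Summary (full combined index)
    imports:  set of symbol names to be imported into `primary`
    """
    # Point 2: keep summaries for primary's own defs and for imported defs.
    summaries = {
        name: s
        for name, s in combined.items()
        if s.module == primary or name in imports
    }
    # Point 1: list only the modules that primary imports from.
    modules = sorted(
        {s.module for s in summaries.values() if s.module != primary}
    )
    return {"primary": primary, "modules": modules, "summaries": summaries}
```

Because the slice contains no summary for anything outside the import
list, a backend working from it cannot import more than the thin link
allowed.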
Linkage Changes
-----------------------
As described above, the linkage changes determined by the global index walk
in the thin link step will be marked in the summary entries (in all
individual index files containing that symbol). The back end will compare
the linkage types in the index to those in the materialized bitcode (both
in the primary module and in any definitions being imported) and make the
necessary adjustments.
Some possibilities include:
A. Promotion: Index will indicate external linkage, so local value will be
promoted and renamed. For imported declarations, any that are local will be
promoted.
B. Avoiding promotion by forced import: Used when the thin link step
determines it is better to force an import of a static definition and leave
it static. The index will indicate local linkage, so linkage type in IR
will not be changed when it is imported (or when compiling the exporting
module).
C. Internalization by forced import: If an external symbol has 1 or only a
very small number of external references, and all referring modules decide
to import that definition, the thin link analysis could decide that it is
better to leave all copies local. The index would indicate local linkage,
and the linkage type in the IR would then be changed to local when it is
imported (and when compiling the exporting module).
D. LinkOnce -> Weak/AvailableExternally: This is a compile time
optimization to avoid unnecessarily keeping multiple copies of a LinkOnce
value. Linkage is marked in index, and again adjusted in the backends since
it will be different than the initial linkage after parsing.
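
The comparison the back end performs for cases A-D could look roughly like
the following toy Python model. The linkage strings, the function name, and
in particular the rename suffix are assumptions made for illustration; the
only requirement taken from the text above is that a promoted local is also
renamed.

```python
def apply_index_linkage(defining_module_id, ir_linkage, index_linkage):
    """Adjust IR linkage to match what the thin link recorded.

    ir_linkage:    dict symbol -> linkage seen after parsing the bitcode
    index_linkage: dict symbol -> linkage recorded in the index
    Returns (new_linkage, renames); inputs are left unmodified.
    """
    new_linkage = dict(ir_linkage)
    renames = {}
    for name, wanted in index_linkage.items():
        have = ir_linkage.get(name)
        if have is None or have == wanted:
            continue  # not in this module, or nothing to change (case B)
        new_linkage[name] = wanted  # cases A, C and D
        if have == "internal" and wanted == "external":
            # Case A: a promoted local needs a name that is unique
            # program-wide. The suffix scheme here is hypothetical.
            renames[name] = f"{name}.llvm.{defining_module_id}"
    return new_linkage, renames
```

The same routine would be run on the exporting module and on each importing
module, so that both sides agree on the final linkage and name.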
Note that pcc has made a proposal to do some of the ThinLTO promotion and
renaming up front in the compile step, so that some functions can be
eagerly compiled into text (see
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098081.html). However,
that will only apply to locals referenced by functions that are deemed
unlikely to import or be exported. The remaining locals can still be
promoted lazily.
Importing Strategy
------------------------
Strategy 1: Import exactly those defs for which we have summaries
This could use simplified/reduced summaries that strip the ref/call edges, since
they won’t be used by the backends.
Strategy 2: Allow the importer some flexibility to modify import decisions
In case we find situations where it is better to let the importer adjust
decisions based on full information (not yet known whether we need this
flexibility, but I don’t want to remove this possibility until after more
performance tuning is done on large apps). The modified decisions must be
legal based on the linkage changes decided on during the thin link step
(described in A-D in prior section):
A. Promotion - Since we can only import at most the values for which we
were given summaries, which were known to be exported at link time, we can
safely ratchet down the amount of importing without rendering those
promotion decisions incorrect (some promotions may have been unnecessary if
we decide not to import something, but they are not wrong from a
correctness standpoint).
B&C. Avoiding promotion or internalization decisions - these rely on
forced import of the local or (to be) internalized values. Simply force
import anything with a summary that is marked as having local linkage in
the summary.
D. LinkOnce -> Weak/AvailableExternally - these are not based on
importing and are unaffected by the importer’s decisions.
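
The legality rules for Strategy 2 can be summarized in a few lines. The
sketch below is a toy Python model with hypothetical names; the `wants`
callback stands in for whatever heuristic the importer would apply with
full information.

```python
def final_imports(candidates, wants):
    """Pick the set of definitions the importer will actually import.

    candidates: dict symbol -> linkage recorded in its summary, covering
                only the defs the thin link marked as importable
    wants:      callable(symbol) -> bool, the importer's own preference
    """
    chosen = set()
    for name, linkage in candidates.items():
        if linkage == "internal":
            # B & C: the thin link relied on this import, so it is forced.
            chosen.add(name)
        elif wants(name):
            # A: optional; skipping it only leaves a harmless promotion.
            chosen.add(name)
    # Nothing outside `candidates` can ever be chosen.
    return chosen
```

For example, with candidates {"hot": "external", "local_helper":
"internal"} and a heuristic that rejects everything, only local_helper is
imported.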
Incremental Builds
-------------------------
A backend compilation needs to be rebuilt when its individual “combined”
index changes (the index includes the module hashes of all relevant modules,
including the importing module, as well as all linkage decisions).
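
One way a build system could detect such a change is to hash the contents
of the individual index and use the digest as a cache key. The sketch below
is a toy Python model: the function name, the input dicts, and the choice
of SHA-256 are my assumptions, not part of the proposal.

```python
import hashlib


def backend_cache_key(module_hashes, linkage):
    """Digest of everything in a module's individual index.

    module_hashes: dict module name -> content hash, for the importing
                   module and every module it imports from
    linkage:       dict symbol -> linkage decided by the thin link
    """
    h = hashlib.sha256()
    # Sort so the key does not depend on dict insertion order.
    for mod in sorted(module_hashes):
        h.update(f"M {mod} {module_hashes[mod]}\n".encode())
    for sym in sorted(linkage):
        h.update(f"L {sym} {linkage[sym]}\n".encode())
    return h.hexdigest()
```

A change to a module that is not in the individual index leaves the key
unchanged, so that backend compile would not need to be redone.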
--
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413