[LLVMdev] RFC: ThinLTO File API and Data Structures
Teresa Johnson
tejohnson at google.com
Mon Aug 3 09:19:13 PDT 2015
This RFC describes the data structures to hold the ThinLTO function
index/summary used to support function importing. It also describes the
high-level APIs for reading and writing this information. As discussed in
the high-level ThinLTO RFC (
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/086211.html), we would
like to add support for native object wrapped bitcode and ThinLTO
information. Based on comments on the mailing list, I am adding support for
ThinLTO in both normal bitcode files and native-object wrapped
bitcode.
I've implemented support for the data structures in
http://reviews.llvm.org/D11721, and support for some of the APIs (not the
libLTO APIs, but the underlying ThinLTOObjectFile interfaces used by both
libLTO and directly by gold) in http://reviews.llvm.org/D11723.
The file format is described in a separate RFC I am sending simultaneously,
which contains a pointer to the patch implementing the bitcode
reading/writing support.
Looking forward to your feedback. Thanks!
Teresa
RFC: ThinLTO File API and Data Structures
This RFC covers a proposed API for ThinLTO clients (e.g. gold plugin,
linkers, llvm-lto) to use when reading and writing ThinLTO information from
bitcode files (raw or wrapped). The APIs are meant to hide the underlying
format of the files, much as the existing LTOModule and IRObject interfaces
hide the format when reading bitcode (i.e. they work transparently on
native wrapped bitcode).
See the following thread for background on ThinLTO and motivation for the
APIs:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-June/086310.html
Current Interfaces
Currently linkers such as ld64, lld and some proprietary linkers (e.g.
Sony), as well as the llvm-lto tool, utilize libLTO to read bitcode
intermediate files. Specifically, they use the LTOModule class and
interfaces, or the C interface to these (lto_module_* routines). The gold
plugin performs many of the same steps, but interfaces directly with the
underlying IRObjectFile class, encapsulating the following in the
IRObjectFile::create static method.
- Check if file/mem contains bitcode (including native object wrapped
  bitcode)
  - static bool LTOModule::isBitcode* and lto_module_is_object* interfaces
    - Reads file into a MemoryBuffer and invokes static
      IRObjectFile::findBitcodeInMemBuffer which:
      - If memory buffer is bitcode, return as is
      - If memory buffer is object file, invoke
        ObjectFile::createObjectFile, then
        IRObjectFile::findBitcodeInObject to look for .llvmbc section,
        if found return memory buffer for section contents
    - Return true if findBitcodeInMemBuffer returns non-null
- Read and parse bitcode from file/mem, build Module, returning
  LTOModule/lto_module_t object
  - static LTOModule * LTOModule::createFrom* and lto_module_create_from*
    interfaces
    - Same steps as the isBitcode interface above, but if
      findBitcodeInMemBuffer returns a memory buffer:
      - Parse into a new Module object (or get a lazy bitcode parser
        Module) via BitcodeReader interfaces.
      - Create IRObjectFile object (derived from SymbolicFile), save
        Module here
      - Create LTOModule object, save IRObjectFile here
      - Return LTOModule object
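The lookup flow above can be sketched roughly as follows. This is a
self-contained illustration, not LLVM's code: ToyBuffer, isRawBitcode,
findBitcode and isBitcode are hypothetical stand-ins for MemoryBuffer,
ObjectFile and the IRObjectFile/LTOModule routines, and the "object file" is
just a map from section name to contents. The only details taken from LLVM
are the raw bitcode magic bytes ('B', 'C', 0xC0, 0xDE) and the .llvmbc
section name.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a MemoryBuffer plus whatever an object-file
// parser would report for it. Sections is empty for non-object inputs.
struct ToyBuffer {
  std::string Bytes;
  std::map<std::string, std::string> Sections;
};

// Raw LLVM bitcode starts with the magic bytes 'B' 'C' 0xC0 0xDE.
bool isRawBitcode(const std::string &Bytes) {
  return Bytes.size() >= 4 && Bytes[0] == 'B' && Bytes[1] == 'C' &&
         static_cast<unsigned char>(Bytes[2]) == 0xC0 &&
         static_cast<unsigned char>(Bytes[3]) == 0xDE;
}

// Mirrors the two cases listed above: raw bitcode is returned as is;
// otherwise the contents of the .llvmbc section are returned if present.
// A null result means no bitcode was found.
const std::string *findBitcode(const ToyBuffer &Buf) {
  if (isRawBitcode(Buf.Bytes))
    return &Buf.Bytes;
  auto It = Buf.Sections.find(".llvmbc");
  return It != Buf.Sections.end() ? &It->second : nullptr;
}

// The isBitcode-style query is then just "did the lookup return non-null?".
bool isBitcode(const ToyBuffer &Buf) { return findBitcode(Buf) != nullptr; }
```

The point of the sketch is that callers never ask which container format
they were handed; that decision is confined to the lookup routine.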
The classes referred to above look like (only the relevant members shown):
include/llvm/LTO/LTOModule.h:
struct LTOModule {
...
std::unique_ptr<object::IRObjectFile> IRFile;
…
};
include/llvm/Object/IRObjectFile.h:
class IRObjectFile : public SymbolicFile {
std::unique_ptr<Module> M;
...
};
include/llvm/Object/ObjectFile.h:
/// ObjectFile - This class is the base class for all object file types.
/// Concrete instances of this object are created by createObjectFile, which
/// figures out which type to create.
class ObjectFile : public SymbolicFile {
...
};
include/llvm/IR/Module.h:
class Module {
…
};
New Interfaces for ThinLTO
High-level API Descriptions
The following main interfaces (e.g. in libLTO) will be needed to support
ThinLTO (detailed interface descriptions and data structures can be found
further down):
1. Write per-module index/summary to intermediate file
This interface writes the ThinLTO function index/summary for a module to
the intermediate file containing its bitcode (either bitcode only or native
object wrapped), and is invoked by the compiler during intermediate object
emission.
Inputs:
-
ThinLTO function index/summary
-
Output stream to write it into
2. Write combined index/summary file
This interface writes the combined function index/summary to a file. This
is the combined index/summary created by the plugin step from all modules.
The output file format will depend on the format of the intermediate files
(bitcode only or native object wrapped).
Inputs:
-
ThinLTO function index/summary
-
Output file path
3. ThinLTO intermediate file identification
This interface checks if a file or memory buffer contains a ThinLTO
function index/summary. The file (possibly in a memory buffer) can contain
either bitcode only or native object wrapped bitcode.
Input:
-
File or memory buffer to check
Output:
-
Boolean
4. Read per-module index/summary from intermediate file
This interface will read and parse the ThinLTO function index/summary from
an intermediate file for a module (either bitcode only or native object
wrapped).
Inputs:
-
File or memory buffer to check
Output:
-
ThinLTO function index/summary for module (format discussed below)
5. Read entire combined index/summary file
This interface will read and parse the combined function index/summary
file. The file format will depend on the format of the intermediate files
(bitcode only or native object wrapped).
Inputs:
-
Combined index/summary file path
Output:
-
ThinLTO function index/summary object (format discussed below)
6. Read given function from combined index/summary file
This interface will read and parse the combined function index/summary
file, parsing and populating the summary information for a single given
function. The file format will depend on the format of the intermediate
files (bitcode only or native object wrapped).
Inputs:
-
Combined index/summary file path
-
Function name
Output:
-
ThinLTO function index/summary object (format discussed below)
Some design considerations for the new interfaces:
-
Interfaces 1, 5 and 6 are used internally by the compiler.
-
Interfaces 2, 3 and 4 are used by the linker, and therefore are callable
via libLTO and from the gold-plugin.
-
Just as the bitcode file format is transparently handled by the existing
interfaces described above, the new ThinLTO interfaces should be similarly
independent of the underlying format (bitcode only vs native object
wrapped).
-
Interfaces 5 and 6 are similar, and may not both be needed eventually,
depending on tradeoffs investigated when tuning the ThinLTO implementation.
-
The interfaces for reading and writing the function index/summary should
look very similar regardless of whether we are doing the per-module
versions (1, 4) or the combined index/summary file (2, 5, 6). This is both
for consistency and to allow the implementation to share as much code as
possible. This design goal also means the format of the index/summary in
the module and in the combined file should be similar.
ThinLTO Function Index/Summary Data Structures
In order to save time and memory in the plugin step, we don’t want to parse
the entire bitcode for a module during the plugin step (which only needs to
read the function index/summary for merging into a combined index/summary).
We also don’t need to parse the ThinLTO information out of the module’s IR
when constructing the Module object during the backend compile step (the
ThinLTO importing step will read the combined index file, not the module’s
own ThinLTO index).
Therefore any data structure created to encapsulate ThinLTO function index
and summary information should be independent of LTOModule and
IRObjectFile, as those structures are created when we read/parse the
module’s normal (non-ThinLTO) bitcode, and result in the creation of a
Module object. This also simplifies the implementation of reading/writing
interfaces that can be shared between bitcode only and native object
wrapped bitcode formats, since the latter will represent the ThinLTO
information in a separate section and in the object’s symbol table.
For a description of the bitcode and native-wrapped file formats, see the
separate “ThinLTO File Format” RFC.
The proposed new data structures are outlined below. As mentioned earlier
in the interface design considerations, they are totally independent of
where the function index/summary sits in the intermediate file (whether in
a bitcode block or a native wrapped section/symtab):
- ThinLTOModulePathStringTable: String table to hold/own module path
  strings, which additionally holds the module ID assigned to each module
  during the plugin step. This can simply be a typedef StringMap<uint64_t>,
  since the StringMap makes a copy of and owns inserted strings.
- ThinLTOFunctionInfo: Class to hold function’s bitcode index and summary
  info. Includes:
  - Module path StringRefs (module strings owned by
    ThinLTOModulePathStringTable)
  - Bitcode index of function in the module
  - ThinLTOFunctionSummary: Function summary information to aid in
    importing decisions (e.g. instruction count, profile count).
  - Transient information used while reading/writing function summary
    from file (specifically the function’s offset into encoded function
    summary section)
- ThinLTOFunctionMap: Mapping from function name to corresponding
  ThinLTOFunctionInfo(s)
  - Implemented via StringMap class, which makes a copy of and owns
    inserted (function name) strings.
  - There may be more than one ThinLTOFunctionInfo for a given function
    name in the combined function index/summary map due to COMDATs. While
    the plugin step that creates the combined function map may decide to
    select one representative COMDAT instance, we don’t want the design to
    preclude holding multiple as it may be advantageous to import a
    particular COMDAT from different modules in different backend
    instances.
  - Therefore, use StringMap< std::vector<ThinLTOFunctionInfo> >
- ThinLTOFunctionSummaryIndex: Class to hold ThinLTOModulePathStringTable
  and ThinLTOFunctionMap and encapsulate methods for operating on them
  - Includes method for combining from another given
    ThinLTOFunctionSummaryIndex instance (used when creating combined
    index/summary map).
  - Used to hold both per-module and combined function index/summary.
- Class ThinLTOObjectFile
  - Analogous to IRObjectFile, but holds ThinLTOFunctionSummaryIndex
    instead of Module
  - Derived from SymbolicFile
- New libLTO class ThinLTO
The new classes referred to above will look like (only the relevant members
shown):
include/llvm/LTO/LTOThinLTO.h:
struct ThinLTO {
...
std::unique_ptr<object::ThinLTOObjectFile> ThinLTOFile;
…
};
include/llvm/Object/ThinLTOObjectFile.h:
class ThinLTOObjectFile : public SymbolicFile {
std::unique_ptr<ThinLTOFunctionSummaryIndex> Index;
...
};
include/llvm/IR/ThinLTOInfo.h:
typedef std::vector<ThinLTOFunctionInfo> ThinLTOFunctionInfoList;
typedef StringMap<ThinLTOFunctionInfoList> ThinLTOFunctionMap;
typedef StringMap<uint64_t> ThinLTOModulePathStringTable;
class ThinLTOFunctionSummaryIndex {
ThinLTOFunctionMap FunctionMap;
ThinLTOModulePathStringTable ModulePathStringTable;
...
};
class ThinLTOFunctionInfo {
StringRef ModulePath;
uint64_t BitcodeIndex;
ThinLTOFunctionSummary *FunctionSummary;
uint64_t FunctionSummarySecOffset; // Used during parsing
uint64_t FunctionSummarySecSize; // Used during parsing
...
};
class ThinLTOFunctionSummary {
// TBD (includes function size, hotness, …)
};
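As a rough illustration of how these structures fit together, the following
self-contained sketch models the index with standard containers. It is not
the code in D11721: std::map stands in for StringMap, a pointer to the owned
key stands in for StringRef, and the summary fields are omitted because they
are still TBD. The method names (addModulePath, addFunction, mergeFrom,
lookup, moduleId) and the sequential module ID policy are assumptions made
for the example.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for ThinLTOFunctionInfo. Like StringMap, std::map owns its key
// strings and never moves them, so a pointer to a key stays valid as more
// entries are added.
struct FunctionInfo {
  const std::string *ModulePath; // owned by the index's module path table
  uint64_t BitcodeIndex;
};

// Stand-in for ThinLTOFunctionSummaryIndex.
class FunctionSummaryIndex {
  std::map<std::string, std::vector<FunctionInfo>> Functions;
  std::map<std::string, uint64_t> ModulePaths; // path -> module ID

public:
  // Records a module path (keeping any existing ID) and returns the
  // table's owned copy of the string.
  const std::string *addModulePath(const std::string &Path, uint64_t Id) {
    return &ModulePaths.emplace(Path, Id).first->first;
  }

  void addFunction(const std::string &Name, const std::string *ModulePath,
                   uint64_t BitcodeIndex) {
    Functions[Name].push_back({ModulePath, BitcodeIndex});
  }

  // Folds another index into this one. Entries with the same function name
  // (e.g. COMDATs) are appended rather than replaced. Assumption: a module
  // seen for the first time gets the next sequential ID.
  void mergeFrom(const FunctionSummaryIndex &Other) {
    for (const auto &Entry : Other.Functions)
      for (const FunctionInfo &Info : Entry.second) {
        const std::string *Path =
            addModulePath(*Info.ModulePath, ModulePaths.size());
        addFunction(Entry.first, Path, Info.BitcodeIndex);
      }
  }

  // Returns the list of infos for Name, or null if the name is unknown.
  const std::vector<FunctionInfo> *lookup(const std::string &Name) const {
    auto It = Functions.find(Name);
    return It == Functions.end() ? nullptr : &It->second;
  }

  uint64_t moduleId(const std::string &Path) const {
    return ModulePaths.at(Path);
  }
};
```

Merging two per-module indexes that both define a COMDAT function leaves two
entries under that name in the combined index, one per module path, which is
the case the vector-valued map is meant to allow.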
Detailed API Descriptions
With the above data structures in mind, this section shows the refined and
more detailed interface specifications.
Note that the bitcode reader interfaces are not shown here, since this
document focuses on the higher-level API and data structures. But this will
use a new ThinLTOBitcodeReader class which will save and populate the newly
created ThinLTOFunctionSummaryIndex object (similar to how the
BitcodeReader class saves and populates a Module object). The
ThinLTOBitcodeReader class contains methods for parsing the ThinLTO bitcode
blocks in a bitcode-only file, as well as the bitcode-encoded ThinLTO
summary and module string table sections in the native-wrapped case. This
is also discussed in the separate “ThinLTO File Format” RFC.
API Details:
1. Write per-module index/summary to intermediate file
This interface writes the ThinLTO function index/summary for a module to
the intermediate file containing its bitcode (either bitcode only or native
object wrapped), and is invoked by the compiler during intermediate object
emission (so no libLTO interfaces).
Inputs:
-
ThinLTO function index/summary object
-
Output stream to write it into
Interfaces:
-
static void WriteThinLTOBlock(const ThinLTOFunctionSummaryIndex *Index,
BitstreamWriter &Stream)
-
void llvm::WriteThinLTOToStreamer(const ThinLTOFunctionSummaryIndex
*Index, MCStreamer &MCS)
Notes:
-
The former (taking a BitstreamWriter) is used when we are writing
bitcode only (invoked from WriteModule under the appropriate option); the
latter (taking an MCStreamer) is used when we are writing native-wrapped
output to the given native object streamer.
2. Write combined index/summary file
This interface writes the combined function index/summary to a file. This
is the combined index/summary created by the plugin step from all modules.
The output file format will depend on the format of the intermediate files
(bitcode only or native object wrapped).
Inputs:
-
ThinLTO function index/summary
-
Output file path
-
Output triple
Outputs:
-
Error status (message or boolean)
Interfaces:
-
void ThinLTO::writeToFile(const char* path, const char* triple,
std::string &errMsg) /* libLTO C++ */
-
lto_bool_t lto_thinlto_write_to_file(const char* path, const char*
triple) /* libLTO C */
-
std::error_code ThinLTOObjectFile::writeToFile(const char* path, const
char* triple)
Notes:
-
The triple is non-null in the case where we are writing native-wrapped
bitcode, and is used to construct the appropriate MCStreamer object.
-
The gold plugin will interact directly with ThinLTOObjectFile instead of
the libLTO ThinLTO class, as it does with IRObjectFile.
-
The libLTO interfaces will invoke the ThinLTOObjectFile method
-
Leverages the same underlying writers as #1
3. ThinLTO intermediate file identification
This interface checks if a file or memory buffer contains a ThinLTO
function index/summary. The file (possibly in a memory buffer) can contain
either bitcode only or native object wrapped bitcode.
Input:
-
File or memory buffer to check
Output:
-
Boolean
Interfaces:
-
static bool ThinLTO::hasThinLTOIndexInFile(const char *path) /* libLTO
C++ */
-
static bool ThinLTO::hasThinLTOIndexInFile(const void *mem, size_t
length) /* libLTO C++ */
-
lto_bool_t lto_has_thinlto_index(const char *path) /* libLTO C */
-
lto_bool_t lto_has_thinlto_index_in_memory(const void *mem, size_t
length) /* libLTO C */
-
static bool ThinLTOObjectFile::hasThinLTOInMemBuffer(MemoryBufferRef
Object)
Notes:
-
The libLTO routines invoke the ThinLTOObjectFile function that checks
for ThinLTO function index/summary information using format-specific
readers.
-
The gold plugin will interact directly with ThinLTOObjectFile instead of
the libLTO ThinLTO class, as it does with IRObjectFile.
4. Read per-module index/summary from intermediate file
This interface will read and parse the ThinLTO function index/summary from
an intermediate file (possibly in memory) for a module (either bitcode only
or native object wrapped).
Inputs:
-
File or memory buffer to check
Output:
-
ThinLTO function index/summary for module (format discussed above)
-
Error status (libLTO C interfaces set sLastErrorString)
Interfaces:
-
static ThinLTO * ThinLTO::createFromFile(const char *path, std::string
&errMsg) /* libLTO C++ */
-
static ThinLTO * ThinLTO::createFromOpenFile(int fd, const char *path,
size_t size, std::string &errMsg) /* libLTO C++ */
-
static ThinLTO * ThinLTO::createFromBuffer(const void *mem, const char
*path, size_t length, std::string &errMsg) /* libLTO C++ */
-
thin_lto_t lto_thinlto_create(const char *path) /* libLTO C */
-
thin_lto_t lto_thinlto_create_from_fd(int fd, const char *path, size_t
size) /* libLTO C */
-
thin_lto_t lto_thinlto_create_from_memory(const void *mem, const char
*path, size_t length) /* libLTO C */
-
static ErrorOr<std::unique_ptr<ThinLTOObjectFile>>
object::ThinLTOObjectFile::create(MemoryBufferRef Object, bool
ReadFuncSummaryData = true)
Notes:
-
The optional ReadFuncSummaryData boolean flag to
ThinLTOObjectFile::create is used to support the interface discussed below
in #6, but is true in this context.
-
The libLTO routines invoke the ThinLTOObjectFile function that reads the
given function index/summary by invoking format-specific readers/parsers.
The resulting ThinLTOObjectFile is saved in a new ThinLTO object.
-
The gold plugin will interact directly with ThinLTOObjectFile instead of
the libLTO ThinLTO class, as it does with IRObjectFile
5. Read entire combined index/summary file
This interface will read and parse the combined function index/summary
file. The file format will depend on the format of the intermediate files
(bitcode only or native object wrapped). It is invoked from the compiler
during ThinLTO importing (so no libLTO interfaces).
Inputs:
-
Combined index/summary file in memory buffer
-
Boolean indicating whether to read function summary data (in addition to
symbol table), true by default and in this context.
Output:
-
ThinLTO function index/summary object (wrapped in ErrorOr)
Interface:
-
static ErrorOr<std::unique_ptr<ThinLTOObjectFile>>
object::ThinLTOObjectFile::create(MemoryBufferRef Object, bool
ReadFuncSummaryData = true)
Notes:
-
This is the same ThinLTOObjectFile interface shown above in #4.
-
The optional ReadFuncSummaryData boolean flag is used to support the
interface discussed below in #6, but is true in this context.
6. Read given function from combined index/summary file
This interface will read and parse the combined function index/summary
file, parsing and populating the summary information for a single given
function. The file format will depend on the format of the intermediate
files (bitcode only or native object wrapped).
Inputs:
-
Combined index/summary file in memory buffer
-
Function name
-
ThinLTOObjectFile object (partially populated)
Output:
-
Error status
Side Effect:
-
The ThinLTOFunctionInfo entry for the given function is populated in the
given ThinLTOObjectFile
Interfaces (2 steps):
-
static ErrorOr<std::unique_ptr<ThinLTOObjectFile>>
object::ThinLTOObjectFile::create(MemoryBufferRef Object, false /*bool
ReadFuncSummaryData = true*/)
-
std::error_code
ThinLTOObjectFile::findThinLTOFunctionInfoInMemBuffer(MemoryBufferRef
Object, StringRef FunctionName)
Notes:
-
This usage model requires first reading the ThinLTO symbol table
information using the first interface ThinLTOObjectFile::create (same
interface as in #4/#5), but with ReadFuncSummaryData=false. In that case
the resulting ThinLTOObjectFile object is not fully populated.
Specifically, the ThinLTOFunctionInfo entries are not yet populated with
the bitcode index and function summary information.
-
Subsequent invocations to read specific function summaries use the
second interface.
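The two-step usage model can be sketched as below. This is a toy model under
stated assumptions, not the proposed implementation: ToyFile stands in for
the memory buffer holding the combined file (here just a map from function
name to bitcode index), LazyIndex stands in for the index held by
ThinLTOObjectFile, and the Populated flag, findFunctionInfo and lookup are
hypothetical. Only the shape of the API (a create call with a
ReadFuncSummaryData flag, then a per-function lookup returning
std::error_code) follows the text.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <system_error>

// Toy "combined index file": function name -> bitcode index.
using ToyFile = std::map<std::string, uint64_t>;

struct FunctionInfo {
  bool Populated = false;    // hypothetical flag, for illustration only
  uint64_t BitcodeIndex = 0; // meaningful only when Populated is true
};

class LazyIndex {
  std::map<std::string, FunctionInfo> Functions;

public:
  // Step 1: always records the function names (the "symbol table"); fills
  // in the per-function data only when ReadFuncSummaryData is true.
  static LazyIndex create(const ToyFile &File,
                          bool ReadFuncSummaryData = true) {
    LazyIndex Idx;
    for (const auto &Entry : File) {
      FunctionInfo &Info = Idx.Functions[Entry.first];
      if (ReadFuncSummaryData) {
        Info.BitcodeIndex = Entry.second;
        Info.Populated = true;
      }
    }
    return Idx;
  }

  // Step 2: populates the entry for a single function on demand.
  std::error_code findFunctionInfo(const ToyFile &File,
                                   const std::string &Name) {
    auto Known = Functions.find(Name);
    auto Src = File.find(Name);
    if (Known == Functions.end() || Src == File.end())
      return std::make_error_code(std::errc::invalid_argument);
    Known->second.BitcodeIndex = Src->second;
    Known->second.Populated = true;
    return std::error_code();
  }

  // Returns the entry for Name, or null if the name is unknown.
  const FunctionInfo *lookup(const std::string &Name) const {
    auto It = Functions.find(Name);
    return It == Functions.end() ? nullptr : &It->second;
  }
};
```

A backend that imports only a handful of functions would pay for step 1 once
and for step 2 only per imported function, which is the tradeoff against
interface 5 mentioned in the design considerations.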
--
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413