[LLVMdev] RFC: ThinLTO File API and Data Structures

Teresa Johnson tejohnson at google.com
Mon Aug 3 09:19:13 PDT 2015


This RFC describes the data structures to hold the ThinLTO function
index/summary used to support function importing. It also describes the
high-level APIs for reading and writing this information. As discussed in
the high-level ThinLTO RFC (
http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/086211.html), we would
like to add support for native object wrapped bitcode and ThinLTO
information. Based on comments on the mailing list, I am adding support for
ThinLTO in both normal bitcode files and native-object wrapped bitcode.

I've implemented support for the data structures in
http://reviews.llvm.org/D11721, and support for some of the APIs (not the
libLTO APIs, but the underlying ThinLTOObjectFile interfaces used by both
libLTO and directly by gold) in http://reviews.llvm.org/D11723.

The file format is described in a separate RFC I am sending simultaneously,
which contains a pointer to the patch implementing the bitcode
reading/writing support.

Looking forward to your feedback. Thanks!
Teresa



RFC: ThinLTO File API and Data Structures


This RFC covers a proposed API for ThinLTO clients (e.g. gold plugin,
linkers, llvm-lto) to use when reading and writing ThinLTO information from
bitcode files (raw or wrapped). The APIs are meant to hide the underlying
format of the files, much as the existing LTOModule and IRObject interfaces
hide the format when reading bitcode (i.e. they work transparently on
native wrapped bitcode).


See the following thread for background on ThinLTO and motivation for the
APIs:

http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-June/086310.html


Current Interfaces


Currently linkers such as ld64, lld and some proprietary linkers (e.g.
Sony), as well as the llvm-lto tool, utilize libLTO to read bitcode
intermediate files. Specifically, they use the LTOModule class and
interfaces, or the C interface to these (lto_module_* routines). The gold
plugin performs many of the same steps, but interfaces directly with the
underlying IRObjectFile class, encapsulating the following in the
IRObjectFile::create static method.



   - Check if file/mem contains bitcode (including native object wrapped
     bitcode)
      - static bool LTOModule::isBitcode* and lto_module_is_object*
        interfaces
      - Reads file into a MemoryBuffer and invokes static
        IRObjectFile::findBitcodeInMemBuffer which:
         - If memory buffer is bitcode, return as is
         - If memory buffer is object file, invoke
           ObjectFile::createObjectFile, then
           IRObjectFile::findBitcodeInObject to look for .llvmbc section,
           if found return memory buffer for section contents
      - Return true if findBitcodeInMemBuffer returns non-null
   - Read and parse bitcode from file/mem, build Module, returning
     LTOModule/lto_module_t object
      - static LTOModule * LTOModule::createFrom* and
        lto_module_create_from* interfaces
      - Same steps as the isBitcode interface above, but if
        findBitcodeInMemBuffer returns a memory buffer:
         - Parse into a new Module object (or get a lazy bitcode parser
           Module) via BitcodeReader interfaces.
         - Create IRObjectFile object (derived from SymbolicFile), save
           Module here
         - Create LTOModule object, save IRObjectFile here
      - Return LTOModule object
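
As a rough illustration of the lookup flow listed above, here is a
self-contained sketch that does not use the LLVM classes. ToyFile,
findBitcode and isBitcode are hypothetical stand-ins invented for this
example; the only detail taken from LLVM itself is that raw bitcode begins
with the magic bytes 'B' 'C' 0xC0 0xDE. A real implementation would parse
the native object rather than consult a prebuilt section map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical stand-in for a file image. Sections is only filled in for
// toy "native objects"; real code would parse the object file format.
struct ToyFile {
  Bytes Raw;
  std::map<std::string, Bytes> Sections;
};

// Raw LLVM bitcode starts with the magic bytes 'B' 'C' 0xC0 0xDE.
bool startsWithBitcodeMagic(const Bytes &B) {
  return B.size() >= 4 && B[0] == 'B' && B[1] == 'C' && B[2] == 0xC0 &&
         B[3] == 0xDE;
}

// Returns the bitcode bytes, or nullptr if the file holds none: the buffer
// itself for raw bitcode, otherwise the contents of a .llvmbc section.
const Bytes *findBitcode(const ToyFile &F) {
  if (startsWithBitcodeMagic(F.Raw))
    return &F.Raw;
  auto It = F.Sections.find(".llvmbc");
  return It == F.Sections.end() ? nullptr : &It->second;
}

// The identification query is just "did the lookup find something".
bool isBitcode(const ToyFile &F) { return findBitcode(F) != nullptr; }

void usageExample() {
  ToyFile Raw{Bytes{'B', 'C', 0xC0, 0xDE, 0x00}, {}};
  ToyFile Wrapped{Bytes{0x7F, 'E', 'L', 'F'},
                  {{".llvmbc", Bytes{'B', 'C', 0xC0, 0xDE}}}};
  ToyFile Plain{Bytes{0x7F, 'E', 'L', 'F'}, {{".text", Bytes{0x90}}}};
  assert(isBitcode(Raw));
  assert(isBitcode(Wrapped));
  assert(!isBitcode(Plain));
}
```

The point of the shape is that callers never need to know which of the two
containers they were handed; both paths end in the same bitcode buffer.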


The classes referred to above look like (only the relevant members shown):

include/llvm/LTO/LTOModule.h:

struct LTOModule {

    ...
    std::unique_ptr<object::IRObjectFile> IRFile;
    ...

};


include/llvm/Object/IRObjectFile.h:

class IRObjectFile : public SymbolicFile {

 std::unique_ptr<Module> M;

 ...

};


include/llvm/Object/ObjectFile.h:

/// ObjectFile - This class is the base class for all object file types.
/// Concrete instances of this object are created by createObjectFile, which
/// figures out which type to create.

class ObjectFile : public SymbolicFile {

 ...
};


include/llvm/IR/Module.h:

class Module {

 ...

};


New Interfaces for ThinLTO


High-level API Descriptions


The following main interfaces (e.g. in libLTO) will be needed to support
ThinLTO (detailed interface descriptions and data structures can be found
further down):

   1. Write per-module index/summary to intermediate file

This interface writes the ThinLTO function index/summary for a module to
the intermediate file containing its bitcode (either bitcode only or native
object wrapped), and is invoked by the compiler during intermediate object
emission.


Inputs:

   -

   ThinLTO function index/summary
   -

   Output stream to write it into



   2. Write combined index/summary file

This interface writes the combined function index/summary to a file. This
is the combined index/summary created by the plugin step from all modules.
The output file format will depend on the format of the intermediate files
(bitcode only or native object wrapped).


Inputs:

   -

   ThinLTO function index/summary
   -

   Output file path



   3. ThinLTO intermediate file identification

This interface checks whether a file or memory buffer contains a ThinLTO
function index/summary. The file (possibly in a memory buffer) can contain
either bitcode only or native object wrapped bitcode.


Input:

   -

   File or memory buffer to check

Output:

   -

   Boolean



   4. Read per-module index/summary from intermediate file

This interface will read and parse the ThinLTO function index/summary from
an intermediate file for a module (either bitcode only or native object
wrapped).


Inputs:

   -

   File or memory buffer to read

Output:

   -

   ThinLTO function index/summary for module (format discussed below)



   5. Read entire combined index/summary file

This interface will read and parse the combined function index/summary
file. The file format will depend on the format of the intermediate files
(bitcode only or native object wrapped).


Inputs:

   -

   Combined index/summary file path

Output:

   -

   ThinLTO function index/summary object (format discussed below)



   6. Read given function from combined index/summary file

This interface will read and parse the combined function index/summary
file, parsing and populating the summary information for a single given
function. The file format will depend on the format of the intermediate
files (bitcode only or native object wrapped).


Inputs:

   -

   Combined index/summary file path
   -

   Function name

Output:

   -

   ThinLTO function index/summary object (format discussed below)



Some design considerations for the new interfaces:

   -

   Interfaces 1, 5 and 6 are used internally by the compiler.
   -

   Interfaces 2, 3 and 4 are used by the linker, and therefore are callable
   via libLTO and from the gold-plugin.
   -

   Just as the bitcode file format is transparently handled by the existing
   interfaces described above, the new ThinLTO interfaces should be similarly
   independent of the underlying format (bitcode only vs native object
   wrapped).
   -

   Interfaces 5 and 6 are similar, and may not both be needed eventually,
   depending on tradeoffs investigated when tuning the ThinLTO implementation.
   -

   The interfaces for reading and writing the function index/summary should
   look very similar regardless of whether we are doing the per-module
   versions (1, 4) or the combined index/summary file (2, 5, 6). This is both
   for consistency and to allow the implementation to share as much code as
   possible. This design goal also means the format of the index/summary in
   the module and in the combined file should be similar.


ThinLTO Function Index/Summary Data Structures


In order to save time and memory in the plugin step, we don’t want to parse
the entire bitcode for a module during the plugin step (which only needs to
read the function index/summary for merging into a combined index/summary).
We also don’t need to parse the ThinLTO information out of the module’s IR
when constructing the Module object during the backend compile step (the
ThinLTO importing step will read the combined index file, not the module’s
own ThinLTO index).


Therefore any data structure created to encapsulate ThinLTO function index
and summary information should be independent of LTOModule and
IRObjectFile, as those structures are created when we read/parse the
module’s normal (non-ThinLTO) bitcode, and result in the creation of a
Module object. This also simplifies the implementation of reading/writing
interfaces that can be shared between bitcode only and native object
wrapped bitcode formats, since the latter will represent the ThinLTO
information in a separate section and in the object’s symbol table.


For a description of the bitcode and native-wrapped file formats, see the
separate “ThinLTO File Format” RFC.


The proposed new data structures are outlined below. As mentioned earlier
in the interface design considerations, they are totally independent of
where the function index/summary sits in the intermediate file (whether in
a bitcode block or a native wrapped section/symtab):

   - ThinLTOModulePathStringTable: String table to hold/own module path
     strings, which additionally holds the module ID assigned to each module
     during the plugin step. This can simply be a typedef
     StringMap<uint64_t>, since the StringMap makes a copy of and owns
     inserted strings.
   - ThinLTOFunctionInfo: Class to hold a function’s bitcode index and
     summary info. Includes:
      - Module path StringRefs (module strings owned by
        ThinLTOModulePathStringTable)
      - Bitcode index of function in the module
      - ThinLTOFunctionSummary: Function summary information to aid in
        importing decisions (e.g. instruction count, profile count).
      - Transient information used while reading/writing function summary
        from file (specifically the function’s offset into the encoded
        function summary section)
   - ThinLTOFunctionMap: Mapping from function name to corresponding
     ThinLTOFunctionInfo(s)
      - Implemented via StringMap class, which makes a copy of and owns
        inserted (function name) strings.
      - There may be more than one ThinLTOFunctionInfo for a given function
        name in the combined function index/summary map due to COMDATs.
        While the plugin step that creates the combined function map may
        decide to select one representative COMDAT instance, we don’t want
        the design to preclude holding multiple, as it may be advantageous
        to import a particular COMDAT from different modules in different
        backend instances.
      - Therefore, use StringMap< std::vector<ThinLTOFunctionInfo> >
   - ThinLTOFunctionSummaryIndex: Class to hold ThinLTOModulePathStringTable
     and ThinLTOFunctionMap and encapsulate methods for operating on them
      - Includes method for combining from another given
        ThinLTOFunctionSummaryIndex instance (used when creating the
        combined index/summary map).
      - Used to hold both per-module and combined function index/summary.
   - Class ThinLTOObjectFile
      - Analogous to IRObjectFile, but holds ThinLTOFunctionSummaryIndex
        instead of Module
      - Derived from SymbolicFile
   - New libLTO class ThinLTO


The new classes referred to above will look like (only the relevant members
shown):


include/llvm/LTO/LTOThinLTO.h:

struct ThinLTO {
    ...
    std::unique_ptr<object::ThinLTOObjectFile> ThinLTOFile;
    ...
};

include/llvm/Object/ThinLTOObjectFile.h:

class ThinLTOObjectFile : public SymbolicFile {
 std::unique_ptr<ThinLTOFunctionSummaryIndex> Index;
 ...
};


include/llvm/IR/ThinLTOInfo.h:

typedef std::vector<ThinLTOFunctionInfo> ThinLTOFunctionInfoList;

typedef StringMap<ThinLTOFunctionInfoList> ThinLTOFunctionMap;

typedef StringMap<uint64_t> ThinLTOModulePathStringTable;


class ThinLTOFunctionSummaryIndex {

  ThinLTOFunctionMap FunctionMap;

  ThinLTOModulePathStringTable ModulePathStringTable;

   ...

};


class ThinLTOFunctionInfo {

  StringRef ModulePath;

  uint64_t BitcodeIndex;

  ThinLTOFunctionSummary *FunctionSummary;

  uint64_t FunctionSummarySecOffset; // Used during parsing

  uint64_t FunctionSummarySecSize; // Used during parsing

  ...

};


class ThinLTOFunctionSummary {

  // TBD (includes function size, hotness, …)

};
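
To make the combining step concrete, here is a minimal standalone sketch of
how a combined index could be built from per-module ones. It is not the code
in the patches: SummaryIndex, FunctionInfo, addModule, addFunction, mergeFrom,
moduleId and lookup are hypothetical names, std::map stands in for StringMap,
the module path is stored by value instead of as a StringRef, and the policy
of numbering modules in the order they are first seen is an assumption made
for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for ThinLTOFunctionInfo (summary data omitted).
struct FunctionInfo {
  std::string ModulePath;
  std::uint64_t BitcodeIndex;
};

// Stand-in for ThinLTOFunctionSummaryIndex.
class SummaryIndex {
public:
  // Registers a module path; a path seen for the first time gets the next
  // free ID, a path already present keeps its ID.
  void addModule(const std::string &Path) {
    ModulePaths.emplace(Path, ModulePaths.size());
  }

  // Appends an entry; several entries may share one function name.
  void addFunction(const std::string &Name, const FunctionInfo &Info) {
    Functions[Name].push_back(Info);
  }

  // Folds another index (must not be *this) into this one. Entries with
  // the same name, e.g. COMDAT copies, accumulate rather than replace.
  void mergeFrom(const SummaryIndex &Other) {
    for (const auto &M : Other.ModulePaths)
      addModule(M.first);
    for (const auto &F : Other.Functions)
      for (const FunctionInfo &Info : F.second)
        addFunction(F.first, Info);
  }

  std::uint64_t moduleId(const std::string &Path) const {
    return ModulePaths.at(Path);
  }

  // Returns nullptr when the function name is unknown.
  const std::vector<FunctionInfo> *lookup(const std::string &Name) const {
    auto It = Functions.find(Name);
    return It == Functions.end() ? nullptr : &It->second;
  }

private:
  std::map<std::string, std::vector<FunctionInfo>> Functions;
  std::map<std::string, std::uint64_t> ModulePaths;
};

void usageExample() {
  SummaryIndex A, B, Combined;
  A.addModule("a.o");
  A.addFunction("foo", {"a.o", 1});
  A.addFunction("inl", {"a.o", 2});
  B.addModule("b.o");
  B.addFunction("inl", {"b.o", 7});
  Combined.mergeFrom(A);
  Combined.mergeFrom(B);
  assert(Combined.moduleId("a.o") == 0 && Combined.moduleId("b.o") == 1);
  assert(Combined.lookup("inl")->size() == 2); // both COMDAT copies kept
}
```

Keeping every copy in the vector leaves the choice of which COMDAT instance
to import to the consumer, which is the flexibility argued for above.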



Detailed API Descriptions


With the above data structures in mind, this section shows the refined and
more detailed interface specifications.


Note that the bitcode reader interfaces are not shown here, since this
document focuses on the higher-level API and data structures. But this will
use a new ThinLTOBitcodeReader class which will save and populate the newly
created ThinLTOFunctionSummaryIndex object (similar to how the
BitcodeReader class saves and populates a Module object). The
ThinLTOBitcodeReader class contains methods for parsing the ThinLTO bitcode
blocks in a bitcode-only file, as well as the bitcode-encoded ThinLTO
summary and module string table sections in the native-wrapped case. This
is also discussed in the separate “ThinLTO File Format” RFC.


API Details:

   1. Write per-module index/summary to intermediate file

This interface writes the ThinLTO function index/summary for a module to
the intermediate file containing its bitcode (either bitcode only or native
object wrapped), and is invoked by the compiler during intermediate object
emission (so no libLTO interfaces).


Inputs:

   -

   ThinLTO function index/summary object
   -

   Output stream to write it into


Interfaces:

   -

   static void WriteThinLTOBlock(const ThinLTOFunctionSummaryIndex *Index,
   BitstreamWriter &Stream)
   -

   void llvm::WriteThinLTOToStreamer(const ThinLTOFunctionSummaryIndex
   *Index, MCStreamer &MCS)


Notes:

   -

   The former (taking a BitstreamWriter) is used when we are writing
   bitcode-only (invoked from WriteModule under the appropriate option) and
   the latter (taking MCStreamer) indicates we are writing wrapped to the
   given native object streamer.



   2. Write combined index/summary file

This interface writes the combined function index/summary to a file. This
is the combined index/summary created by the plugin step from all modules.
The output file format will depend on the format of the intermediate files
(bitcode only or native object wrapped).


Inputs:

   -

   ThinLTO function index/summary
   -

   Output file path
   -

   Output triple

Outputs:

   -

   Error status (message or boolean)


Interfaces:

   -

   void ThinLTO::writeToFile(const char* path, const char* triple,
   std::string &errMsg) /* libLTO C++ */
   -

   lto_bool_t lto_thinlto_write_to_file(const char* path, const char*
   triple) /* libLTO C */
   -

   std::error_code ThinLTOObjectFile::writeToFile(const char* path, const
   char* triple)


Notes:

   -

   The triple is non-null in the case where we are writing native-wrapped
   bitcode, and is used to construct the appropriate MCStreamer object.
   -

   The gold plugin will interact directly with ThinLTOObjectFile instead of
   the libLTO ThinLTO class, as it does with IRObjectFile.
   -

   The libLTO interfaces will invoke the ThinLTOObjectFile method
   -

   Leverages the same underlying writers as #1
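
The role of the triple can be pictured with a small standalone sketch.
Everything here is made up for illustration: encodeIndex,
writeBitcodeOnly, writeNativeWrapped and writeCombinedIndex are hypothetical
helpers, and the "IDX:" and "OBJ[...]" strings are toy encodings, not the
formats from the file format RFC. The only behavior taken from the notes
above is that a null triple selects bitcode-only output and a non-null
triple selects native-wrapped output, with both paths sharing one encoder.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Shared (toy) encoder used by both output paths.
std::string encodeIndex(const std::string &Summary) {
  return "IDX:" + Summary;
}

// Toy bitcode-only path: the encoded index is written directly.
void writeBitcodeOnly(std::ostream &OS, const std::string &Summary) {
  OS << encodeIndex(Summary);
}

// Toy native-wrapped path: the same payload inside a target-specific
// container.
void writeNativeWrapped(std::ostream &OS, const std::string &Summary,
                        const std::string &Triple) {
  OS << "OBJ[" << Triple << "]{" << encodeIndex(Summary) << "}";
}

// A non-null triple requests native-wrapped output. Returns false if the
// stream is in a failed state after writing.
bool writeCombinedIndex(std::ostream &OS, const std::string &Summary,
                        const char *Triple) {
  if (Triple)
    writeNativeWrapped(OS, Summary, Triple);
  else
    writeBitcodeOnly(OS, Summary);
  return OS.good();
}

void usageExample() {
  std::ostringstream Plain, Wrapped;
  assert(writeCombinedIndex(Plain, "s", nullptr));
  assert(writeCombinedIndex(Wrapped, "s", "x86_64-unknown-linux-gnu"));
  assert(Plain.str() == "IDX:s");
  assert(Wrapped.str() == "OBJ[x86_64-unknown-linux-gnu]{IDX:s}");
}
```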



   3. ThinLTO intermediate file identification

This interface checks whether a file or memory buffer contains a ThinLTO
function index/summary. The file (possibly in a memory buffer) can contain
either bitcode only or native object wrapped bitcode.


Input:

   -

   File or memory buffer to check

Output:

   -

   Boolean


Interfaces:

   -

   static bool ThinLTO::hasThinLTOIndexInFile(const char *path) /* libLTO
   C++ */
   -

   static bool ThinLTO::hasThinLTOIndexInFile(const void *mem, size_t
   length)  /* libLTO C++ */
   -

   lto_bool_t lto_has_thinlto_index(const char *path) /* libLTO C */
   -

   lto_bool_t lto_has_thinlto_index_in_memory(const void *mem, size_t
   length) /* libLTO C */
   -

   static bool ThinLTOObjectFile::hasThinLTOInMemBuffer(MemoryBufferRef
   Object)


Notes:

   -

   The libLTO routines invoke the ThinLTOObjectFile function that checks
   for ThinLTO function index/summary information using format-specific
   readers.
   -

   The gold plugin will interact directly with ThinLTOObjectFile instead of
   the libLTO ThinLTO class, as it does with IRObjectFile.



   4. Read per-module index/summary from intermediate file

This interface will read and parse the ThinLTO function index/summary from
an intermediate file (possibly in memory) for a module (either bitcode only
or native object wrapped).


Inputs:

   -

   File or memory buffer to read

Output:

   -

   ThinLTO function index/summary for module (format discussed above)
   -

   Error status (libLTO C interfaces set sLastErrorString)


Interfaces:

   -

   static ThinLTO * ThinLTO::createFromFile(const char *path, std::string
   &errMsg) /* libLTO C++ */
   -

   static ThinLTO * ThinLTO::createFromOpenFile(int fd, const char *path,
   size_t size, std::string &errMsg) /* libLTO C++ */
   -

   static ThinLTO * ThinLTO::createFromBuffer(const void *mem, const char
   *path, size_t length, std::string &errMsg) /* libLTO C++ */
   -

   thin_lto_t lto_thinlto_create(const char *path) /* libLTO C */
   -

   thin_lto_t lto_thinlto_create_from_fd(int fd, const char *path, size_t
   size) /* libLTO C */
   -

   thin_lto_t lto_thinlto_create_from_memory(const void *mem, const char
   *path, size_t length) /* libLTO C */
   -

   static ErrorOr<std::unique_ptr<ThinLTOObjectFile>>
   object::ThinLTOObjectFile::create(MemoryBufferRef Object, bool
   ReadFuncSummaryData = true)


Notes:

   -

   The optional ReadFuncSummaryData boolean flag to
   ThinLTOObjectFile::create is used to support the interface discussed below
   in #6, but is true in this context.
   -

   The libLTO routines invoke the ThinLTOObjectFile function that reads the
   given function index/summary by invoking format-specific readers/parsers.
   The resulting ThinLTOObjectFile is saved in a new ThinLTO object.
   -

   The gold plugin will interact directly with ThinLTOObjectFile instead of
   the libLTO ThinLTO class, as it does with IRObjectFile
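
For readers less familiar with how the libLTO C entry points sit on top of
the C++ ones, the pattern can be sketched as below. This is only an
illustration under stated assumptions: ThinLTOStub replaces the proposed
ThinLTO class, the sketch_* function names are invented so they cannot be
mistaken for the proposed API, and the "empty path" failure is a placeholder
for real file reading. What it mirrors from existing libLTO is the opaque
handle typedef and a last-error string that the caller queries separately
(the existing C API exposes lto_get_error_message for this).

```cpp
#include <cassert>
#include <string>

// Stand-in for the proposed C++ ThinLTO class.
struct ThinLTOStub {
  std::string Path;

  // Returns nullptr and fills ErrMsg on failure; the caller owns the result.
  static ThinLTOStub *createFromFile(const char *FilePath,
                                     std::string &ErrMsg) {
    if (!FilePath || !*FilePath) {
      ErrMsg = "empty path";
      return nullptr;
    }
    return new ThinLTOStub{FilePath};
  }
};

// Last error recorded by the C layer, queried after a failed call.
static std::string sLastErrorString;

// Opaque handle handed to C clients.
typedef struct OpaqueThinLTOSketch *thin_lto_t;

extern "C" thin_lto_t sketch_thinlto_create(const char *Path) {
  return reinterpret_cast<thin_lto_t>(
      ThinLTOStub::createFromFile(Path, sLastErrorString));
}

extern "C" const char *sketch_get_error_message() {
  return sLastErrorString.c_str();
}

extern "C" void sketch_thinlto_dispose(thin_lto_t Handle) {
  delete reinterpret_cast<ThinLTOStub *>(Handle);
}

void usageExample() {
  thin_lto_t Good = sketch_thinlto_create("a.o");
  assert(Good != nullptr);
  sketch_thinlto_dispose(Good);

  thin_lto_t Bad = sketch_thinlto_create("");
  assert(Bad == nullptr);
  assert(std::string(sketch_get_error_message()) == "empty path");
}
```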



   5. Read entire combined index/summary file

This interface will read and parse the combined function index/summary
file. The file format will depend on the format of the intermediate files
(bitcode only or native object wrapped). It is invoked from the compiler
during ThinLTO importing (so there are no libLTO interfaces).


Inputs:

   -

   Combined index/summary file in memory buffer
   -

   Boolean indicating whether to read function summary data (in addition to
   symbol table), true by default and in this context.

Output:

   -

   ThinLTO function index/summary object (wrapped in ErrorOr)


Interface:

   -

   static ErrorOr<std::unique_ptr<ThinLTOObjectFile>>
   object::ThinLTOObjectFile::create(MemoryBufferRef Object, bool
   ReadFuncSummaryData = true)


Notes:

   -

   This is the same ThinLTOObjectFile interface shown above in #4.
   -

   The optional ReadFuncSummaryData boolean flag is used to support the
   interface discussed below in #6, but is true in this context.



   6. Read given function from combined index/summary file

This interface will read and parse the combined function index/summary
file, parsing and populating the summary information for a single given
function. The file format will depend on the format of the intermediate
files (bitcode only or native object wrapped).


Inputs:

   -

   Combined index/summary file in memory buffer
   -

   Function name
   -

   ThinLTOObjectFile object (partially populated)

Output:

   -

   Error status

Side Effect:

   -

   The ThinLTOFunctionInfo entry for the given function is populated in the
   given ThinLTOObjectFile


Interfaces (2 steps):

   -

   static ErrorOr<std::unique_ptr<ThinLTOObjectFile>>
   object::ThinLTOObjectFile::create(MemoryBufferRef Object, false /*bool
   ReadFuncSummaryData = true*/)
   -

   std::error_code
   ThinLTOObjectFile::findThinLTOFunctionInfoInMemBuffer(MemoryBufferRef
   Object, StringRef FunctionName)


Notes:


   -

   This usage model requires first reading the ThinLTO symbol table
   information using the first interface ThinLTOObjectFile::create (same
   interface as in #4/#5), but with ReadFuncSummaryData=false. In that case
   the resulting ThinLTOObjectFile object is not fully populated.
   Specifically, the ThinLTOFunctionInfo entries are not yet populated with
   the bitcode index and function summary information.
   -

   Subsequent invocations to read specific function summaries use the
   second interface.
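
The two-step usage model can be sketched in isolation as follows. None of
these names come from the patches: ToyFileImage, ToyRecord, LazyIndex and
populate are hypothetical, and a std::map keyed by function name stands in
for the encoded combined file. The sketch only shows the intended division
of work: the first step records which functions exist, and a later call
fills in the data for one function on demand.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for the combined index file: function name -> record.
struct ToyRecord {
  std::uint64_t BitcodeIndex;
  std::uint64_t InstCount;
};
using ToyFileImage = std::map<std::string, ToyRecord>;

struct LazyFunctionInfo {
  bool Populated = false;
  std::uint64_t BitcodeIndex = 0;
  std::uint64_t InstCount = 0;
};

class LazyIndex {
public:
  // Step 1: always reads the "symbol table" (the names). Summary data is
  // read eagerly only when ReadFuncSummaryData is true.
  static LazyIndex create(const ToyFileImage &File,
                          bool ReadFuncSummaryData = true) {
    LazyIndex Idx;
    for (const auto &Entry : File) {
      Idx.Functions[Entry.first]; // creates an unpopulated entry
      if (ReadFuncSummaryData)
        Idx.populate(File, Entry.first);
    }
    return Idx;
  }

  // Step 2: fill in one function's data. Returns false if the name is
  // not known to the index or not present in the file.
  bool populate(const ToyFileImage &File, const std::string &Name) {
    auto It = Functions.find(Name);
    auto Rec = File.find(Name);
    if (It == Functions.end() || Rec == File.end())
      return false;
    It->second.Populated = true;
    It->second.BitcodeIndex = Rec->second.BitcodeIndex;
    It->second.InstCount = Rec->second.InstCount;
    return true;
  }

  // Returns nullptr when the function name is unknown.
  const LazyFunctionInfo *lookup(const std::string &Name) const {
    auto It = Functions.find(Name);
    return It == Functions.end() ? nullptr : &It->second;
  }

private:
  std::map<std::string, LazyFunctionInfo> Functions;
};

void usageExample() {
  ToyFileImage File = {{"foo", {3, 40}}, {"bar", {5, 12}}};
  LazyIndex Idx = LazyIndex::create(File, /*ReadFuncSummaryData=*/false);
  assert(Idx.lookup("foo") && !Idx.lookup("foo")->Populated);
  assert(Idx.populate(File, "foo"));
  assert(Idx.lookup("foo")->BitcodeIndex == 3);
  assert(!Idx.lookup("bar")->Populated); // untouched until requested
}
```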


-- 
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413