[llvm-dev] [LLVMdev] RFC: ThinLTO File Format

Tue Sep 1 17:43:18 PDT 2015

Overall, I really like the direction of the new proposal.  The 
integration with existing lazy reading mechanisms is good to see.

w.r.t. the function summary section, I'd suggest that this doesn't need 
to be ThinLTO specific either.  It also sounds like this isn't fully 
fleshed out yet, so what I might suggest is that we explicitly design 
this strictly as a cache of the information stored elsewhere in the 
file.  If we did so, we could freely evolve the format without worrying 
about maintaining backwards compatibility by just ignoring the contents 
of the summary section (by rebuilding it) unless it's an exact match.  
If we wrote the interface carefully, this could be entirely invisible to 
the consumers of the file.

w.r.t. the summary file, this feels like it has a lot in common with 
bitcode linking.  Is there infrastructure which could be shared here?

Philip

On 09/01/2015 11:04 AM, Teresa Johnson via llvm-dev wrote:
> This RFC and the patches listed below are now obsolete. I have written 
> up the bitcode format discussed with Duncan and others, which I've 
> copied below but the link to the doc with potentially better 
> formatting is:
> https://drive.google.com/file/d/0B036uwnWM6RWdnBLakxmeDdOeXc/view.
>
> Duncan, can you take a look and make sure this properly describes the 
> format changes we discussed?
>
> I just sent a patch which implements the part of the bitcode format 
> changes that apply to the lazy function reader: 
> http://reviews.llvm.org/D12536. The current patch does not contain any 
> of the ThinLTO specific changes, which will be sent as a follow-on patch.
>
> I've additionally created a site that contains links to all the 
> existing ThinLTO related RFCs and patches, which I will try to keep 
> updated:
> https://sites.google.com/site/llvmthinlto
>
> Thanks,
> Teresa
>
> ThinLTO Bitcode Format Design
>
>
>   Bitcode Format Changes to Support ThinLTO
>
> During importing, ThinLTO needs the following information for a 
> potentially imported function:
>
>  *
>
>     Summary information to determine import profitability (e.g. inst
>     count, hotness, etc)
>
>  *
>
>     Location of function IR (path to module, bitcode offset)
>
>  *
>
>     Import source module unique identifier (for consistent renaming of
>     promoted locals)
>
> ThinLTO interaction with function summary/index:
>
>  *
>
>     Phase-1 compile (-c -fthinlto) produces summary info and records
>     function bitcode offsets
>
>  *
>
>     Phase-2 link aggregates all function summary and index information
>     into combined index file, assigns unique module ids, records
>     module paths and ids in combined index file
>
>  *
>
>     Phase-3 parallel backend processes independently use combined
>     index for function importing decisions and mechanics
>
>
>   Lazy Reader Support
>
> The existing function lazy reader builds an index of function blocks 
> bitcode offsets on the fly while initially walking through (and 
> skipping) function blocks. ThinLTO also needs an index of function 
> blocks but we don’t want to pay the cost of building this on the fly 
> each time a function is imported. The bitcode indexes of function 
> records will therefore be added to ValueSymbolTable (VST) function 
> records, and the existing lazy function reader will be changed to use 
> this index rather than building it on the fly. Then ThinLTO importing 
> can easily leverage the same infrastructure as lazy function reading.
>
>
> Specifically, augment the existing lazy reader with the function 
> bitcode index:
>
>  *
>
>     Build lazy reader’s DeferredFunctionInfo map (maps from Function*
>     to function block bitcode offset) from index rather than by
>     parsing the function blocks, use during currently lazy reading as
>     well as ThinLTO importing.
>
>  *
>
>     Means that bitcode index is needed before the function blocks in
>     the bitcode (phase-ordering issue discussed below)
>
>  *
>
>     Include bitcode offset with ValueSymbolTable (VST) function records
>
>
> Function offset needed in bitcode before function blocks (in order to 
> use for lazy function reading). However, we don’t have function 
> offsets when this part of bitcode being encoded/written. This requires 
> some kind of backpatching. There are several approaches:
>
> 1.
>
>     Backpatching a bitcode offset precludes encoding it as a VBR,
>     since we don’t know how many chunks are required. This means any
>     backpatched bitcode offsets must be 64-bit fixed. Doing this for
>     every function VST record can result in high overhead.
>
> 2.
>
>     Could encode the function blocks twice, once before the VST in a
>     temp stream to get each bitcode offset into the stream of function
>     blocks, then again into the final location in the real output
>     stream (or copy over the pre-encoded function blocks at the final
>     location). This has time overhead, but allows VBRs to be used for
>     encoding the offsets (which are offsets from the start of the
>     function blocks, not from the start of the file).
>
> 3.
>
>     Encode the VST after the function blocks, but place a new forward
>     declaration VST record at the point where we previously had the
>     VST (before the function blocks). Only the one forward decl record
>     needs to be backpatched with a 64-bit fixed offset (can likely get
>     by with a 32-bit word offset as the real VST block should be word
>     aligned). The reader needs to be taught to jump to and parse the
>     real VST when seeing the forward decl VST, and to jump back after
>     reading it.
>
>
> Proposal #3 above (forward decl VST) is the approach that was agreed 
> to and is being implemented. The first patch will implement this new 
> forward decl VST, add the bitcode offsets to the real VST, and change 
> the lazy reader to use the bitcode offsets from the VST instead of 
> building up the DeferredFunctionInfo on the fly.
>
>
> The new VST bitcode (used by the lazy function reader even without 
> ThinLTO) are shown below:
>
>
> A.c:
>
>  A1() {...}
>
>  static A2() {...}
>
>
> BitOffset
>  0      <MODULE_BLOCK>
>     ...
>           // MODULE_CODE_VSTOFFSET: [wordoffset(32-bit fixed)]
>           <VSTOFFSET 20/> // 20*32 = 640
> 320       <FUNCTION_BLOCK>
>              ...
>           </FUNCTION_BLOCK>
> 480       <FUNCTION_BLOCK>
>              ...
>           </FUNCTION_BLOCK>
> 640       <VALUE_SYMTAB>
> // VST_FNENTRY: [valueid, funcoffset(VBR), namechar x N]
>              <FNENTRY 0, 10, “A1”/> // 10*32 = 320
>              <FNENTRY 1, 15, “A2”/> // 15*32 = 480
>              ...
>           </VALUE_SYMTAB>
>         </MODULE_BLOCK>
>
>
>   ThinLTO-Specific Bitcode Changes
>
> In addition to the VST changes above, for ThinLTO importing additional 
> bitcode blocks are needed. These will initially only be generated 
> under -fthinlto, unless other use cases are identified. The bitcode 
> changes are summarized below.
>
>
>       Per-Module Bitcode
>
> This pertains to bitcode generated by the phase-1 compile step 
> (-fthinlto -c). It includes one new block that holds summary 
> information for the functions in that module, summarized below:
>
>  *
>
>     Function summary info encoded in new FUNCTION_SUMMARY_BLOCK.
>
>      o
>
>         One record per function with summary data containing: VST
>         value id, islocal flag for phase-2 renaming decisions, summary
>         info for importing decisions (e.g. instruction count). The
>         summary information will evolve over time.
>
>
> Note that the summary block is only used to create the combined index 
> in phase 2. It is not used when compiling that module through the 
> phase 3 backend. The earlier example is expanded with the function 
> summary block below:
>
>
> BitOffset
>  0     <MODULE_BLOCK>
>     ...
>           // MODULE_CODE_VSTOFFSET: [wordoffset(32-bit fixed)]
>           <VSTOFFSET 20/> // 20*32 = 640
> 320       <FUNCTION_BLOCK>
>              ...
>           </FUNCTION_BLOCK>
> 480       <FUNCTION_BLOCK>
>              ...
>           </FUNCTION_BLOCK>
> 640       <VALUE_SYMTAB>
>              // VST_FNENTRY: [valueid, funcoffset(VBR), namechar x N]
>              <FNENTRY 0, 10, “A1”/> // 10*32 = 320
>              <FNENTRY 1, 15, “A2”/> // 15*32 = 480
>              ...
>           </VALUE_SYMTAB>
>           <FUNCTION_SUMMARY_BLOCK>
>              // FS_ENTRY: [valueid, islocal, instcount]
>            <ENTRY 0, 0, 10/>
>            <ENTRY 1, 1, 15/>
>           </FUNCTION_SUMMARY_BLOCK>
>         </MODULE_BLOCK>
>
>
>       Combined Index File Bitcode
>
> This pertains to the combined index file generated by the phase-2 link 
> step, which is encoded as bitcode. This file contains a single 
> MODULE_BLOCK with only 3 subblocks: the VST, a module path strtab, and 
> a function summary block. Also note that the VST only contains entries 
> for functions, and the record type used in the combined index is 
> changed to include the VBR-encoded bitcode offset of the corresponding 
> summary record in the summary block. This is to allow lazy reading of 
> summary records from the combined index file during importing. That 
> replaces the bitcode offset of the function summary block which is not 
> needed in the combined index (it is obtained from the importee 
> module’s VST when importing from that module).
>
>  *
>
>     Module paths encoded in new MODULE_STRTAB_BLOCK.
>
>      o
>
>         One record per module containing: unique module id assigned
>         during phase-2 link and module path string
>
>  *
>
>     Function summary info encoded in new FUNCTION_SUMMARY_BLOCK.
>
>      o
>
>         One record per function containing: VST value id, module id,
>         summary info for importing decisions
>
>  *
>
>     VST:
>
>      o
>
>         One record per function containing: value id, function summary
>         offset, function name string
>
>
> Note that a VST forward decl record is not needed in the combined 
> index, as the VST can be connected to the summary records later via 
> the value ids (eager parsing of summary) or via the summary record 
> offsets (lazy parsing of summary). When reading the summary eagerly, 
> we just need to build a temporary map from value id to summary structure.
>
>
> BitOffset
>  0     <MODULE_BLOCK>
>           <MODULE_STRTAB_BLOCK>
>              // MST_ENTRY: [modid, namechar x N]
>              <ENTRY 1, “A.o”/>
>              <ENTRY 2, “B.o”/>
>           </MODULE_STRTAB_BLOCK>
>           <FUNCTION_SUMMARY_BLOCK>
>              // FS_ENTRY: [valueid, modid, islocal, instcount]
> 500          <ENTRY 0, 2, 0, 100/>
> 550          <ENTRY 1, 2, 0, 20/>
> 600          <ENTRY 2, 1, 0, 10/>
> 650          <ENTRY 3, 1, 1, 15/>
>           </FUNCTION_SUMMARY_BLOCK>
>           <VALUE_SYMTAB>
>              // VST_FNENTRY: [valueid, funcsumoffset, namechar x N]
>              <FNENTRY 2, 600, “A1”/>
>              <FNENTRY 3, 650, “A2”/>
>              <FNENTRY 0, 500, “B2”/>
>              <FNENTRY 1, 550, “B2”/>
>              ...
>           </VALUE_SYMTAB
>        </MODULE_BLOCK>
>
>
> Note that the value ids are reassigned here to be unique as they are 
> no longer correlated with uses outside of the function summary 
> records. They are not strictly necessary for correlating VST entries 
> with function summary entries, but enable some sanity checking.
>
>
>
> On Thu, Aug 13, 2015 at 7:49 AM, Teresa Johnson <tejohnson at google.com 
> <mailto:tejohnson at google.com>> wrote:
>
>     Hi all,
>
>     I updated the patches to remove the native object wrapper format.
>     As suggested we will work on getting the ThinLTO framework in
>     place using bitcode first, and then work on adding the native
>     object support. As noted in this RFC and in the associated patch
>     D11722, for now I have empty ThinLTO blocks with no records, since
>     I wanted to get feedback on the overall block design first. The
>     RFC discusses this in more detail, but one of the main ideas is to
>     leverage the existing value symbol table block in the module to
>     avoid duplicating function symbol strings, e.g.
>
>     I also wanted to call out another important design consideration
>     here, since it is buried in the other RFC (ThinLTO File API and
>     Data Structures), and has a big influence on the way I have
>     designed the ThinLTO index and object file data structures. The
>     ThinLTO index is read in compile/link steps when the rest of the
>     Module IR is not, and vice versa. That is why I have separate data
>     structures for reading/holding the ThinLTO index. The ThinLTO
>     index in the module (generated during the initial -c compile step)
>     is needed by other modules during the later parallel backend
>     compile phase, and therefore it is only used in the linker plugin
>     step to create the combined index file. The rest of the Module IR
>     is not read during this step (eventually we may look at adding
>     heavier weight whole program analysis under an option, but by
>     default the Module, Functions, etc are not read or materialized).
>     When the normal Module IR is read during the parallel backend
>     compile step, the ThinLTO information in its own module is not
>     read, as the importing pass will read the combined (global) index
>     file instead. This is because a module is only interested in the
>     ThinLTO index from other modules that it is considering importing
>     from.
>
>     Right now I have 5 outstanding patches to put in the basic
>     infrastructure/options for reading/writing the ThinLTO function
>     indices:
>
>     D11721[ThinLTO] Data structures for holding ThinLTO function
>     index/summary <http://reviews.llvm.org/D11721>
>     D11722[ThinLTO] Bitcode reading/writing support for ThinLTO
>     function summary/index <http://reviews.llvm.org/D11722>
>     D11723[ThinLTO] ThinLTO object file interfaces
>     <http://reviews.llvm.org/D11723>
>     D11907LLVM support for -fthinlto option.
>     <http://reviews.llvm.org/D11907>
>     D11908Clang support for -fthinlto. <http://reviews.llvm.org/D11908>
>
>     Once the basic options support, data structs, and bitcode support
>     goes in I can send patches for generating/emitting the function
>     index and the combined function index (off by defaut, guarded by
>     the -fthinlto option), and subsequently send patches for the
>     function importing during the backend compile step. I've tried to
>     break down the above infrastructure into small pieces for review,
>     and plan to implement the rest via incremental patches.
>
>     Hope this clarifies the approach I'm taking! Looking forward to
>     additional feedback on the approach and the patches.
>
>     Thanks,
>     Teresa
>
>     On Wed, Aug 12, 2015 at 2:09 PM, Teresa Johnson
>     <tejohnson at google.com <mailto:tejohnson at google.com>> wrote:
>
>         Saw that, thanks! Responding now. Will update the patch with
>         some changes and the wrapper stuff removed later today or very
>         early tomorrow.
>         Teresa
>
>         On Wed, Aug 12, 2015 at 2:07 PM, Philip Reames
>         <listmail at philipreames.com <mailto:listmail at philipreames.com>>
>         wrote:
>
>             I went ahead and replied to two of the review threads.  I
>             only considered the parts which would be left without the
>             native wrapped bitcode support.
>
>             Philip
>
>
>             On 08/12/2015 01:24 PM, Teresa Johnson wrote:
>>             I can remove the native wrapper portions of the
>>             associated patch and add that later. Most of the support
>>             as I mentioned is for the bitcode handling anyway, but I
>>             wanted to include a skeleton of the native wrapper part.
>>             For the RFC, I wanted to show the end goal including how
>>             the native wrapper support would look (it in fact mostly
>>             leverages the same bitcode encoding, so there isn't a lot
>>             of difference, and hence there isn't a whole lot of extra
>>             code needed to support that). The bulk of the RFC deals
>>             with the bitcode format, and I would love some feedback
>>             on that.
>>
>>             Thanks,
>>             Teresa
>>
>>             On Wed, Aug 12, 2015 at 11:50 AM, Philip Reames
>>             <listmail at philipreames.com
>>             <mailto:listmail at philipreames.com>> wrote:
>>
>>                 Alex already made what I consider to be the most
>>                 relevant point.  I would suggest removing the
>>                 unwanted functionality and asking again. From my
>>                 perspective, native wrapped bitcode is only
>>                 interesting (and thus worth reviewing and/or talking
>>                 about) once the native bitcode version is in tree and
>>                 functional. Frankly, I consider the native wrapped
>>                 bitcode to be an entirely orthogonal proposal that
>>                 shouldn't be tied to ThinLTO at all.
>>
>>                 Fair warning, I'm not going to be particularly
>>                 involved either way.  This is far enough from my own
>>                 immediate interests that I can't spare the cycles.  I
>>                 would suggest that you collaborate closely with the
>>                 Sony and Apple folks who are already *using* LTO to
>>                 find a proposal they're happy with.  Until you do
>>                 that, you are unlikely to make much progress.
>>
>>                 Philip
>>
>>
>>                 On 08/12/2015 09:13 AM, Teresa Johnson wrote:
>>>                 Ping. Explicitly adding a few more people who
>>>                 commented on the earlier (high-level) ThinLTO RFC. I
>>>                 removed the body of the RFC here since the original
>>>                 was large and had trouble getting through the
>>>                 mailer. I also updated the patch mentioned below so
>>>                 that it was emailed to llvm-commits properly.
>>>
>>>                 Thanks,
>>>                 Teresa
>>>
>>>                 On Mon, Aug 3, 2015 at 11:59 AM, Teresa Johnson
>>>                 <tejohnson at google.com <mailto:tejohnson at google.com>>
>>>                 wrote:
>>>
>>>                     Hi Alex,
>>>
>>>                     After outlining some of the rationale for using
>>>                     native-wrapped, there were a couple of responses
>>>                     that indicated native-wrapped support was
>>>                     reasonable, but they preferred to see
>>>                     bitcode-only first (Phillip and Rafael). This is
>>>                     essentially what this proposal and the patches
>>>                     do - I've implemented some of the basic support
>>>                     for looking for and parsing the native-wrapped
>>>                     sections, but the bitcode-only reading/writing
>>>                     support is more complete.
>>>
>>>                     In fact, as described in this RFC, I designed
>>>                     the native-wrapped format to utilize the same
>>>                     bitcode encoding for most of the ThinLTO
>>>                     information, so it uses most of the same
>>>                     underlying bitcode interfaces anyway. The
>>>                     additional support required for native-wrapped
>>>                     is not tremendous, and is similar to existing
>>>                     support in the LLVM tree for reading
>>>                     native-wrapped bitcode.
>>>
>>>                     We believe that there will be clang/llvm users
>>>                     who will find native-wrapped ThinLTO easier to
>>>                     use for the same reasons (e.g. compatibility
>>>                     with existing native toolchains), so I don't
>>>                     expect this to be Google specific.
>>>
>>>                     Thanks,
>>>                     Teresa
>>>
>>>                     On Mon, Aug 3, 2015 at 12:26 PM, Alex Rosenberg
>>>                     <alexr at leftfield.org
>>>                     <mailto:alexr at leftfield.org>> wrote:
>>>
>>>                         I think I've read all the feedback posted
>>>                         regarding your May proposal. I have yet to
>>>                         see a single response that wants native
>>>                         object wrapped bitcode.
>>>
>>>                         If the only use for native object wrapped
>>>                         bitcode is for your project at Google, then
>>>                         it probably shouldn't go into the tree
>>>                         against all of these objections.
>>>
>>>                         Alex
>>>
>>>                         On Aug 3, 2015, at 9:19 AM, Teresa Johnson
>>>                         <tejohnson at google.com
>>>                         <mailto:tejohnson at google.com>> wrote:
>>>
>>>>                         As discussed in the high-level ThinLTO RFC
>>>>                         (http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-May/086211.html),
>>>>                         we would like to add support for native
>>>>                         object wrapped bitcode and ThinLTO
>>>>                         information. Based on comments on the
>>>>                         mailing list, I am adding support for
>>>>                         ThinLTO in both normal bitcode files, as
>>>>                         well as native-object wrapped bitcode.
>>>>
>>>>                         The following RFC describes the planned
>>>>                         file format of ThinLTO information both in
>>>>                         the bitcode-only and native object wrapped
>>>>                         cases. It doesn't yet define the exact
>>>>                         record format, as I would like feedback on
>>>>                         the overall block design first.
>>>>
>>>>                         I've also implemented the support for
>>>>                         reading and writing the bitcode blocks in
>>>>                         the following patch:
>>>>                         http://reviews.llvm.org/D11722
>>>>                         <https://urldefense.proofpoint.com/v2/url?u=http-3A__reviews.llvm.org_D11722&d=AwMFaQ&c=8hUWFZcy2Z-Za5rBPlktOQ&r=Mfk2qtn1LTDThVkh6-oGglNfMADXfJdty4_bhmuhMHA&m=oUy_PB_mSfRgDO7H7bZOR04gv_DMzX5rPO_lv4PHt60&s=WVxrKkHnjKr75fCQ-UkGke8dk6KpZcFCnLWVrJ3G188&e=>
>>>>
>>>>                         The ThinLTO data structures and the file
>>>>                         APIs are described in a separate RFC I will
>>>>                         be sending simultaneously, with pointers to
>>>>                         the patches implementing them.
>>>>
>>>>                         Looking forward to your feedback. Thanks!
>>>>                         Teresa
>>>>
>>>>
>>>
>>>
>>>                 -- 
>>>                 Teresa Johnson | 	 Software Engineer |
>>>                 tejohnson at google.com
>>>                 <mailto:tejohnson at google.com> | 	408-460-2413
>>>                 <tel:408-460-2413>
>>>
>>
>>
>>
>>
>>             -- 
>>             Teresa Johnson | 	 Software Engineer |
>>             tejohnson at google.com <mailto:tejohnson at google.com> |
>>             408-460-2413 <tel:408-460-2413>
>>
>
>
>
>
>         -- 
>         Teresa Johnson | 	 Software Engineer | 	tejohnson at google.com
>         <mailto:tejohnson at google.com> | 	408-460-2413 <tel:408-460-2413>
>
>
>
>
>     -- 
>     Teresa Johnson | 	 Software Engineer | 	tejohnson at google.com
>     <mailto:tejohnson at google.com> | 	408-460-2413 <tel:408-460-2413>
>
>
>
>
> -- 
> Teresa Johnson | 	 Software Engineer | 	tejohnson at google.com 
> <mailto:tejohnson at google.com> | 	 408-460-2413
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150901/fe25621e/attachment-0001.html>