[LLVMdev] RFC: ThinLTO Impementation Plan

Fri May 15 14:45:10 PDT 2015

> On 2015-May-15, at 13:06, Teresa Johnson <tejohnson at google.com> wrote:
> 
> On Fri, May 15, 2015 at 12:36 PM, Xinliang David Li
> <xinliangli at gmail.com> wrote:
>> 
>> 
>> On Fri, May 15, 2015 at 12:02 PM, Duncan P. N. Exon Smith
>> <dexonsmith at apple.com> wrote:
>>> 
>>> 
>>>> On 2015-May-15, at 07:30, Teresa Johnson <tejohnson at google.com> wrote:
>>>> 
>>>>>> a. Lazy Debug Metadata Linking:
>>>>>> 
>>>>>> The prototype implementation included lazy importing of module-level
>>>>>> metadata during the ThinLTO pass finalization (i.e. after all function
>>>>>> importing is complete). This actually applies to all module-level
>>>>>> metadata, not just debug, although it is the largest. This can be
>>>>>> added as a separate set of patches. Changes to BitcodeReader,
>>>>>> ValueMapper, ModuleLinker
>>>>> 
>>>>> It sounds like this would work well with the "full" LTO implemented
>>>>> by tools/gold-plugin right now.  What exactly did you do to improve
>>>>> this?
>>>> 
>>>> I don't think it will help with full LTO. The parsing of the metadata
>>>> is only delayed until the ThinLTO pass finalization, and the delayed
>>>> metadata import is necessary to avoid reading and linking in the
>>>> metadata multiple times (for each function imported from that module).
>>>> Coming out of the ThinLTO pass you still have all the metadata
>>>> necessary for each function that was imported. For a full LTO that
>>>> would end up being all of the metadata in the module.
>>>> 
>>>> The high level summary is that during the initial import it leaves the
>>>> temporary metadata on the instructions that were imported, but saves
>>>> the index used by the bitcode reader used to correlate with the
>>>> metadata when it is ready (i.e. the MDValuePtrs index), and skips the
>>>> metadata parsing. During finalization we parse just the metadata, and
>>>> suture it up during metadata value mapping using the saved index.
>>> 
>>> AFAICT, the gold-plugin currently does similar work.  (Rafael knows
>>> this code better (I've only introduced bugs there), but IIRC he's on
>>> vacation until next week.)  Even in "full" LTO.
>>> 
>>> Have a look at `getModuleForFile()` and its calling loop inside
>>> `allSymbolsReadHook()`:
>>> 
>>> 1. Load a single module, lazily.
>>> 2. Delete the bodies of unwanted functions (without ever loading them)
>>>    and fiddle with linkage as necessary.
>>> 
>>> 3. Link in the module.
>>> 4. Delete the module.
>>> 
>>> How is what you're proposing different?  Does it need to be different?
>> 
>> 
>> The difference is that for thinLTO, the lazy import happens in backend
>> compilation instead of running at linker plugin time. It is really lazy and
>> imports bare minimal depends on only references from imported functions.
>> 
>> Say module B has 1000 functions. Module A only imports 2 function foo1 and
>> foo2 from Module B transitively. The lazy reading will import necessary
>> module level data from B only to satisfy foo1 and foo2.
> 
> Right, so there are a couple of differences. As David mentions, we do
> this later and also in an iterative fashion. I.e. it may import
> function foo1 from module B, then in function foo1 see hot calls to
> functions bar1 in module C and bar2 in module D,

How do you know they're hot?  Are you selectively reading the profile
metadata?

> importing each of
> those in turn, then later decide to import function foo2 from module
> B. So the BitcodeReader is created and opens a module multiple times.
> Each time we import we create a BitcodeReader, read in what we need,
> create a Module (which contains exactly 1 function), and invoke
> linkInModule, delete the module.
> 
> This is in contrast to gold/LTO and the lazy bitcode reader which does
> instantiate the BitcodeReader once per module, and so it can defer
> parsing the metadata until the first function is materialized (at
> which point it parses and materializes all of the metadata). We don't
> want to parse/materialize the metadata until we have completed all
> imports from the module.
> 
> The alternative if we wanted to leverage the existing lazy metadata
> parsing support would be to keep the BitcodeReader instantiated for
> each module and not close it until ThinLTO importing finalization. So
> we would have a number of BitcodeReaders active at once (one per
> module we import from). I need to think more about whether this is
> feasible and investigate if this adds a lot of overhead.
> 
> The other thing that David alluded to above, that I should call out
> more explicitly in my RFC, is that we only link in the metadata
> transitively reached by the imported functions. This is handled during
> metadata value mapping.
> 
> I think I will update the RFC based on some of the discussion so far
> and will send out an updated version early next week.
> 
> Thanks,
> Teresa
> 
>> 
>> David
>> 
>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> 
>> 
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413