[PATCH] D11722: [ThinLTO] Bitcode reading/writing support for ThinLTO function summary/index

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 13 10:22:34 PDT 2015


On Thu, Aug 13, 2015 at 9:42 AM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On 2015-Aug-12, at 21:23, Teresa Johnson <tejohnson at google.com> wrote:
> >
> > tejohnson updated the summary for this revision.
> > tejohnson updated this revision to Diff 32027.
> > tejohnson added a comment.
> >
> > Removed native object wrapper support from this patch.
> >
> >
> > http://reviews.llvm.org/D11722
> >
> > Files:
> >  include/llvm/Bitcode/LLVMBitCodes.h
> >  include/llvm/Bitcode/ReaderWriter.h
> >  lib/Bitcode/Reader/BitcodeReader.cpp
> >  lib/Bitcode/Writer/BitcodeWriter.cpp
> >  tools/llvm-bcanalyzer/llvm-bcanalyzer.cpp
> >
> > <D11722.32027.patch>
>
> (Sorry I'm so late getting to this patch.)
>

Hi Duncan,
Thanks for the comments.


>
> I'm concerned about the direction.  IIUC, the purpose of this is faster
> lazy-loading.  Why do this in a side-channel?  Why not optimize the
> existing lazy-loader?  I imagine the LTO model implemented by the gold
> plugin would benefit from this, for example.  If we need an extra index
> to make lazy-loading "fast enough", it seems like the index should be
> available (as an opt-in for time/space tradeoff) for other consumers as
> well.  Alternatively, maybe there's something we can change about how
> the existing lazy-loader works (or the layout of bitcode) to be more
> amenable to the partial loading ThinLTO needs.
>

Only one part of the index (the function's bitcode offset) is used for the
fast loading of a function from a given module. Much of the other
information in the ThinLTO sections serves to identify/locate the module to
import from (via the module strtab index, which is populated when creating
the combined (global) index), plus information used to make importing
decisions about a function in another module (such as the function's size,
or its hotness when profile data is available, which we don't otherwise
have since we haven't yet parsed that other module).

The other issue with using the existing lazy-loading support is that we may
import from a number of other modules in some interleaved fashion, so we
may open a module, import a function, and close that module multiple times.
My understanding is that the lazy loading is more for the case of loading a
few different functions in sequence.

So for example, if we have:

A.cc:

a() {
   b1();
   ...
   c1();
   ...
   b2();
   ...
   c2();
}

B.cc:

b1() { d1(); }
b2() { ... }

C.cc:

c1() { d2(); }
c2() { ... }

D.cc:

d1() { ... }
d2() { ... }


When compiling A.cc through the parallel backend's importing pass, we may
have the following sequence of events:

Consider importing b1  -> Decide to import b1  (exposes call to d1)
Consider importing d1  -> Decide to import d1
Consider importing c1  -> Decide to import c1  (exposes call to d2)
Consider importing d2  -> Decide to import d2
Consider importing b2  -> Decide *not* to import b2
Consider importing c2  -> Decide to import c2

For each of the inter-module calls considered in some priority based order,
we would look up the callee in the combined index, which provides some
information such as the callee's size, hotness, etc. A decision on whether
to import is made based on that information (without doing any parsing of
the callee's module). If we decide to import, then the combined index entry
for the callee provides the callee module path and the bitcode index, which
is used to open the module and import just that function, then the callee's
module is closed. Some of the lazy loading support could potentially be
used in place of the bitcode index, although I am not convinced it is
suited for that purpose. In any case, that is just a small part of what is
contained in the function summary/index as described earlier.

On the flip side, the bitcode index could be used by the lazy loader, in
place of building the index on the fly. I'm not sure how much benefit there
is to lazy loading, which eventually needs to read the entire module once a
function is materialized.


> Not included in this patch is support for lazy-loading of metadata,
> something you suggested you had support for in your prototype (and I
> assume ThinLTO relies on it).  I think it's particularly important that
> whatever you do there isn't ThinLTO-specific.
>
>
Right, that is needed when we do the importing. This initial patch is just
to write/read the function summary/index. I had briefly outlined the way I
was hoping to stage patches earlier today (
http://lists.llvm.org/pipermail/llvm-dev/2015-August/089222.html). I was
planning to do the importing as a separate set of patches once I have the
infrastructure for generating/reading the function indexes as that is
needed first to drive the importing. Let me know if the staging of work I
describe there makes sense.

To facilitate the process of reading/materializing a single function each
time a module is imported from, as described above, we will be parsing the
module-level metadata once as a post-pass. Otherwise, supporting the
interleaved importing from multiple modules would probably be very
difficult. This is again different from the existing lazy loading, which
parses/materializes all necessary functions and metadata out of each
module in a single pass.

Let me know if this seems reasonable given the above example and
description.

Thanks,
Teresa


-- 
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413