[PATCH] D31364: LTO: Reduce memory consumption by creating an in-memory symbol table for InputFiles. NFCI.

Thu Mar 30 15:02:02 PDT 2017

pcc added inline comments.

================
Comment at: llvm/include/llvm/Object/IRSymtab.h:123
+/// Fills in Symtab and Strtab with a valid symbol and string table for Mods.
+Error write(ArrayRef<Module *> Mods, SmallVector<char, 0> &Symtab,
+            SmallVector<char, 0> &Strtab);
----------------
tejohnson wrote:
> The name "write" here seems unexpected to me, since we aren't writing to disk e.g.. The client does a "write" which involves a Writer class, followed by a Reader, when together both are needed to essentially "read" the symbols from Modules. Maybe "buildSymbolTable" or something like that. The Writer is more like a Builder.
Renamed to "build" (likewise to "Builder").

================
Comment at: llvm/lib/LTO/LTO.cpp:506
+  auto MsymI = SymTab.symbols().begin(), MsymE = SymTab.symbols().end();
+  auto Skip = [&]() {
+    while (MsymI != MsymE) {
----------------
tejohnson wrote:
> Before this was embedded within symbol_iterator. What is the impact of moving it here - we only want to skip these symbols in the regular LTO case?
> 
> It's a little confusing to me that we have two symbol iterations going on in the below loop, one over the InputFile::Symbols and one over the ModuleSymbolTable - why do we need two data structures of symbols now?
> Before this was embedded within symbol_iterator. What is the impact of moving it here - we only want to skip these symbols in the regular LTO case?

We need to skip the same set of symbols in the module as we do in the InputFile's symbol table. The logic in Skip() duplicates the condition on line 361 of LTO.cpp where we only add global/non-format-specific symbols to the InputFile. It may be worth eliminating some of the duplication here, but I'd have to think about it more carefully.

> It's a little confusing to me that we have two symbol iterations going on in the below loop, one over the InputFile::Symbols and one over the ModuleSymbolTable - why do we need two data structures of symbols now?

This function needs some attributes from the symbol table as well as direct access to module symbols for the IRMover. Because symbol table symbols were previously implemented in terms of module symbols, the symbol table list also served the purpose of providing access to module symbols.

This change separated the symbol table from the module, so now we have three lists: resolution, symbol table symbols and module symbols. In principle the latter two should contain the same information, so strictly we only need resolution and module symbols, but sometimes it's more convenient to pull symbol attributes out of the symbol table.

I agree that this is a little confusing, so I've left a comment at the top of this block.

https://reviews.llvm.org/D31364