[Lldb-commits] [PATCH] D113789: Add the ability to cache information for a module between debugger instances.

Mon Nov 15 15:22:27 PST 2021

clayborg added a comment.

In D113789#3130741 <https://reviews.llvm.org/D113789#3130741>, @labath wrote:

> Maybe it's good for starting a discussion, but I am surprised that we would need to cache symtab information. This is a fairly simple format that needs to be read at application runtime as well, so I am surprised that reading it from the cache is substantially faster than doing it from the object file. What is the slow part there? Is it demangling by any chance?

Symbol tables for LLDB are made up by scouring every bit of data we can out of one or more symbol tables (ELF has normal symbol table and the dynamic symbol table, and optionally the symbol table in the zipped debug info). There are also many runtime sections like .eh_frame and others that can give us the start address of functions. If a binary is stripped, then we might not have a symbol for local functions, but many runtime sections have information on where functions start.

So while the symbol table seems simple, there is a ton of work that goes into creating these. Some things that are expensive:

- parse all symbol tables and unique the results
- reduce duplicate symbols as we parse one or more symbol tables (this happens a lot in mach-o files)
- keep a map of symbols that have been emitted so we can skip creating symbols from runtime information
- mach-o files have many symbols that describe the same thing: normal symbols and debug map symbols, so these symbols need to be coalesced into one symbol so we don't end up with multiple symbols with the same name and different info

I will post some performance results on loading a large Mac debug session as soon as my release build finishes. I was getting some really good performance improvements when I first started this patch as I tried this out before taking the time to try and finalize this patch.

> Now to the discussion. Overall, I think this is an interesting feature, but I do have some remarks/questions:
>
> - I don't think that  "lldb module cache" is very good name for this feature as we already have the "platform module cache" feature (see `Target/ModuleCache.h` and its callers), which deals with downloading (and caching) modules (object files) from remote systems. And then there's the clang module cache, of course.. (and the llvm modules, but we fortunately don't cache those). It would be better if we chose a slightly less overloaded name. Maybe "index cache" ? (I assume you will also want to store dwarf indexes there)

yes debug info is the next item I want to put into this cache. "index cache" sounds fine. I will rename once I get a few NFC patches submitted and rebase.

> - I don't see anything which would ensure cache consistency for the case where multiple debugger instances try to cache the same module simultaneously. What's the plan for that?

I was hoping to learn LLVM already had this kind of thing and using that! I see below there are some things already in LLVM I should be able to use.

> - Have you considered the building upon the caching infrastructure in llvm (`llvm/Caching.h`, `llvm/CachePruning.h`)? I believe it already has some size limiting features and multiprocess synchronization built-in, so using it would take care of the previous item, and of @wallace's size limit request.

I will check into that and see if I can use that! I figured it would have been done already, glad to see it has!

> - it's moderately strange to be using the gsym file writer for this purpose

When I made gsym I create the FileWriter as something that can easily create binary content without having to go with the ASM print/writer stuff in LLVM as that requires tons of stuff including LLVM targets and object file support.

What suggestions do people have on this front? Should I make a patch to move FileWriter out of GSYM? I don't know how if llvm::gym::FileWriter is useful enough for other folks in LLVM. Any thoughts? Should we just make our own FileWriter class that only does what we need for the caching? Is there any other tech in LLVM that does this kind of thing without having to use the full ARM printer/writer stuff?

> I haven't looked at the code in detail, but I noted that the filesystem changes, and some of the unrelated formatting cleanups could be put into separate patches.

I can break those out into separate patches.

So my plan is to:

- make a separate patch for the symbol table parsing code in Module/ObjectFile
- make a separate patch for FileSystem stuff
- figure out what to do with llvm::gsym::FileWriter (input would be appreciated here!)
- rebase this patch on all above patches and try using the llvm caching libraries

================
Comment at: lldb/include/lldb/Host/FileSystem.h:93
+  /// Set the access and modification time of the given file.
+  bool SetAccessAndModificationTime(const FileSpec &file_spec,
+                                    llvm::sys::TimePoint<> time);
----------------
wallace wrote:
> this should return an llvm::Error and the caller should decide whether to consume the error or not.
I can return a Status as like many other calls to be more consistent with the functions in this directory.

================
Comment at: lldb/source/Core/Module.cpp:1676
+  if (mtime > 0)
+    id_strm << mtime;
+  return llvm::djbHash(id_strm.str());
----------------
Sounds good.

================
Comment at: lldb/source/Core/Module.cpp:1680
+
+llvm::Optional<FileSpec> Module::GetCacheDirectory() {
+  using namespace llvm;
----------------
wallace wrote:
> this function is doing many things and reading is harder. Besides splitting it, we should log in the target stats if the cache was enabled or not, if the cache was used or not in a per module basis and any errors associated. This will help us fix issues without having to wait for users' reports.
> you should also print the error in a log channel
Yes, stats should log if the cache was enabled, but each module should also emit stats for anything that was loaded from the cache. If the cache is enabled and nothing is in the cache yet, then we should emit stats for anything that was loaded from the cache.

================
Comment at: lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp:2722
       m_symtab_up = std::make_unique<Symtab>(symtab->GetObjectFile());
+      if (m_symtab_up->LoadFromCache())
+        return m_symtab_up.get();
----------------
labath wrote:
> Can we make it so that we don't need to add these bits to every object file implementation? Maybe create a protected `CreateSymtab` function, which would get only called if the Symtab really needs to be created?
Yeah, I was thinking about this as well as I was doing it. I will make a patch that does this as a NFC patch first and then rebase this patch on that.

================
Comment at: lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp:2723
+      if (m_symtab_up->LoadFromCache())
+        return m_symtab_up.get();
       symbol_id += ParseSymbolTable(m_symtab_up.get(), symbol_id, symtab);
----------------
wallace wrote:
> log here in the target stats if the cache was effectively used or not
Yeah, each object file should remember what/if it loads anything from a cache and emit stats. This patch was already getting too big, and I was planning on doing a quick follow up patch.

================
Comment at: lldb/source/Symbol/Symbol.cpp:645
+  file.writeU16(m_type_data);
+  file.writeU16((&m_type_data)[1]);
+  m_mangled.Encode(file);
----------------
labath wrote:
> It would be (much) better to put the bitfields in an (anonymous) union that can be accessed both as a uint16_t and as individual bitfields.
Agreed. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113789/new/

https://reviews.llvm.org/D113789