[Lldb-commits] [PATCH] D115324: Added the ability to cache the finalized symbol tables subsequent debug sessions to start faster.

Wed Dec 15 14:03:56 PST 2021

clayborg added inline comments.

================
Comment at: lldb/source/Core/Module.cpp:1679
+
+std::string Module::GetCacheKey() {
+  std::string key;
----------------
labath wrote:
> The cache key and cache signature algorithms are fairly similar but subtly different (IIUC, one is used to locate the cache entry on the disk and the second one to verify that it refers to the correct file). I think it'd be better to have the two functions next to each other, to highlight the differences. If you moved the signature construction function here (by changing it from a Module constructor to a CacheSignature getter function), then that would also the dependency problem I mentioned last time as I believe these constructors were the only non-utility dependency of that code (I don't think that code has to be moved back to Utility, but it *could* be moved, it need arises).
So the signature only contains an optional UUID, optional mod time, and optional object mod time, so they really are not related. The cache key must be the same for the same file on disk even if the file gets updated. So the cache key is designed to stay the same for the same file on disk, but also allow for the same file on disk to have more than one architecture (universal files) or for a file to contain multiple files like .a files that contain .o files. Since we only include the basename of the path for the module file, we also include the hash from Module::Hash() which _does_ include the full file path. That way we can have many "a.out" cache files from different directories and the main thing that will differ is the hash that is appended at the end. 

The DataCacheFile.cpp was relying on the ModuleList.cpp in order to read the "symbols.* preferences via "ModuleList::GetGlobalModuleListProperties()", so the ModuleList is actually what was the part that violated the dependency issue.

================
Comment at: lldb/source/Symbol/Symbol.cpp:649
+/// The only tricky thing in this encoding is encoding all of the bits in the
+/// bitfields. We use a trick to store all bitfields as a 16 bit value and we
+/// do the same thing when decoding the symbol. There are test that ensure this
----------------
labath wrote:
> I guess this isn't a "trick" anymore, just manual encoding of boolean bitfields into an int.
It is. I tried doing your approach of making an anonymous union and an anonymous struct but I got a bunch of warnings that these were GNU extensions and I feared that some buildbots would fail with warnings as errors. I also didn't know if this would mess up the windows buildbots in case they didn't support these. So to be safe I just manually encode stuff to be safe.

================
Comment at: lldb/source/Symbol/Symtab.cpp:118-147
+        if (!pair.second.IsEmpty()) {
+          std::multimap<uint32_t, ConstString> index_to_names;
+          s->Printf("\n0x%2.2x name to index table:\n", pair.first);
+          for (const auto &entry: pair.second)
+            index_to_names.insert(std::make_pair(entry.value, entry.cstring));
+          std::set<ConstString> names;
+          uint32_t last_idx = UINT32_MAX;
----------------
labath wrote:
> If I managed to understand what this is doing correctly, it could be more simply implemented as:
> ```
> if (pair.second.IsEmpty())
>   continue;
> s->Printf("\n0x%2.2x name to index table:\n", pair.first);
> std::map<uint32_t, std::vector<ConstString>> index_to_names;
> for (const auto &entry: pair.second)
>   index_to_names[entry.value].push_back(entry.cstring); // automatically creates a new map entry for the first string
> for (const auto &pair: index_to_names) {
>   s->Printf("0x%8.8x: ", pair.first);
>   for (const auto &name: pair.second) // vector always non-empty
>     s->Printf("\"%s\" ", name.AsCString());
>   s->EOL();
> }
> ```
Whoops this was just dumping code I added to verify that the name tables were being reproduced correctly. No one will want to see these. I will remove the code.

================
Comment at: lldb/test/API/functionalities/module_cache/universal/TestModuleCacheUniversal.py:21
+        self.cache_dir = os.path.join(self.getBuildDir(), 'lldb-module-cache')
+        print('build-dir: ' + self.getBuildDir())
+        # Set the lldb module cache directory to a directory inside the build
----------------
labath wrote:
> leftover from debugging?
yes! good catch

================
Comment at: lldb/test/API/functionalities/module_cache/universal/TestModuleCacheUniversal.py:57-58
+
+        self.assertEqual(len(cache_files), 2,
+                "make sure there are two files in the module cache directory (%s) for %s" % (self.cache_dir, exe_basename))
----------------
labath wrote:
> IIUC, the python tests check the caching behavior, while the (c++) unit tests check that the actual contents of the cache entries can be read/written correctly. I like that.
Exactly. I wanted to make sure that encoding could be done with unit testing as that is easier. I could also verify that every bit in the Symbol class was encoded and decoded properly since I was playing tricks before with the bitfields, but it did help issues when I did things manually as well.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115324/new/

https://reviews.llvm.org/D115324