[Lldb-commits] [PATCH] D115324: Added the ability to cache the finalized symbol tables subsequent debug sessions to start faster.

Pavel Labath via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Wed Dec 15 07:58:17 PST 2021


labath added a comment.

Thanks for the quick response.

I did another pass over the patch, and I think I have a fairly good understanding of how it works. I have another round of comments, but nothing major.



================
Comment at: lldb/include/lldb/Core/DataFileCache.h:9
+
+#ifndef LLDB_UTILITY_DATAFILECACHE_H
+#define LLDB_UTILITY_DATAFILECACHE_H
----------------



================
Comment at: lldb/include/lldb/Core/DataFileCache.h:129-132
+    if (m_uuid.hasValue() != rhs.m_uuid.hasValue())
+      return true;
+    if (m_uuid.hasValue() && *m_uuid != *rhs.m_uuid)
+      return true;
----------------
the Optional class already has comparison operators that do the right thing.


================
Comment at: lldb/include/lldb/Core/Mangled.h:72-73
+  bool operator!=(const Mangled &rhs) const {
+    return m_mangled != rhs.m_mangled ||
+           GetDemangledName() != rhs.GetDemangledName();
+  }
----------------



================
Comment at: lldb/source/Core/Module.cpp:1679
+
+std::string Module::GetCacheKey() {
+  std::string key;
----------------
The cache key and cache signature algorithms are fairly similar but subtly different (IIUC, one is used to locate the cache entry on the disk and the second one to verify that it refers to the correct file). I think it'd be better to have the two functions next to each other, to highlight the differences. If you moved the signature construction function here (by changing it from a Module constructor to a CacheSignature getter function), then that would also the dependency problem I mentioned last time as I believe these constructors were the only non-utility dependency of that code (I don't think that code has to be moved back to Utility, but it *could* be moved, it need arises).


================
Comment at: lldb/source/Symbol/Symbol.cpp:649
+/// The only tricky thing in this encoding is encoding all of the bits in the
+/// bitfields. We use a trick to store all bitfields as a 16 bit value and we
+/// do the same thing when decoding the symbol. There are test that ensure this
----------------
I guess this isn't a "trick" anymore, just manual encoding of boolean bitfields into an int.


================
Comment at: lldb/source/Symbol/Symbol.cpp:645
+  file.AppendU16(m_type_data);
+  file.AppendU16((&m_type_data)[1]);
+  m_mangled.Encode(file, strtab);
----------------
clayborg wrote:
> labath wrote:
> > This is quite dodgy. It would be better to change the storage so that we don't have to do this. Perhaps something like
> > ```
> > union {
> >   uint16_t flag_storage;
> >   struct {
> >     uint16_t m_type_data_resolved : 1;
> >     ...
> >   };
> > };
> > ```
> I encoded them manually now. It was too disruptive to the code to try and add an anonymous union with an anonymous struct as there were many compiler warnings about it being a GNU extension and made the code way too messy, especially in the constructor as you can't init each value easily.
That works too.


================
Comment at: lldb/source/Symbol/Symtab.cpp:118-147
+        if (!pair.second.IsEmpty()) {
+          std::multimap<uint32_t, ConstString> index_to_names;
+          s->Printf("\n0x%2.2x name to index table:\n", pair.first);
+          for (const auto &entry: pair.second)
+            index_to_names.insert(std::make_pair(entry.value, entry.cstring));
+          std::set<ConstString> names;
+          uint32_t last_idx = UINT32_MAX;
----------------
If I managed to understand what this is doing correctly, it could be more simply implemented as:
```
if (pair.second.IsEmpty())
  continue;
s->Printf("\n0x%2.2x name to index table:\n", pair.first);
std::map<uint32_t, std::vector<ConstString>> index_to_names;
for (const auto &entry: pair.second)
  index_to_names[entry.value].push_back(entry.cstring); // automatically creates a new map entry for the first string
for (const auto &pair: index_to_names) {
  s->Printf("0x%8.8x: ", pair.first);
  for (const auto &name: pair.second) // vector always non-empty
    s->Printf("\"%s\" ", name.AsCString());
  s->EOL();
}
```


================
Comment at: lldb/source/Symbol/Symtab.cpp:303
     m_name_indexes_computed = true;
+    // During unit testing we don't have a object file.
     ElapsedTime elapsed(m_objfile->GetModule()->GetSymtabIndexTime());
----------------
I guess this is no longer relevant.


================
Comment at: lldb/source/Symbol/Symtab.cpp:1214
+  if (Encode(file)) {
+    auto key = GetCacheKey();
+    cache->SetCachedData(key, file.GetData());
----------------
this is not a good use of auto per the llvm coding standards


================
Comment at: lldb/source/Symbol/Symtab.cpp:1221
+
+void EncodeCStrMap(DataEncoder &encoder, ConstStringTable &strtab,
+                   const UniqueCStringMap<uint32_t> &cstr_map) {
----------------
add `static` (for the next function as well)


================
Comment at: lldb/test/API/functionalities/module_cache/universal/TestModuleCacheUniversal.py:21
+        self.cache_dir = os.path.join(self.getBuildDir(), 'lldb-module-cache')
+        print('build-dir: ' + self.getBuildDir())
+        # Set the lldb module cache directory to a directory inside the build
----------------
leftover from debugging?


================
Comment at: lldb/test/API/functionalities/module_cache/universal/TestModuleCacheUniversal.py:57-58
+
+        self.assertEqual(len(cache_files), 2,
+                "make sure there are two files in the module cache directory (%s) for %s" % (self.cache_dir, exe_basename))
----------------
IIUC, the python tests check the caching behavior, while the (c++) unit tests check that the actual contents of the cache entries can be read/written correctly. I like that.


================
Comment at: lldb/unittests/Symbol/SymbolTest.cpp:185
+
+TEST(SymbolTest, EncodeDecodeMangled) {
+  Mangled mangled;
----------------
Our c++ test layout mirrors the layout of the code it is testing, so this test should go to MangledTest.cpp and the next one to SymtabTest.cpp. (Although it might be nicer if all the (de)serialization code lived in a single central place, and then the tests for it would live in a single place as well.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115324/new/

https://reviews.llvm.org/D115324



More information about the lldb-commits mailing list