[Lldb-commits] [lldb] [lldb] Create dependent modules in parallel (PR #114507)

Thu Oct 31 21:50:38 PDT 2024

https://github.com/JDevlieghere created https://github.com/llvm/llvm-project/pull/114507

Create dependent modules in parallel in Target::SetExecutableModule. This change was inspired by #110646 which takes the same approach when attaching. Jason suggested we could use the same approach when you create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach scenario. I also ran this under TSan to confirm this doesn't introduce any races or deadlocks.

>From 9a269fa83cea529f704c9eba339eac9987032155 Mon Sep 17 00:00:00 2001
From: Jonas Devlieghere <jonas at devlieghere.com>
Date: Thu, 31 Oct 2024 21:42:38 -0700
Subject: [PATCH] [lldb] Create dependent modules in parallel
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by #110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

  Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
    Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
    Range (min … max):    1.220 s …  1.229 s    10 runs

  Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
    Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
    Range (min … max):    3.211 s …  3.310 s    10 runs

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
---
 lldb/source/Target/Target.cpp | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/lldb/source/Target/Target.cpp b/lldb/source/Target/Target.cpp
index 199efae8a728cc..ef5d38fc796b08 100644
--- a/lldb/source/Target/Target.cpp
+++ b/lldb/source/Target/Target.cpp
@@ -68,6 +68,7 @@
 
 #include "llvm/ADT/ScopeExit.h"
 #include "llvm/ADT/SetVector.h"
+#include "llvm/Support/ThreadPool.h"
 
 #include <memory>
 #include <mutex>
@@ -1575,7 +1576,6 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
                m_arch.GetSpec().GetTriple().getTriple());
     }
 
-    FileSpecList dependent_files;
     ObjectFile *executable_objfile = executable_sp->GetObjectFile();
     bool load_dependents = true;
     switch (load_dependent_files) {
@@ -1591,10 +1591,14 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
     }
 
     if (executable_objfile && load_dependents) {
+      // FileSpecList is not thread safe and needs to be synchronized.
+      FileSpecList dependent_files;
+      std::mutex dependent_files_mutex;
+
+      // ModuleList is thread safe.
       ModuleList added_modules;
-      executable_objfile->GetDependentModules(dependent_files);
-      for (uint32_t i = 0; i < dependent_files.GetSize(); i++) {
-        FileSpec dependent_file_spec(dependent_files.GetFileSpecAtIndex(i));
+
+      auto GetDependentModules = [&](FileSpec dependent_file_spec) {
         FileSpec platform_dependent_file_spec;
         if (m_platform_sp)
           m_platform_sp->GetFileWithUUID(dependent_file_spec, nullptr,
@@ -1608,9 +1612,28 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
         if (image_module_sp) {
           added_modules.AppendIfNeeded(image_module_sp, false);
           ObjectFile *objfile = image_module_sp->GetObjectFile();
-          if (objfile)
+          if (objfile) {
+            std::lock_guard<std::mutex> guard(dependent_files_mutex);
             objfile->GetDependentModules(dependent_files);
+          }
+        }
+      };
+
+      executable_objfile->GetDependentModules(dependent_files);
+
+      llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
+      for (uint32_t i = 0; i < dependent_files.GetSize(); i++) {
+        // Process all currently known dependencies in parallel in the innermost
+        // loop. This may create newly discovered dependencies to be appended to
+        // dependent_files. We'll deal with these files during the next
+        // iteration of the outermost loop.
+        {
+          std::lock_guard<std::mutex> guard(dependent_files_mutex);
+          for (; i < dependent_files.GetSize(); i++)
+            task_group.async(GetDependentModules,
+                            dependent_files.GetFileSpecAtIndex(i));
         }
+        task_group.wait();
       }
       ModulesDidLoad(added_modules);
     }