[PATCH] D114845: [llvm] [Debuginfod] DebuginfodCollection and DebuginfodServer for tracking local debuginfo.
Noah Shutty via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 28 15:23:38 PDT 2022
noajshu added inline comments.
================
Comment at: llvm/lib/Debuginfod/Debuginfod.cpp:315-319
+  // Wait for the number of concurrent jobs to go down
+  while (NumTasksRemaining > Concurrency) {
+    LLVM_DEBUG(dbgs() << NumTasksRemaining << " tasks remaining\n";);
+    std::this_thread::sleep_for(std::chrono::milliseconds(50));
+  }
----------------
mysterymath wrote:
> noajshu wrote:
> > mysterymath wrote:
> > > Is there an advantage to manually managing the concurrency here, over passing it as an argument to ThreadPool, `std::min`-ed with the hardware concurrency?
> > > From a cursory look at ThreadPool's API, each async call after the thread pool is full should more or less just push a std::function<void()> onto a vector.
> > As this is within a directory iterator loop, my concern was the case where the directory contains a very large number of files. If we add tasks to the ThreadPool faster than they are completed, the memory usage of that vector of `std::function<void()>`s becomes unbounded. So I thought it best to manage the progress through the loop more manually. What do you think?
> I'm unsure whether or not the buildup would ever cause problems in practice; then again, I just finished cleaning up an unbounded memory usage problem that broke in production. I'll defer that determination to others with more experience.
>
> Assuming we keep the semantics here, it seems like what we'd really want is a version of ThreadPool that blocks submission of additional requests if the thread pool is full. This would provide feedback to stop the iterator from producing additional entries that cannot yet be handled (and would thus need to be stored). This seems like a useful abstraction in its own right, and there's definitely prior art.
>
> I'd suggest either wrapping ThreadPool to provide such an API (simplified for the purposes of this file) or adding it as an option to ThreadPool. The first would be a slight modification to the code you have, abstracting out the management of NumTasksRemaining. It'd also probably be more elegant to have the blocking async call(s) wait on a condition variable, so that the next task that finishes can signal that a thread is now available, rather than busy-waiting with sleep().
Thanks for these suggestions! I agree on all points.
It's only a small change to ThreadPool to let us wait for room in the queue with a condition variable:
```
bool queueEmptyUnlocked() { return Tasks.empty(); }

void ThreadPool::waitQueue() {
  // Wait for the queue to be empty
  std::unique_lock<std::mutex> LockGuard(QueueLock);
  CompletionCondition.wait(LockGuard, [&] { return queueEmptyUnlocked(); });
}
```
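With that change, the directory walk could block on the new call instead of polling. The following is only a rough sketch of the usage; `Pool`, `handleBinary`, and the loop variables are placeholders rather than the actual names in Debuginfod.cpp:
```
// Sketch only. Assumes llvm/Support/FileSystem.h and llvm/Support/ThreadPool.h.
std::error_code EC;
for (sys::fs::recursive_directory_iterator I(Path, EC), E; I != E && !EC;
     I.increment(EC)) {
  std::string FilePath = I->path();
  Pool.async([FilePath] { handleBinary(FilePath); });
  // Block until a worker has drained the queue, instead of polling
  // NumTasksRemaining every 50 ms.
  Pool.waitQueue();
}
Pool.wait();
```
This keeps at most one task pending in the queue at any time while the worker threads stay busy.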
I also collected some data to find out when the unbounded memory usage of the queue of jobs could actually be a problem in production. From my measurements, each job in the pool's queue consumes approximately N + 320 bytes of memory, where N is the number of bytes in the file path. On the right system setup, this could indeed add up to a lot of memory: filesystem caching could allow millions of file paths to be traversed in seconds, while actually reading those files could take much longer, causing the queue to build up.
However, if these files are ELF binaries, they will end up in our `StringMap` in memory anyway with the current implementation. So for a user whose files are mostly ELF binaries, the unbounded memory usage will be a problem regardless.
Therefore, only a user with millions of non-ELF files mixed in with a smaller number of ELF binaries would meaningfully benefit from waiting here before submitting more jobs to the queue: for example, a developer like myself with limited local memory and millions of files that are not ELF binaries. When I plug in the numbers for my own filesystem, I could fit one job for each file in my development directory with about 0.4 GB of total memory usage. That is a comfortable margin on my own system, but I'm unsure about other users, so if it seems reasonable I will just make the small tweaks to the `ThreadPool` API to be safe.
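For a rough sense of scale, here is how the per-job estimate works out. The file count and average path length below are assumptions for illustration, not measurements from my filesystem:
```
#include <cstddef>

// Back-of-the-envelope check of the N + 320 bytes-per-job estimate.
// The file count and average path length are assumed values for illustration.
constexpr std::size_t NumFiles = 1000000;  // ~a million queued jobs
constexpr std::size_t AvgPathBytes = 80;   // average path length N, in bytes
constexpr std::size_t TotalBytes = NumFiles * (AvgPathBytes + 320);
static_assert(TotalBytes == 400000000, "roughly 0.4 GB of queued jobs");
```
That lines up with the roughly 0.4 GB figure above.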
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D114845/new/
https://reviews.llvm.org/D114845