[PATCH] D114845: [llvm] [Debuginfod] DebuginfodCollection and DebuginfodServer for tracking local debuginfo.

Thu Mar 3 12:20:49 PST 2022

mysterymath added inline comments.

================
Comment at: llvm/lib/Debuginfod/Debuginfod.cpp:315-319
+    // Wait for the number of concurrent jobs to go down
+    while (NumTasksRemaining > Concurrency) {
+      LLVM_DEBUG(dbgs() << NumTasksRemaining << " tasks remaining\n";);
+      std::this_thread::sleep_for(std::chrono::milliseconds(50));
+    }
----------------
noajshu wrote:
> mysterymath wrote:
> > Is there an advantage for manually managing the concurrency here, over passing it as an argument to ThreadPool, `std::min`-ed with the hardware concurrency?
> > From a cursor look at ThreadPool's API, each async call after the thread pool is full should just more-or-less push a std::function<void()> onto a vector.
> As this is within a directory iterator loop, my concern was for when there is a large number of files within that directory. If we add tasks to the ThreadPool faster than they are completed, the memory usage of that vector of `std::function<void()>`s becomes unbounded. So I thought it best to manage the progress through the loop more manually. What do you think?
I'm unsure whether or not the buildup would ever cause problems in practice; then again, I just finished cleaning up an unbounded memory usage problem that broke in production. I'll defer that determination to others with more experience.

Assuming we keep the semantics here, it seems like what we'd really want is a version of ThreadPool that blocks submission of additional requests if the thread pool is full. This would provide feedback to stop the iterator from producing additional entries that cannot yet be handled (and would thus need to be stored). This seems like a useful abstraction in its own right, and there's definitely prior art.

I'd suggest either wrapping ThreadPool to provide such an API (simplified for the purposes of this file) or adding it as an option to ThreadPool. The first would be a slight modification to the code you have to abstract out the management of NumTasksRemaining. It'd also probably be more elegant to have the blocking async call (s) wait on a condition variable, so that the next task that finishes can signal that there is now a thread available, rather than busy-waiting with sleep().

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114845/new/

https://reviews.llvm.org/D114845