[Lldb-commits] [PATCH] D122975: parallelize module loading in DynamicLoaderPOSIXDYLD()

Tue Apr 5 03:21:01 PDT 2022

llunak added a comment.

In D122975#3428876 <https://reviews.llvm.org/D122975#3428876>, @labath wrote:

> OK, got it. So, for this case, I think the best approach would be to extract and parallelize the `PreloadSymbols` calls. They are not deep (=> relatively easy to extract), optional (they are just a performance optimization, so nothing will break if they're skipped), and completely isolated (they only access data from the single module).

> In fact, we already have a sort of existing place to do this kind of group module action. The reason the `ModulesDidLoad` (line 621) call does not happen inside `GetOrCreateModule` is that we want to send just one load event instead of spamming the user with potentially hundreds of messages. I don't think it would be unreasonable to move the `PreloadSymbols` call into `ModulesDidLoad`.

Meanwhile, I've had a closer look at the relevant functions and come to a similar conclusion (minus the `ModulesDidLoad` part). I think `GetOrCreateModule` currently does call `ModulesDidLoad` for each module, but that should be easy to change (and `PreloadSymbols` comes from 7fca8c0757a5ee5f290844376c7f8c5f3c1ffcfe, which suggests that moving the call there should be fine). I'll have a go at this.
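A minimal sketch of the batching approach discussed above, in plain C++ rather than LLDB code: `Module`, `ModuleSP` and this two-argument `ModulesDidLoad` are hypothetical stand-ins, not LLDB's actual types or signatures. Newly loaded modules are collected first, their `PreloadSymbols` calls run on a bounded number of threads (safe because each call only touches its own module), and a single load event is reported for the whole batch.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an LLDB module; not the real lldb_private::Module.
struct Module {
  std::string name;
  bool symbols_preloaded = false;
  // Only touches this module's own data, so calls on different modules
  // can run concurrently without extra locking.
  void PreloadSymbols() { symbols_preloaded = true; }
};
using ModuleSP = std::shared_ptr<Module>;

// Hypothetical batch hook: preload symbols for all newly loaded modules in
// parallel, then emit one load event for the whole batch.
void ModulesDidLoad(const std::vector<ModuleSP> &modules,
                    const std::function<void(std::size_t)> &broadcast_load_event) {
  // Bound the thread count by the hardware instead of spawning one per module.
  unsigned hw = std::thread::hardware_concurrency();
  std::size_t num_workers =
      std::min<std::size_t>(hw == 0 ? 1 : hw, modules.size());
  std::atomic<std::size_t> next{0};
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < num_workers; ++i)
    workers.emplace_back([&modules, &next] {
      // Each worker claims the next unprocessed module until none are left.
      for (std::size_t idx = next++; idx < modules.size(); idx = next++)
        modules[idx]->PreloadSymbols();
    });
  for (std::thread &t : workers)
    t.join(); // all preloads finish before the single event is sent
  broadcast_load_event(modules.size());
}
```

Joining before the broadcast means observers of the one load event see fully preloaded modules; since preloading is only an optimization, a variant could also send the event first and let the preloads finish in the background.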

> Pending/suspended threads are less of a problem than threads actively contending for CPU time, but they are still less than ideal. On my machine I could end up with over 2k threads. At 8MB per thread, that's 16GB just for stack (most of it probably unused, but still...). And some machines have more (sometimes a lot more) CPUs than I do.
>
> Properly implementing thread pools is tricky, and I don't consider myself an expert, but I think that, instead of using semaphores, you could detect the case when `wait()` is being called from inside a thread pool thread, and then, instead of passively waiting for the task to finish, start eagerly evaluating it on the same thread.
>
> I'm pretty sure I didn't come up with this idea, so it's possible that something like this is already implemented in llvm, and I got the idea from there.

Those 16GB would be reserved address space, most of which would never be backed by physical memory. I'm not an expert on that, but that's why I didn't consider it a problem; sanitizers, for example, reserve far more address space than that. But OK, I can adjust ThreadPool so that it can be reused for different groups of tasks. I've thought of the idea of having `wait` process pending tasks too; ThreadPool currently can't do that, but I think it should be easy to add.
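The two ideas above (a pool reusable for separate groups of tasks, and a `wait` that processes pending work when called from a pool thread) can be sketched together in standard C++17. `GroupedThreadPool` is a hypothetical class, not the API of `llvm::ThreadPool`, and it has no exception handling. A pool thread that waits on a group runs that group's queued tasks itself, so nested waits make progress on a fixed number of threads instead of needing one extra blocked thread per waiting task.

```cpp
#include <algorithm>
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical sketch, not llvm::ThreadPool: tasks belong to caller-owned
// groups, and wait(group) called on a pool thread runs that group's queued
// tasks itself instead of blocking, so nested waits cannot starve the pool.
class GroupedThreadPool {
public:
  struct Group {
    std::size_t outstanding = 0; // queued + running tasks, guarded by mu_
  };
  explicit GroupedThreadPool(unsigned num_threads) {
    for (unsigned i = 0; i < num_threads; ++i)
      workers_.emplace_back([this] { workerLoop(); });
  }
  ~GroupedThreadPool() {
    { std::lock_guard<std::mutex> lock(mu_); stop_ = true; }
    cv_.notify_all();
    for (std::thread &t : workers_)
      t.join(); // workers drain the queue before exiting
  }
  void async(Group &group, std::function<void()> fn) {
    { std::lock_guard<std::mutex> lock(mu_);
      ++group.outstanding;
      queue_.push_back(Task{&group, std::move(fn)}); }
    cv_.notify_all();
  }
  void wait(Group &group) {
    std::unique_lock<std::mutex> lock(mu_);
    while (group.outstanding != 0) {
      if (current_pool_ == this) {
        // On a pool thread: eagerly run one of this group's queued tasks.
        auto it = std::find_if(queue_.begin(), queue_.end(),
                               [&group](const Task &t) { return t.group == &group; });
        if (it != queue_.end()) {
          Task task = std::move(*it);
          queue_.erase(it);
          runTask(lock, std::move(task));
          continue;
        }
      }
      cv_.wait(lock); // woken by async() or by a group finishing
    }
  }

private:
  struct Task { Group *group; std::function<void()> fn; };
  // Called with `lock` held; releases it while the task runs.
  void runTask(std::unique_lock<std::mutex> &lock, Task task) {
    lock.unlock();
    task.fn();
    lock.lock();
    if (--task.group->outstanding == 0)
      cv_.notify_all();
  }
  void workerLoop() {
    current_pool_ = this;
    std::unique_lock<std::mutex> lock(mu_);
    for (;;) {
      cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
      if (queue_.empty())
        return; // stop_ is set and nothing is left to run
      Task task = std::move(queue_.front());
      queue_.pop_front();
      runTask(lock, std::move(task));
    }
  }
  inline static thread_local const GroupedThreadPool *current_pool_ = nullptr;
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<Task> queue_;
  bool stop_ = false;
  std::vector<std::thread> workers_;
};

// Usage: 8 outer tasks each wait on 4 inner tasks, using only 2 threads.
// Without the eager evaluation in wait(), both threads would block forever.
int runNestedDemo() {
  std::atomic<int> leaf_runs{0};
  GroupedThreadPool pool(2);
  GroupedThreadPool::Group outer;
  for (int i = 0; i < 8; ++i)
    pool.async(outer, [&pool, &leaf_runs] {
      GroupedThreadPool::Group inner;
      for (int j = 0; j < 4; ++j)
        pool.async(inner, [&leaf_runs] { ++leaf_runs; });
      pool.wait(inner);
    });
  pool.wait(outer);
  return leaf_runs.load();
}
```

Restricting eager evaluation to the waited-on group keeps a waiting thread from picking up an unrelated, possibly long-running task. Any of the group's tasks that are not in the queue are already running on another thread, so blocking until they finish cannot deadlock as long as groups are nested rather than waiting on each other cyclically.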

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D122975