[PATCH] D148916: [Support][Parallel] Initialize threadIndex and add assertion checking its usage.
Alexey Lapshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Apr 25 02:22:42 PDT 2023
avl added inline comments.
================
Comment at: llvm/lib/Support/Parallel.cpp:225
- auto NumItems = End - Begin;
- if (NumItems > 1 && parallel::strategy.ThreadsRequested != 1) {
// Limit the number of tasks to MaxTasksPerGroup to limit job scheduling
----------------
avl wrote:
> andrewng wrote:
> > The following is the commit for which the `NumItems <= 1` case was important: 5fa9d416 "[Support/Parallel] Add a special case for 0/1 items to llvm::parallel_for_each.". I don't know if it's still important but probably should be checked.
> > The following is the commit for which the `NumItems <= 1` case was important: 5fa9d416 "[Support/Parallel] Add a special case for 0/1 items to llvm::parallel_for_each.". I don't know if it's still important but probably should be checked.
>
> Thank you for the pointer!
>
> That problem is still important. I would suggest to use another workaround though.
>
> The original problem is connected with the fact that parallelFor() functions do not support nested parallelism.
> If nested TaskGroups would be allowed to be Parallel then deadlock could happen:
> when all threads are busy and are waiting for nested tasks the task queue does not move.
> The test case from https://github.com/llvm/circt/issues/993 looks like this:
>
>
> ```
> parallelFor (Files) {
> parallelFor(Passes) {
> }
> }
>
> ```
> A nested loop for Passes is not Parallel since nested parallelism is forbidden. But if "Files" have only one file,
> the workaround is to run root parallelFor sequentially and run nested parallelFor parallelly. It gives a performance improvement.
> That is why this workaround for NumItems is necessary.
>
> The drawbacks of the workaround:
>
> 1. It helps one concrete case.
> 2. It makes using getThreadIndex() dangerous if parallelFor() is called from different threads.
> parallelFor() may be run on calling threads and then the same threadIndex would be used concurrently.
>
> The following workaround could probably solve the problem better:
>
>
> ```
> std::function<void()> Fn = [&]() {
> parallelFor(Passes) {
> }
> };
>
> ThreadPool Pool;
>
> for (Files) {
> Pool.async(Fn);
>
> Pool.wait();
>
> ```
> It does not have a problem with using threadIndex. It also could give more parallelism. Let`s say we have two files.
> The current solution will use only two threads(as internal parallelFor is sequential), while the above solution would use all
> available threads.
>
> Another solution would be to implement support of nested parallelism for TaskGroups. But it looks to be much bigger task.
>
> Will put that info into https://github.com/llvm/circt/issues/993
@lattner Chris, Would you mind to comment on whether it would be OK to remove "NumItems > 1" optimization and use ThreadPool for nested parallelFor instead, please?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D148916/new/
https://reviews.llvm.org/D148916
More information about the llvm-commits
mailing list