[PATCH] D148916: [Support][Parallel] Initialize threadIndex and add assertion checking its usage.

Alexey Lapshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Apr 22 02:50:35 PDT 2023


avl added inline comments.


================
Comment at: llvm/lib/Support/Parallel.cpp:225
-  auto NumItems = End - Begin;
-  if (NumItems > 1 && parallel::strategy.ThreadsRequested != 1) {
     // Limit the number of tasks to MaxTasksPerGroup to limit job scheduling
----------------
andrewng wrote:
> The following is the commit for which the `NumItems <= 1` case was important: 5fa9d416 "[Support/Parallel] Add a special case for 0/1 items to llvm::parallel_for_each.". I don't know if it's still important but probably should be checked.
> The following is the commit for which the `NumItems <= 1` case was important: 5fa9d416 "[Support/Parallel] Add a special case for 0/1 items to llvm::parallel_for_each.". I don't know if it's still important but probably should be checked.

Thank you for the pointer!

That problem is still important. I would suggest to use another workaround though.

The original problem is connected with the fact that parallelFor() functions do not support nested parallelism.
If nested TaskGroups would be allowed to be Parallel then deadlock could happen:
when all threads are busy and are waiting for nested tasks the task queue does not move.
The test case from https://github.com/llvm/circt/issues/993 looks like this:


```
parallelFor (Files) {
   parallelFor(Passes) {
   }
}

```
A nested loop for Passes is not Parallel since nested parallelism is forbidden. But if "Files" have only one file,
the workaround is to run root parallelFor sequentially and run nested parallelFor parallelly. It gives a performance improvement.
That is why this workaround for NumItems is necessary.

The drawbacks of the workaround: 

1. It helps one concrete case.
2. It makes using getThreadIndex() dangerous if parallelFor() is called from different threads.
   parallelFor() may be run on calling threads and then the same threadIndex would be used concurrently.

The following workaround could probably solve the problem better:


```
    std::function<void()> Fn = [&]() {
      parallelFor(Passes) {
      }
    };

    ThreadPool Pool;

    for (Files) {
      Pool.async(Fn);

    Pool.wait();

```
It does not have a problem with using threadIndex. It also could give more parallelism. Let`s say we have two files.
The current solution will use only two threads(as internal parallelFor is sequential), while the above solution would use all
available threads.

Another solution would be to implement support of nested parallelism for TaskGroups. But it looks to be much bigger task.

Will put that info into https://github.com/llvm/circt/issues/993


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148916/new/

https://reviews.llvm.org/D148916



More information about the llvm-commits mailing list