[PATCH] D148916: [Support][Parallel] Initialize threadIndex and add assertion checking its usage.

Alexey Lapshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 25 02:22:42 PDT 2023


avl added inline comments.


================
Comment at: llvm/lib/Support/Parallel.cpp:225
-  auto NumItems = End - Begin;
-  if (NumItems > 1 && parallel::strategy.ThreadsRequested != 1) {
     // Limit the number of tasks to MaxTasksPerGroup to limit job scheduling
----------------
avl wrote:
> andrewng wrote:
> > The following is the commit for which the `NumItems <= 1` case was important: 5fa9d416 "[Support/Parallel] Add a special case for 0/1 items to llvm::parallel_for_each.". I don't know if it's still important but probably should be checked.
> > The following is the commit for which the `NumItems <= 1` case was important: 5fa9d416 "[Support/Parallel] Add a special case for 0/1 items to llvm::parallel_for_each.". I don't know if it's still important but probably should be checked.
> 
> Thank you for the pointer!
> 
> That problem is still important. I would suggest to use another workaround though.
> 
> The original problem is connected with the fact that parallelFor() functions do not support nested parallelism.
> If nested TaskGroups would be allowed to be Parallel then deadlock could happen:
> when all threads are busy and are waiting for nested tasks the task queue does not move.
> The test case from https://github.com/llvm/circt/issues/993 looks like this:
> 
> 
> ```
> parallelFor (Files) {
>    parallelFor(Passes) {
>    }
> }
> 
> ```
> A nested loop for Passes is not Parallel since nested parallelism is forbidden. But if "Files" have only one file,
> the workaround is to run root parallelFor sequentially and run nested parallelFor parallelly. It gives a performance improvement.
> That is why this workaround for NumItems is necessary.
> 
> The drawbacks of the workaround: 
> 
> 1. It helps one concrete case.
> 2. It makes using getThreadIndex() dangerous if parallelFor() is called from different threads.
>    parallelFor() may be run on calling threads and then the same threadIndex would be used concurrently.
> 
> The following workaround could probably solve the problem better:
> 
> 
> ```
>     std::function<void()> Fn = [&]() {
>       parallelFor(Passes) {
>       }
>     };
> 
>     ThreadPool Pool;
> 
>     for (Files) {
>       Pool.async(Fn);
> 
>     Pool.wait();
> 
> ```
> It does not have a problem with using threadIndex. It also could give more parallelism. Let`s say we have two files.
> The current solution will use only two threads(as internal parallelFor is sequential), while the above solution would use all
> available threads.
> 
> Another solution would be to implement support of nested parallelism for TaskGroups. But it looks to be much bigger task.
> 
> Will put that info into https://github.com/llvm/circt/issues/993
@lattner Chris, Would you mind to comment on whether it would be OK to remove "NumItems > 1" optimization and use ThreadPool for nested parallelFor instead, please?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148916/new/

https://reviews.llvm.org/D148916



More information about the llvm-commits mailing list