[Openmp-dev] OpenMP threads slow to start on Windows
Henry Rich via Openmp-dev
openmp-dev at lists.llvm.org
Wed Oct 7 08:31:54 PDT 2020
Executive summary: Windows 10 system. I run parallel for on short
workloads - say, 2-10 microseconds per thread. My problem is that the
delay before the worker threads start is quite often dozens to hundreds
of microseconds, even when I run the task with high priority. It's as if
the scheduler were nonpreemptive, but the docs say it is preemptive. Can
someone explain?
It's more a Windows question than an OpenMP question.
Detail:
My application operates on pairs of long vectors - say it adds them
together to produce a vector result. Its rules state that it must
completely finish with one pair before it can be given another. I would
like to use multiple threads to speed things up. I am running Windows 10.
I created an OpenMP parallel for construct and divided the vector among
all the threads of the team. All threads start, all threads run pretty
fast, so the multithreading is effective.
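
For readers following along, the setup described above might look roughly
like the sketch below. The function name add_vectors and the use of
std::vector<double> are assumptions made for illustration; the original
code is not shown in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise sum of two equal-length vectors.
// Build with -fopenmp (GCC/Clang) or /openmp (MSVC). Without that flag the
// pragma is ignored and the loop simply runs on one thread.
std::vector<double> add_vectors(const std::vector<double>& a,
                                const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> out(a.size());
    // Signed loop index, since older OpenMP implementations require one.
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
    // schedule(static) gives each thread of the team one contiguous slice.
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```

The implicit barrier at the end of the parallel for is what makes the
whole operation wait for the slowest thread, so a single late worker
delays the result for that pair of vectors.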
But the speedup is slight, and the reason is that some of the time, one
of the worker threads takes way longer than usual. I have instrumented
the operation, and I see that sometimes the worker threads take a long
time to start - delay varies from 20 microseconds on average to dozens
of milliseconds depending on system load. The master thread does not
show this delay.
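
One way to take this kind of measurement is sketched below. It is not the
instrumentation actually used here, only a minimal illustration: the
function name measure_start_delays_us is hypothetical, and the choice of
std::chrono::steady_clock as the timer is an assumption.

```cpp
#include <chrono>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: for each thread of the team, the time in
// microseconds between a timestamp taken by the master just before the
// parallel region and the first statement that thread executes inside it.
// Slots for threads that did not take part keep the value -1.
std::vector<double> measure_start_delays_us() {
    using Clock = std::chrono::steady_clock;
    int max_threads = 1;
#ifdef _OPENMP
    max_threads = omp_get_max_threads();
#endif
    std::vector<double> delays(max_threads, -1.0);
    const Clock::time_point t0 = Clock::now();
    #pragma omp parallel
    {
        const Clock::time_point t1 = Clock::now();
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        // Each thread writes only its own slot, so no locking is needed.
        delays[tid] =
            std::chrono::duration<double, std::micro>(t1 - t0).count();
    }
    return delays;
}
```

Calling this many times and comparing slot 0 (the master) with the other
slots would show whether the workers, and only the workers, start late.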
That makes me think that the scheduler is taking some time to start the
worker threads. The master thread is already running, so it doesn't have
to wait to be started.
But here is the nub of the question: raising the priority of the process
doesn't make any difference. I can raise it to high priority or even
realtime priority, and I still see that startup of the worker threads is
often delayed. It looks like the Windows scheduler is not fully
preemptive, and sometimes lets a lower-priority thread run when a
higher-priority one is eligible. Can anyone confirm this?
I have verified that the worker threads are created with the default OS
priority, namely the base priority of the class of the master process.
This should be higher than the priority of any running thread, I think.
Or is it normal for there to be some thread with realtime priority that
might be blocking my workers? I don't see one with Task Manager.
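
For concreteness, raising the priority of the process and of each thread
in the team could be done along the lines below. This is only a sketch
under assumptions: the function name raise_team_priority is hypothetical,
it relies on the OpenMP runtime reusing the same worker threads in later
parallel regions, and nothing here shows that it shortens the start delay.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: raise the process priority class, then have every
// thread of the team raise its own thread priority. Returns the number of
// threads for which SetThreadPriority succeeded; on non-Windows builds it
// does nothing and returns 0.
int raise_team_priority() {
    int raised = 0;
#ifdef _WIN32
    SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
    #pragma omp parallel reduction(+:raised)
    {
        // GetCurrentThread() is a pseudo-handle for the calling thread, so
        // each team member changes only its own priority.
        if (SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST)) {
            raised += 1;
        }
    }
#endif
    return raised;
}
```

GetThreadPriority(GetCurrentThread()) can be called inside a later
parallel region to check what priority each worker actually has.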
I guess one last possibility is that the task switch itself might take
20-2000 microseconds. Is that plausible?
I have a 4-core system without hyperthreading.
Henry Rich