[Openmp-dev] OpenMP threads slow to start on Windows

Henry Rich via Openmp-dev openmp-dev at lists.llvm.org
Wed Oct 7 08:31:54 PDT 2020


Executive summary: Windows 10 system.  I run a parallel for on short 
workloads - say, 2-10 us per thread.  My problem is that the delay before 
the worker threads start is quite often dozens to hundreds of us, even 
when I run the process at high priority.  It's as if the scheduler were 
nonpreemptive, but the docs say it is preemptive.  Can someone explain?  
It's more a Windows question than an OpenMP question.

Detail:

My application operates on pairs of long vectors - say it adds them 
together to produce a vector result. Its rules state that it must 
completely finish with one pair before it can be given another. I would 
like to use multiple threads to speed things up. I am running Windows 10.

I created an OpenMP parallel for construct and divided the vector among 
all the threads of the team. All threads start, all threads run pretty 
fast, so the multithreading is effective.
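
A minimal sketch of the kind of construct described above, for 
illustration only; it is not the actual application code. The function 
name add_vectors and the use of std::vector<double> are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (name and types are assumptions, not from the
// original application): element-wise sum of two equal-length vectors,
// with the index range divided among the threads of the OpenMP team.
std::vector<double> add_vectors(const std::vector<double>& a,
                                const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> out(a.size());
    // A signed loop index keeps the loop acceptable to older OpenMP
    // implementations that require one.
    const long long n = static_cast<long long>(a.size());
    // schedule(static) hands each thread one contiguous chunk.
    // If OpenMP is not enabled at compile time, the pragma is ignored
    // and the loop simply runs serially.
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < n; ++i) {
        out[static_cast<std::size_t>(i)] =
            a[static_cast<std::size_t>(i)] + b[static_cast<std::size_t>(i)];
    }
    return out;
}
```

The implicit barrier at the end of the parallel for is what enforces 
"finish one pair before the next": the call returns only after every 
thread has completed its chunk, so a single late-starting worker delays 
the whole operation.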

But the speedup is slight, and the reason is that some of the time, one 
of the worker threads takes much longer than usual. I have instrumented 
the operation, and I see that the worker threads sometimes take a long 
time to start: the delay is about 20 microseconds on average but can 
reach dozens of milliseconds, depending on system load. The master 
thread does not show this delay.
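
One way such instrumentation could look, as a sketch under assumptions 
rather than the measurement code actually used: the master records a 
timestamp just before the parallel region, and each thread records how 
long after that it began executing. The function name 
measure_start_delays_us is hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical instrumentation: for each thread in the team, returns the
// time in microseconds between the master reaching the parallel region
// and that thread starting to run inside it. Index 0 is the master.
// Slots for threads that did not join the team stay at -1.
std::vector<double> measure_start_delays_us() {
    using Clock = std::chrono::steady_clock;
    int max_threads = 1;
#ifdef _OPENMP
    max_threads = omp_get_max_threads();
#endif
    std::vector<double> delays(static_cast<std::size_t>(max_threads), -1.0);
    const Clock::time_point t0 = Clock::now();
    #pragma omp parallel
    {
        // First thing each thread does: take its own timestamp.
        const Clock::time_point t1 = Clock::now();
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        // Each thread writes only its own slot, so no locking is needed.
        delays[static_cast<std::size_t>(tid)] =
            std::chrono::duration<double, std::micro>(t1 - t0).count();
    }
    return delays;
}
```

Calling this repeatedly and comparing delays[0] with the other entries 
would show whether the late starts are confined to the worker threads.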

That makes me think that the scheduler is taking some time to start the 
worker threads. The master thread is already running, so it doesn't have 
to wait to be started.

But here is the nub of the question: raising the priority of the process 
doesn't make any difference. I can raise it to high priority or even 
realtime priority, and I still see that startup of the worker threads is 
often delayed. It looks like the Windows scheduler is not fully 
preemptive, and sometimes lets a lower-priority thread run when a 
higher-priority one is eligible. Can anyone confirm this?
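
For reference, raising the priority class from inside the process might 
look like the sketch below. It assumes the Win32 calls SetPriorityClass 
and GetCurrentProcess; the wrapper name raise_to_high_priority and the 
kIsWindows constant are hypothetical, and the post does not say whether 
the priority was actually set this way or through another tool.

```cpp
#ifdef _WIN32
#include <windows.h>
constexpr bool kIsWindows = true;
#else
constexpr bool kIsWindows = false;
#endif

// Hypothetical wrapper: asks Windows to move the current process into
// the HIGH priority class. Returns true on success. On other platforms
// it does nothing and returns false.
bool raise_to_high_priority() {
#ifdef _WIN32
    // The priority class is a per-process setting, so it applies to the
    // master thread and the OpenMP worker threads alike.
    return SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS) != 0;
#else
    return false;
#endif
}
```

Passing REALTIME_PRIORITY_CLASS instead would request the realtime 
class mentioned above.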

I have verified that the worker threads are created with the default OS 
priority, namely the base priority of the class of the master process. 
This should be higher than the priority of any running thread, I think. 
Or is it normal for there to be some thread with realtime priority that 
might be blocking my workers? I don't see one with Task Manager.

I guess one last possibility is that the task switch might take 20-2000 
usec. Is that plausible?

I have a 4-core system without hyperthreading.

Henry Rich
