[Openmp-dev] Debug assert trigger in OpenMP + MPI

Joachim Protze via Openmp-dev openmp-dev at lists.llvm.org
Fri May 22 13:11:50 PDT 2020


I looked into this a bit, because I ran into the same problem this week
with a blocked Cholesky factorization code. Thanks for providing this
reproducer!

I think the bookkeeping of the task queues is broken, so that under certain
conditions the head/tail markers are not updated correctly.
In addition to the assertion failures, I regularly see the runtime stall.
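To picture the head/tail bookkeeping in question, here is a minimal sketch of a lock-protected circular task deque. It assumes, as a simplification loosely modelled on the runtime's per-thread deques, that the owning thread pushes and pops at the tail while thieves steal from the head. All names (task_deque, deque_push, and so on) are hypothetical and not libomp code. Each operation updates two of the three fields, so if one of those writes is lost or becomes visible outside the critical section, another thread can see head, tail and ntasks disagree, which is the kind of state that could lead to a wrong task being dequeued.

```c
#include <assert.h>
#include <stdbool.h>

#define DEQUE_SIZE 8u               /* power of two, so masking wraps indices */
#define DEQUE_MASK (DEQUE_SIZE - 1u)

/* Hypothetical per-thread deque; every field is guarded by one lock. */
typedef struct {
    int tasks[DEQUE_SIZE];
    unsigned head;   /* oldest task; thieves take from here */
    unsigned tail;   /* one past the newest task; owner pushes/pops here */
    unsigned ntasks; /* number of queued tasks */
} task_deque;

/* Bookkeeping invariant: walking ntasks slots from head lands on tail. */
static bool deque_consistent(const task_deque *d) {
    return d->ntasks <= DEQUE_SIZE &&
           ((d->head + d->ntasks) & DEQUE_MASK) == d->tail;
}

/* All functions below assume the caller holds the deque's lock. */
static bool deque_push(task_deque *d, int task) {
    if (d->ntasks == DEQUE_SIZE)
        return false;                       /* full */
    d->tasks[d->tail] = task;
    d->tail = (d->tail + 1u) & DEQUE_MASK;
    d->ntasks++;
    assert(deque_consistent(d));            /* debug check of the invariant */
    return true;
}

/* Owner side: take the newest task (LIFO). */
static bool deque_pop_tail(task_deque *d, int *task) {
    if (d->ntasks == 0)
        return false;                       /* empty */
    d->tail = (d->tail - 1u) & DEQUE_MASK;  /* unsigned wrap is well defined */
    *task = d->tasks[d->tail];
    d->ntasks--;
    assert(deque_consistent(d));
    return true;
}

/* Thief side: take the oldest task (FIFO). */
static bool deque_steal_head(task_deque *d, int *task) {
    if (d->ntasks == 0)
        return false;                       /* empty */
    *task = d->tasks[d->head];
    d->head = (d->head + 1u) & DEQUE_MASK;
    d->ntasks--;
    assert(deque_consistent(d));
    return true;
}
```

For example, after pushing 0..4 into an empty deque, a steal returns 0 and a pop returns 4, leaving head == 1, tail == 4 and ntasks == 3, which still satisfies the invariant.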

I'm not convinced that TCW_4 & Co. have any effect in current builds.
Therefore, I suspect that the compiler might move the accesses to the
head/tail counters out of the locked region.

For testing purposes, I added KMP_MB() after acquiring and before
releasing the lock in the tasking code (kmp_tasking.diff). Or should this
rather become part of the lock functions themselves (kmp_lock.diff)?

I have not run any performance tests with these changes, but the latter
solution fixed my stalls as well as the spurious assertion failures.

Best
Joachim


On 21.05.20 at 18:38, Johannes Doerfert via Openmp-dev wrote:
> 
> On 5/21/20 11:12 AM, Raúl Peñacoba Veigas via Openmp-dev wrote:
>> Hello again,
>>
>> I've managed to remove MPI from the equation. It seems to be a race
>> condition in the runtime.
>>
>> int main(int argc, char **argv)
>> {
>>         int TIMESTEPS = 10;
>>         int BLOCKS = 100;
>>
>>         int nranks = 4;
>>
>>         int DATA;
>>
>>         #pragma omp parallel
>>         #pragma omp single
>>         {
>>                 for (int t = 0; t < TIMESTEPS; ++t) {
>>                         for (int r = 0; r < nranks; ++r) {
>>                                 for (int b = 0; b < BLOCKS; ++b) {
>>                                         #pragma omp task depend(in: DATA)
>>                                         { }
>>                                 }
>>                         }
>>
>>                         #pragma omp task depend(inout: DATA)
>>                         {}
>>                 }
>>                 #pragma omp taskwait
>>         }
>> }
>>
>> To run it, execute:
>>
>> clang -fopenmp t1.c -o t1
>>
>> for i in {1..5000}; do echo $i; OMP_NUM_THREADS=3 ./t1; done
>>
> Thanks for the reproducer! We might need to file a bug report for this one,
> but maybe someone will pick it up from here, so let's wait a little while :)
> 
> 
> 
>> Regards,
>> Raúl
>>
>> On 21/5/20 at 9:57, Raúl Peñacoba Veigas wrote:
>>> Hello everyone,
>>>
>>> While writing an OpenMP + MPI code, I triggered a debug assertion in
>>> __kmp_task_start:
>>>
>>> KMP_DEBUG_ASSERT(taskdata->td_flags.tasktype == TASK_EXPLICIT);
>>>
>>> I attach a simpler code that does nothing special, along with some
>>> additional info.
>>>
>>> #include <mpi.h>
>>>
>>> #include <assert.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <string.h>
>>> #include <unistd.h>
>>>
>>> int main(int argc, char **argv)
>>> {
>>>         int TIMESTEPS = 10;
>>>         int BLOCKS = 100;
>>>
>>>         MPI_Init(&argc, &argv);
>>>
>>>         int rank, nranks;
>>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>         MPI_Comm_size(MPI_COMM_WORLD, &nranks);
>>>
>>>         int DATA;
>>>
>>>         #pragma omp parallel
>>>         #pragma omp single
>>>         {
>>>                 for (int t = 0; t < TIMESTEPS; ++t) {
>>>                         for (int r = 0; r < nranks; ++r) {
>>>                                 for (int b = 0; b < BLOCKS; ++b) {
>>>                                         #pragma omp task depend(in: DATA)
>>>                                         { }
>>>                                 }
>>>                         }
>>>
>>>                         #pragma omp task depend(inout: DATA)
>>>                         {}
>>>                 }
>>>                 #pragma omp taskwait
>>>         }
>>>
>>>         MPI_Finalize();
>>>
>>> }
>>>
>>> llvm-project debug build, commit aafdeeade8d
>>> MPICH Version: 3.3a2
>>> MPICH Release date: Sun Nov 13 09:12:11 MST 2016
>>>
>>> $ MPICH_CC=clang mpicc -fopenmp t1.c -o t1
>>> $ for i in {1..100}; do mpiexec.hydra -n 4 ./t1; done
>>>
>>>
>>
>> _______________________________________________
>> Openmp-dev mailing list
>> Openmp-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev

-------------- next part --------------
A non-text attachment was scrubbed...
Name: kmp_tasking.diff
Type: text/x-patch
Size: 2812 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-dev/attachments/20200522/48155906/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kmp_lock.diff
Type: text/x-patch
Size: 862 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-dev/attachments/20200522/48155906/attachment-0001.bin>

