[Openmp-dev] Task-based application aborting in kmp_alloc.cpp

Johannes Doerfert via Openmp-dev openmp-dev at lists.llvm.org
Sun Jan 24 16:29:26 PST 2021


Glad this is fixed. Thanks a lot for reporting and investigating!
Also many thanks to Joachim and Andrey :)

~ Johannes


On 1/24/21 2:14 AM, Joseph Schuchart via Openmp-dev wrote:
> For the record: With some help from Joachim I found the culprit in a 
> missing initialization of buckets in __kmp_dephash_extend. This 
> explains why this issue only occurred with large problem sizes that 
> presumably led to higher numbers of dependencies, which in turn 
> triggered the hash table extension. A fix has been committed in 
> https://reviews.llvm.org/D95167 (thanks Andrey).
>
> Cheers
> Joseph
>
> On 1/5/21 11:45 AM, Joseph Schuchart via Openmp-dev wrote:
>> Small update: I tried to disable the use of fast memory by defining 
>> USE_FAST_MEMORY to 0 in kmp.h right before the test for whether it is 
>> defined. My reasoning is that without fast memory it might be easier 
>> to spot race conditions and/or memory corruption (memory freed by 
>> one thread and accessed by another). Unfortunately, I now get a 
>> segfault with the following stack trace obtained from DDT:
>>
>> #9 main (argc=<optimized out>, argv=<optimized out>) at 
>> src/npb3.3/BT-MZ-CXX-omptasks/bt.cc:210 (at 0x00000000005067e1)
>> #8 __kmpc_fork_call (loc=0x56ba00, argc=15, microtask=0x50c880 
>> <.omp_outlined.(void) const>) at 
>> src/clang/llvm-project/openmp/runtime/src/kmp_csupport.cpp:307 (at 
>> 0x00001555546ae30d)
>> #7 __kmp_fork_call (loc=loc@entry=0x56ba00, gtid=gtid@entry=0, 
>> call_context=call_context@entry=fork_context_intel, 
>> argc=argc@entry=15, microtask=microtask@entry=0x50c880 
>> <.omp_outlined.(void) const>, invoker=<optimized out>, ap=<optimized 
>> out>) at 
>> src/clang/llvm-project/openmp/runtime/src/kmp_runtime.cpp:2240 (at 
>> 0x00001555546c9ca7)
>> #6 __kmp_invoke_task_func (gtid=0) at 
>> src/clang/llvm-project/openmp/runtime/src/kmp_runtime.cpp:7201 (at 
>> 0x00001555546c46d8)
>> #5 __kmp_invoke_microtask () at 
>> src/clang/llvm-project/openmp/runtime/src/z_Linux_asm.S:1166 (at 
>> 0x0000155554746f83)
>> #4 .omp_outlined.(void) const (.global_tid.=<optimized out>, 
>> .bound_tid.=<optimized out>, u=..., qbc_ou=..., qbc_in=..., nx=..., 
>> nxmax=..., ny=..., nz=..., rho_i=..., us=..., vs=..., ws=..., qs=..., 
>> square=..., rhs=..., forcing=...) at 
>> src/npb3.3/BT-MZ-CXX-omptasks/bt.cc:210 (at 0x000000000050c992)
>> #3 .omp_outlined._debug__ (.global_tid.=<optimized out>, 
>> .bound_tid.=<optimized out>, u=..., qbc_ou=..., qbc_in=..., nx=..., 
>> nxmax=..., ny=..., nz=..., rho_i=..., us=..., vs=..., ws=..., qs=..., 
>> square=..., rhs=..., forcing=...) at 
>> src/npb3.3/BT-MZ-CXX-omptasks/bt.cc:214 (at 0x000000000050c992)
>> #2 exch_qbc (u=..., qbc_ou=..., qbc_in=..., nx=..., nxmax=..., 
>> ny=..., nz=..., iprn_msg=<optimized out>) at 
>> src/npb3.3/BT-MZ-CXX-omptasks/exch_qbc.cc:76 (at 0x00000000005441c8)
>> #1 __kmpc_omp_task_with_deps (loc_ref=0x56fa80, gtid=0, 
>> new_task=0x1551b3708760, ndeps=1, dep_list=0x7fffffff8810, 
>> ndeps_noalias=0, noalias_dep_list=0x0) at 
>> src/clang/llvm-project/openmp/runtime/src/kmp_taskdeps.cpp:601 (at 
>> 0x000015555473ed1f)
>> #0 __kmp_init_node (node=0x1551b37065d0) at 
>> src/clang/llvm-project/openmp/runtime/src/kmp_taskdeps.cpp:601 (at 
>> 0x000015555473ed1f)
>>
>> Clang's ASAN reports:
>>
>> SUMMARY: AddressSanitizer: SEGV 
>> src/clang/llvm-project/openmp/runtime/src/kmp_taskdeps.cpp:37:23 in 
>> __kmp_init_node
>>
>> Interestingly, the address does not seem to be NULL (from looking at 
>> it in DDT). It seems that __kmp_thread_malloc is not providing valid 
>> memory? Is setting USE_FAST_MEMORY to 0 in kmp.h the right approach 
>> to disabling fast memory? I'm not at all familiar with this code base 
>> so I'm fishing in muddy waters here...
>>
>> Cheers
>> Joseph
>>
>> On 1/4/21 1:27 PM, Joseph Schuchart via Openmp-dev wrote:
>>> Dear devs,
>>>
>>> Happy new year to all of you!
>>>
>>> I am seeing the following assert triggering in the Clang OpenMP 
>>> runtime when running an application using OpenMP tasks and taskloops:
>>>
>>> Assertion failure at kmp_alloc.cpp(2012): size > 128 * 64.
>>> OMP: Error #13: Assertion failure at kmp_alloc.cpp(2012).
>>>
>>> backtrace from DDT:
>>>
>>> #19 __kmp_launch_worker (thr=0xa24500) at 
>>> src/clang/llvm-project/openmp/runtime/src/z_Linux_util.cpp:590 (at 
>>> 0x0000155554d589ec)
>>> #18 __kmp_launch_thread (this_thr=this_thr@entry=0xa24500) at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_runtime.cpp:5742 (at 
>>> 0x0000155554cf03e2)
>>> #17 __kmp_fork_barrier (gtid=gtid@entry=26, tid=tid@entry=-2) at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_barrier.cpp:1979 (at 
>>> 0x0000155554d25f0c)
>>> #16 __kmp_hyper_barrier_release (bt=bt@entry=bs_forkjoin_barrier, 
>>> this_thr=this_thr@entry=0xa24500, gtid=gtid@entry=26, 
>>> tid=tid@entry=-2, propagate_icvs=propagate_icvs@entry=1, 
>>> itt_sync_obj=itt_sync_obj@entry=0x0) at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_barrier.cpp:672 (at 
>>> 0x0000155554d1c15e)
>>> #15 kmp_flag_64<false, true>::wait (itt_sync_obj=0x0, final_spin=1, 
>>> this_thr=0xa24500, this=0x15514afdba10) at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_wait_release.h:899 (at 
>>> 0x0000155554d1c15e)
>>> #14 __kmp_wait_template<kmp_flag_64<>, true, false, true>(kmp_info_t 
>>> *, kmp_flag_64<false, true> *, void *) 
>>> (this_thr=this_thr@entry=0xa24500, flag=flag@entry=0x15514afdba10, 
>>> itt_sync_obj=itt_sync_obj@entry=0x0) at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_wait_release.h:345 (at 
>>> 0x0000155554d1bb29)
>>> #13 kmp_flag_64<false, true>::execute_tasks (is_constrained=0, 
>>> itt_sync_obj=0x0, thread_finished=0x15514afdb94c, final_spin=1, 
>>> gtid=26, this_thr=0xa24500, this=0x15514afdba10) at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_wait_release.h:892 (at 
>>> 0x0000155554d1bb29)
>>> #12 __kmp_execute_tasks_64<false, true> 
>>> (thread=thread@entry=0xa24500, gtid=gtid@entry=26, 
>>> flag=flag@entry=0x15514afdba10, final_spin=final_spin@entry=1, 
>>> thread_finished=thread_finished@entry=0x15514afdb94c, 
>>> itt_sync_obj=itt_sync_obj@entry=0x0, is_constrained=0) at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_tasking.cpp:3029 (at 
>>> 0x0000155554d176a9)
>>> #11 __kmp_execute_tasks_template<kmp_flag_64<> > (is_constrained=0, 
>>> itt_sync_obj=0x0, thread_finished=<optimized out>, 
>>> final_spin=<optimized out>, flag=0x15514afdba10, gtid=26, 
>>> thread=thread@entry=0xa24500) at 
>>> /sw/hawk-rh8/hlrs/non-spack/compiler/gcc/10.2.0/include/c++/10.2.0/bits/atomic_base.h:420 
>>> (at 0x0000155554d176a9)
>>> #10 __kmp_invoke_task (gtid=gtid@entry=26, task=0xada040, 
>>> current_task=current_task@entry=0x9b2280) at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_tasking.cpp:1589 (at 
>>> 0x0000155554d10da7)
>>> #9 __kmp_task_finish<false> (gtid=gtid@entry=26, 
>>> task=task@entry=0xada040, resumed_task=resumed_task@entry=0x9b2280) 
>>> at 
>>> /sw/hawk-rh8/hlrs/non-spack/compiler/gcc/10.2.0/include/c++/10.2.0/bits/atomic_base.h:556 
>>> (at 0x0000155554d108e7)
>>> #8 __kmp_release_deps (gtid=gtid@entry=26, task=task@entry=0xad9f00) 
>>> at src/clang/llvm-project/openmp/runtime/src/kmp_taskdeps.h:106 (at 
>>> 0x0000155554d0fa53)
>>> #7 __kmp_dephash_free (h=<optimized out>, thread=0xa24500) at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_taskdeps.h:80 (at 
>>> 0x0000155554d0fa53)
>>> #6 __kmp_dephash_free_entries (h=<optimized out>, thread=0xa24500) 
>>> at src/clang/llvm-project/openmp/runtime/src/kmp_taskdeps.h:62 (at 
>>> 0x0000155554d0fa53)
>>> #5 __kmp_depnode_list_free (list=0x15513affd580, thread=0xa24500) at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_taskdeps.h:47 (at 
>>> 0x0000155554d0fa53)
>>> #4 ___kmp_fast_free (this_thr=this_thr@entry=0xa24500, 
>>> ptr=ptr@entry=0x15513affd580, _file_=_file_@entry=0x155554d8b710 
>>> "src/clang/llvm-project/openmp/runtime/src/kmp_taskdeps.h", 
>>> _line_=_line_@entry=47) at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_alloc.cpp:2012 (at 
>>> 0x0000155554cc9098)
>>> #3 __kmp_debug_assert (msg=msg@entry=0x155554d7e46c "size > 128 * 
>>> 64", file=0x155554d7d087 "kmp_alloc.cpp", file@entry=0x155554d7d038 
>>> "src/clang/llvm-project/openmp/runtime/src/kmp_alloc.cpp", 
>>> line=line@entry=2012) at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_debug.cpp:74 (at 
>>> 0x0000155554ce2b55)
>>> #2 __kmp_fatal (message=...) at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_i18n.cpp:868 (at 
>>> 0x0000155554ce77e0)
>>> #1 __kmp_abort_process () at 
>>> src/clang/llvm-project/openmp/runtime/src/kmp_runtime.cpp:444 (at 
>>> 0x0000155554cf1abc)
>>> #0 abort () from /lib64/libc.so.6 (at 0x00001555546e2bce)
>>>
>>>
>>> Any idea what might cause this? I am having a hard time reproducing 
>>> this in a small example so I'd appreciate if someone could hint at 
>>> what might trigger this assertion. This error is reproducible in my 
>>> C++ port of the NPB BT-MZ benchmark using OpenMP tasks (both tied 
>>> and untied, no detached tasks or offloading), starting from class E 
>>> (class D runs fine but is too small for my experiments). I have 
>>> removed all stack-based array allocations to rule out stack 
>>> overflows inside tasks.
>>>
>>> This is llvmorg-12-init-15664-gba82c0b315 (pulled last night) 
>>> compiled with GCC 10.2.0 on an AMD EPYC system running 32 threads 
>>> per process (16 processes in total using MPI).
>>>
>>> I can compile and run the application with no issues with GCC 
>>> 10.2.0. I can reproduce the assert if imposing Clang's libgomp after 
>>> compiling the application with GCC. I used GCC's ASAN both with 
>>> GCC's and Clang's libgomp on the application and didn't see any 
>>> errors. Any advice on what else I could try?
>>>
>>> Many thanks in advance,
>>> Joseph
>> _______________________________________________
>> Openmp-dev mailing list
>> Openmp-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
