[Openmp-dev] Enabling OMPT by default

Jonas Hahnfeld via Openmp-dev openmp-dev at lists.llvm.org
Fri Dec 22 05:20:36 PST 2017


I'd say this looks more like the runtime can't start threads because the 
system is heavily loaded (3000 x 4 threads)...

On 2017-12-22 12:11, Paul Osmialowski wrote:
> On 21/12/2017 14:17, Joachim Protze wrote:
>> Hi Paul,
>> 
>> Typically oversubscription makes failures more likely, so I test 
>> with something like this (assuming you are in a BUILD directory inside 
>> the top-level openmp directory):
>> 
>> for i in $(seq 20)
>> do
>>    (env LD_PRELOAD=runtime/src/libomp.so \
>>     runtime/test/ompt/tasks/Output/explicit_task.c.tmp |
>>     sort -n --stable | tee explicit_task.c.log.$i |
>>     FileCheck ../runtime/test/ompt/tasks/explicit_task.c ||
>>     echo "$i failed")&
>> done
>> 
>> This starts 20 parallel executions, writes a copy of each output to 
>> explicit_task.c.log.$i and prints the number of each failed run. 
>> After some iterations this should give you the output of a failed 
>> test case.
> 
> I've put it into runtest.sh and started it. With the original loop the
> failure didn't occur at all, but I finally managed to catch something
> when I increased the seq count to 3000 (this time it's line 56;
> previously it was line 94 and later line 76):
> 
> $ ./runtest.sh
> ./runtest.sh: fork: retry: No child processes
> ./runtest.sh: fork: retry: No child processes
> ./runtest.sh: fork: retry: No child processes
> ./runtest.sh: fork: retry: Resource temporarily unavailable
> ./runtest.sh: fork: retry: Resource temporarily unavailable
> ./runtest.sh: fork: retry: No child processes
> ./runtest.sh: fork: retry: Resource temporarily unavailable
> ./runtest.sh: fork: retry: No child processes
> ./runtest.sh: fork: retry: No child processes
> ./runtest.sh: fork: retry: No child processes
> ./runtest.sh: fork: retry: Resource temporarily unavailable
> OMP: Error #34: System unable to allocate necessary resources for OMP thread:
> OMP: System error #11: Resource temporarily unavailable
> OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.
> ../runtime/test/ompt/tasks/explicit_task.c:56:12: error: expected string not found in input
>  // CHECK: {{^}}[[MASTER_ID]]: ompt_event_implicit_task_begin: parallel_id=[[PARALLEL_ID]], task_id=[[IMPLICIT_TASK_ID:[0-9]+]]
>            ^
> <stdin>:6:1: note: scanning from here
> 
> ^
> <stdin>:6:1: note: with variable "MASTER_ID" equal to "281474976710657"
> 
> ^
> <stdin>:6:1: note: with variable "PARALLEL_ID" equal to "281474976710660"
> 
> ^
> 1695 failed
> 
> $ cat explicit_task.c.log.1695
> 0: NULL_POINTER=(nil)
> 281474976710657: ompt_event_thread_begin: thread_type=ompt_thread_initial=1, thread_id=281474976710657
> 281474976710657: ompt_event_task_create: parent_task_id=0, parent_task_frame.exit=(nil), parent_task_frame.reenter=(nil), new_task_id=281474976710658, codeptr_ra=(nil), task_type=ompt_task_initial=1, has_dependences=no
> 281474976710657: __builtin_frame_address(0)=0xfffff2490930
> 281474976710657: ompt_event_parallel_begin: parent_task_id=281474976710658, parent_task_frame.exit=(nil), parent_task_frame.reenter=0xfffff2490930, parallel_id=281474976710660, requested_team_size=2, codeptr_ra=0x402fac, invoker=2
> $
> 
> 
> 
>> I typically LD_PRELOAD the libomp from the BUILD directory to make 
>> sure that I use the right library.
>> 
> 
> Yes, I don't install libomp.so system-wide, so those test cases won't
> even run without setting LD_PRELOAD or LD_LIBRARY_PATH.
> 
>> Best
>> Joachim
>> 
>> On 12/21/2017 01:14 PM, Paul Osmialowski wrote:
>>> 
>>> 
>>> On 21/12/2017 10:51, Jonas Hahnfeld wrote:
>>>> On 2017-12-21 11:22, Paul Osmialowski wrote:
>>>>> Replies inline below:
>>>>> 
>>>>> On 20/12/2017 18:34, Jonas Hahnfeld wrote:
>>>>>> On 2017-12-20 14:22, Paul Osmialowski wrote:
>>>>>>> Yeah, you're right again, with the following change:
>>>>>>> 
>>>>>>> +#define print_possible_return_addresses(addr) \
>>>>>>> +  printf("%" PRIu64 ": current_address=%p or %p\n", ompt_get_thread_data()->value, \
>>>>>>> +         ((char *)addr) - 4, ((char *)addr) - 8)
>>>>>> 
>>>>>> Cool, can you put up a patch for this?
>>>>>> 
>>>>> 
>>>>> Done, https://reviews.llvm.org/D41482
>>>>> 
>>>>> 
>>>>>>> ...I can see only ompt/tasks/explicit_task.c failing from time to
>>>>>>> time, but it seems to be unrelated to the printed-address issue:
>>>>>>> 
>>>>>>> runtime/test/ompt/tasks/explicit_task.c:94:12: error: expected string not found in input
>>>>>>>  // CHECK: {{^}}[[THREAD_ID]]: ompt_event_barrier_end: parallel_id={{[0-9]+}}, task_id=[[IMPLICIT_TASK_ID]]
>>>>>>>            ^
>>>>>>> <stdin>:53:1: note: scanning from here
>>>>>> 
>>>>>> Can you get the full output when the checks fail? (I usually run
>>>>>> the test directly, save the output to a temporary file and pass it
>>>>>> to FileCheck, so that I have the output at hand if the check fails.)
>>>>>> 
>>>>> 
>>>>> Now when I run it in isolation, it points to a different line, but
>>>>> the issue seems to be the same:
>>>>> 
>>>>> $ cat explicit_task.c.tmp.out | $HOME/llvm/build-shared-release/bin/FileCheck $HOME/openmp/runtime/test/ompt/tasks/explicit_task.c
>>>>> $HOME/openmp/runtime/test/ompt/tasks/explicit_task.c:76:12: error: expected string not found in input
>>>>>  // CHECK: {{^}}[[THREAD_ID:[0-9]+]]: ompt_event_implicit_task_begin: parallel_id=[[PARALLEL_ID]], task_id=[[IMPLICIT_TASK_ID:[0-9]+]]
>>>>>            ^
>>>>> <stdin>:50:86: note: scanning from here
>>>>> 281474976710657: ompt_event_implicit_task_end: parallel_id=0, task_id=281474976710661, team_size=2, thread_num=0
>>>>> 
>>>>>              ^
>>>>> <stdin>:50:86: note: with variable "PARALLEL_ID" equal to "281474976710660"
>>>>> 281474976710657: ompt_event_implicit_task_end: parallel_id=0, task_id=281474976710661, team_size=2, thread_num=0
>>>>> 
>>>>>              ^
>>>>> <stdin>:55:5: note: possible intended match here
>>>>> 562949953421313: ompt_event_implicit_task_end: parallel_id=0, task_id=562949953421314, team_size=0, thread_num=1
>>>>>     ^
>>>>> 
>>>>> ...And the full output is:
>>>>> 
>>>>> [...]
>>>> 
>>>> The test sorts the output by thread: sort --numeric-sort --stable. 
>>>> So unfortunately this output doesn't show the original error that 
>>>> you have been seeing :-(
>>> 
>>> Unfortunately it occurs very rarely...
>> 

