[Openmp-dev] Enabling OMPT by default

Paul Osmialowski via Openmp-dev openmp-dev at lists.llvm.org
Fri Dec 22 03:11:50 PST 2017



On 21/12/2017 14:17, Joachim Protze wrote:
> Hi Paul,
> 
> Typically oversubscription makes failures more probable, so I test with
> something like this (assuming you are in a BUILD directory inside the
> top-level openmp directory):
> 
> for i in $(seq 20)
> do
>    (env LD_PRELOAD=runtime/src/libomp.so \
>     runtime/test/ompt/tasks/Output/explicit_task.c.tmp |
>     sort -n --stable | tee explicit_task.c.log.$i |
>     FileCheck ../runtime/test/ompt/tasks/explicit_task.c ||
>     echo "$i failed")&
> done
> 
> This starts 20 parallel executions, writes a copy of the (sorted) output to
> explicit_task.c.log.$i and prints the number of each failed run. After
> some iterations, this should give you the output of a failing test case.

I've put it into runtest.sh and started it. With smaller counts it didn't
fail at all, but I finally managed to catch something when I increased the
seq count to 3000 (this time it's line 56; initially it was line 94, and
later 76):

$ ./runtest.sh
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: Resource temporarily unavailable
./runtest.sh: fork: retry: Resource temporarily unavailable
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: Resource temporarily unavailable
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: Resource temporarily unavailable
OMP: Error #34: System unable to allocate necessary resources for OMP thread:
OMP: System error #11: Resource temporarily unavailable
OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.
../runtime/test/ompt/tasks/explicit_task.c:56:12: error: expected string not found in input
  // CHECK: {{^}}[[MASTER_ID]]: ompt_event_implicit_task_begin: parallel_id=[[PARALLEL_ID]], task_id=[[IMPLICIT_TASK_ID:[0-9]+]]
            ^
<stdin>:6:1: note: scanning from here

^
<stdin>:6:1: note: with variable "MASTER_ID" equal to "281474976710657"

^
<stdin>:6:1: note: with variable "PARALLEL_ID" equal to "281474976710660"

^
1695 failed

$ cat explicit_task.c.log.1695
0: NULL_POINTER=(nil)
281474976710657: ompt_event_thread_begin: thread_type=ompt_thread_initial=1, thread_id=281474976710657
281474976710657: ompt_event_task_create: parent_task_id=0, parent_task_frame.exit=(nil), parent_task_frame.reenter=(nil), new_task_id=281474976710658, codeptr_ra=(nil), task_type=ompt_task_initial=1, has_dependences=no
281474976710657: __builtin_frame_address(0)=0xfffff2490930
281474976710657: ompt_event_parallel_begin: parent_task_id=281474976710658, parent_task_frame.exit=(nil), parent_task_frame.reenter=0xfffff2490930, parallel_id=281474976710660, requested_team_size=2, codeptr_ra=0x402fac, invoker=2
$



> I typically LD_PRELOAD the libomp from the BUILD directory to make sure
> that I use the right library.
> 

Yes. I don't install libomp.so system-wide, so these test cases won't even
run without LD_PRELOAD or LD_LIBRARY_PATH set.

> Best
> Joachim
> 
> On 12/21/2017 01:14 PM, Paul Osmialowski wrote:
>>
>>
>> On 21/12/2017 10:51, Jonas Hahnfeld wrote:
>>> On 2017-12-21 11:22, Paul Osmialowski wrote:
>>>> replies inlined below:
>>>>
>>>> On 20/12/2017 18:34, Jonas Hahnfeld wrote:
>>>>> On 2017-12-20 14:22, Paul Osmialowski wrote:
>>>>>> Yeah, you're right again, with the following change:
>>>>>>
>>>>>> +#define print_possible_return_addresses(addr) \
>>>>>> +  printf("%" PRIu64 ": current_address=%p or %p\n", ompt_get_thread_data()->value, \
>>>>>> +         ((char *)addr) - 4, ((char *)addr) - 8)
>>>>>
>>>>> Cool, can you put up a patch for this?
>>>>>
>>>>
>>>> Done, https://reviews.llvm.org/D41482
>>>>
>>>>
>>>>>> ...I can see only ompt/tasks/explicit_task.c failing from time to
>>>>>> time, but it seems to be unrelated to the printed-address issue:
>>>>>>
>>>>>> runtime/test/ompt/tasks/explicit_task.c:94:12: error: expected string not found in input
>>>>>>  // CHECK: {{^}}[[THREAD_ID]]: ompt_event_barrier_end: parallel_id={{[0-9]+}}, task_id=[[IMPLICIT_TASK_ID]]
>>>>>>            ^
>>>>>> <stdin>:53:1: note: scanning from here
>>>>>
>>>>> Do you have a chance to capture the full output when the checks fail?
>>>>> (I usually run the test directly, save the output to a temporary file,
>>>>> and pass it to FileCheck, so the output is at hand if the check fails.)
>>>>>
>>>>
>>>> Now, when I run it in isolation, it points to a different line, but the
>>>> issue seems to be the same:
>>>>
>>>> $ cat explicit_task.c.tmp.out
>>>> |$HOME/llvm/build-shared-release/bin/FileCheck $HOME/openmp/runtime/test/ompt/tasks/explicit_task.c
>>>> $HOME/openmp/runtime/test/ompt/tasks/explicit_task.c:76:12: error: expected string not found in input
>>>>  // CHECK: {{^}}[[THREAD_ID:[0-9]+]]: ompt_event_implicit_task_begin: parallel_id=[[PARALLEL_ID]], task_id=[[IMPLICIT_TASK_ID:[0-9]+]]
>>>>            ^
>>>> <stdin>:50:86: note: scanning from here
>>>> 281474976710657: ompt_event_implicit_task_end: parallel_id=0, task_id=281474976710661, team_size=2, thread_num=0
>>>>
>>>>              ^
>>>> <stdin>:50:86: note: with variable "PARALLEL_ID" equal to "281474976710660"
>>>> 281474976710657: ompt_event_implicit_task_end: parallel_id=0, task_id=281474976710661, team_size=2, thread_num=0
>>>>
>>>>              ^
>>>> <stdin>:55:5: note: possible intended match here
>>>> 562949953421313: ompt_event_implicit_task_end: parallel_id=0, task_id=562949953421314, team_size=0, thread_num=1
>>>>     ^
>>>>
>>>> ...And the full output is:
>>>>
>>>> [...]
>>>
>>> The test sorts the output by thread: sort --numeric-sort --stable. So 
>>> unfortunately this output doesn't show the original error that you 
>>> have been seeing :-(
>>
>> Unfortunately it occurs very rarely...
> 

