[Openmp-dev] Enabling OMPT by default
Paul Osmialowski via Openmp-dev
openmp-dev at lists.llvm.org
Fri Dec 22 03:11:50 PST 2017
On 21/12/2017 14:17, Joachim Protze wrote:
> Hi Paul,
>
> Typically oversubscription makes failures more probable, so I test with
> something like (assuming to be in a BUILD directory in the top openmp
> directory):
>
> for i in $(seq 20)
> do
> (env LD_PRELOAD=runtime/src/libomp.so \
> runtime/test/ompt/tasks/Output/explicit_task.c.tmp |
> sort -n --stable | tee explicit_task.c.log.$i |
> FileCheck ../runtime/test/ompt/tasks/explicit_task.c ||
> echo "$i failed")&
> done
>
> This starts 20 parallel executions, writes a copy of the output to
> explicit_task.c.log.$i and prints you the number of the failed test.
> This should give you the output for a failed test case after some
> iterations.
I've put it into runtest.sh and started it. Normally the failure doesn't
occur at all, but I finally managed to catch something after increasing seq
to 3000 (this time FileCheck points at line 56; previously it was line 94
at first and later line 76):
$ ./runtest.sh
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: Resource temporarily unavailable
./runtest.sh: fork: retry: Resource temporarily unavailable
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: Resource temporarily unavailable
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: No child processes
./runtest.sh: fork: retry: Resource temporarily unavailable
OMP: Error #34: System unable to allocate necessary resources for OMP
thread:
OMP: System error #11: Resource temporarily unavailable
OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.
../runtime/test/ompt/tasks/explicit_task.c:56:12: error: expected string
not found in input
// CHECK: {{^}}[[MASTER_ID]]: ompt_event_implicit_task_begin:
parallel_id=[[PARALLEL_ID]], task_id=[[IMPLICIT_TASK_ID:[0-9]+]]
^
<stdin>:6:1: note: scanning from here
^
<stdin>:6:1: note: with variable "MASTER_ID" equal to "281474976710657"
^
<stdin>:6:1: note: with variable "PARALLEL_ID" equal to "281474976710660"
^
1695 failed
$ cat explicit_task.c.log.1695
0: NULL_POINTER=(nil)
281474976710657: ompt_event_thread_begin:
thread_type=ompt_thread_initial=1, thread_id=281474976710657
281474976710657: ompt_event_task_create: parent_task_id=0,
parent_task_frame.exit=(nil), parent_task_frame.reenter=(nil),
new_task_id=281474976710658, codeptr_ra=(nil),
task_type=ompt_task_initial=1, has_dependences=no
281474976710657: __builtin_frame_address(0)=0xfffff2490930
281474976710657: ompt_event_parallel_begin:
parent_task_id=281474976710658, parent_task_frame.exit=(nil),
parent_task_frame.reenter=0xfffff2490930, parallel_id=281474976710660,
requested_team_size=2, codeptr_ra=0x402fac, invoker=2
$
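The fork: retry messages and OMP Error #34 above suggest that starting all
3000 runs at once exhausts the process limit, so a failure caught this way may
come from resource exhaustion rather than from the rare problem itself. A
minimal sketch of a batched variant of the loop follows. The names run_batched
and RUN_INDEX are hypothetical (they are not part of the test suite), and the
batch size of 20 in the usage comment simply reuses the count from the
original loop.

```shell
#!/bin/sh
# Hypothetical helper: run a command `total` times, at most `batch` at once.
# The run number is passed to the command in the RUN_INDEX environment
# variable; "<n> failed" is printed for every run that exits non-zero.
# Note: plain sh has no local variables, so total/batch/i/j are globals.
run_batched() {
    total=$1
    batch=$2
    shift 2
    i=1
    while [ "$i" -le "$total" ]; do
        j=0
        while [ "$j" -lt "$batch" ] && [ "$i" -le "$total" ]; do
            # The subshell gets its own copy of $i at fork time.
            ( env RUN_INDEX="$i" "$@" || echo "$i failed" ) &
            i=$((i + 1))
            j=$((j + 1))
        done
        wait    # let the whole batch finish before forking more processes
    done
}

# Possible usage from the BUILD directory (same pipeline as the loop above):
#
# run_batched 3000 20 sh -c '
#     env LD_PRELOAD=runtime/src/libomp.so \
#         runtime/test/ompt/tasks/Output/explicit_task.c.tmp |
#     sort -n --stable | tee explicit_task.c.log.$RUN_INDEX |
#     FileCheck ../runtime/test/ompt/tasks/explicit_task.c'
```

Since oversubscription is what makes the failure more likely, a smaller batch
may also make it harder to reproduce; the batch size is a trade-off, not a
recommendation.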
> I typically LD_PRELOAD the libomp from the BUILD directory to make sure
> that I use the right library.
>
Yes, I don't install libomp.so system-wide, so those test cases won't
even run without setting LD_PRELOAD or LD_LIBRARY_PATH.
> Best
> Joachim
>
> On 12/21/2017 01:14 PM, Paul Osmialowski wrote:
>>
>>
>> On 21/12/2017 10:51, Jonas Hahnfeld wrote:
>>> Am 2017-12-21 11:22, schrieb Paul Osmialowski:
>>>> replies inlined below:
>>>>
>>>> On 20/12/2017 18:34, Jonas Hahnfeld wrote:
>>>>> Am 2017-12-20 14:22, schrieb Paul Osmialowski:
>>>>>> Yeah, you're right again, with the following change:
>>>>>>
>>>>>> +#define print_possible_return_addresses(addr) \
>>>>>> + printf("%" PRIu64 ": current_address=%p or %p\n",
>>>>>> ompt_get_thread_data()->value, \
>>>>>> + ((char *)addr) - 4, ((char *)addr) - 8)
>>>>>
>>>>> Cool, can you put up a patch for this?
>>>>>
>>>>
>>>> Done, https://reviews.llvm.org/D41482
>>>>
>>>>
>>>>>> ...I can see only ompt/tasks/explicit_task.c failing from time to
>>>>>> time, but it seems to be unrelated to printed address issue:
>>>>>>
>>>>>> runtime/test/ompt/tasks/explicit_task.c:94:12: error: expected string
>>>>>> not found in input
>>>>>> // CHECK: {{^}}[[THREAD_ID]]: ompt_event_barrier_end:
>>>>>> parallel_id={{[0-9]+}}, task_id=[[IMPLICIT_TASK_ID]]
>>>>>> ^
>>>>>> <stdin>:53:1: note: scanning from here
>>>>>
>>>>> Do you have the chance to get the full output when the checks fail?
>>>>> (I usually run the test directly, save the output temporarily and
>>>>> pass it to FileCheck to have the output at hand if that fails.)
>>>>>
>>>>
>>>> Now when I run it in isolation, it points to a different line, but the
>>>> issue seems the same:
>>>>
>>>> $ cat explicit_task.c.tmp.out
>>>> |$HOME/llvm/build-shared-release/bin/FileCheck
>>>> $HOME/openmp/runtime/test/ompt/tasks/explicit_task.c
>>>> $HOME/openmp/runtime/test/ompt/tasks/explicit_task.c:76:12: error:
>>>> expected string not found in input
>>>> // CHECK: {{^}}[[THREAD_ID:[0-9]+]]: ompt_event_implicit_task_begin:
>>>> parallel_id=[[PARALLEL_ID]], task_id=[[IMPLICIT_TASK_ID:[0-9]+]]
>>>> ^
>>>> <stdin>:50:86: note: scanning from here
>>>> 281474976710657: ompt_event_implicit_task_end: parallel_id=0,
>>>> task_id=281474976710661, team_size=2, thread_num=0
>>>>
>>>> ^
>>>> <stdin>:50:86: note: with variable "PARALLEL_ID" equal to
>>>> "281474976710660"
>>>> 281474976710657: ompt_event_implicit_task_end: parallel_id=0,
>>>> task_id=281474976710661, team_size=2, thread_num=0
>>>>
>>>> ^
>>>> <stdin>:55:5: note: possible intended match here
>>>> 562949953421313: ompt_event_implicit_task_end: parallel_id=0,
>>>> task_id=562949953421314, team_size=0, thread_num=1
>>>> ^
>>>>
>>>> ...And the full output is:
>>>>
>>>> [...]
>>>
>>> The test sorts the output by thread: sort --numeric-sort --stable. So
>>> unfortunately this output doesn't show the original error that you
>>> have been seeing :-(
>>
>> Unfortunately it occurs very rarely...
>
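As noted in the quoted reply, the sorted output groups lines per thread and
hides the original interleaving. One way to keep both views is to store the
raw output before sorting. A minimal sketch, assuming a hypothetical helper
capture_raw_and_sorted and hypothetical .raw/.sorted file suffixes (neither
exists in the test suite):

```shell
#!/bin/sh
# Hypothetical helper: run a command once, keep its stdout unmodified in
# <prefix>.raw and write the per-thread view used by the test to
# <prefix>.sorted. Returns the exit status of the command itself.
capture_raw_and_sorted() {
    prefix=$1
    shift
    "$@" > "$prefix.raw"
    status=$?
    # Same sort as the test: numeric on the leading thread id, stable so
    # that lines of one thread keep their relative order.
    sort -n --stable "$prefix.raw" > "$prefix.sorted"
    return "$status"
}

# Possible usage from the BUILD directory:
#
# capture_raw_and_sorted explicit_task.c.log.1 \
#     env LD_PRELOAD=runtime/src/libomp.so \
#     runtime/test/ompt/tasks/Output/explicit_task.c.tmp
# FileCheck ../runtime/test/ompt/tasks/explicit_task.c \
#     < explicit_task.c.log.1.sorted
```

The .raw file only reflects the order in which lines reached stdout, so it
is an approximation of the real event order, but it is closer to it than the
sorted log.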