[Openmp-dev] Performance slowdown

César via Openmp-dev openmp-dev at lists.llvm.org
Tue Aug 18 15:25:47 PDT 2015


Hi Andrey,

This is strange, because when I compile with "clang-3.5 -fopenmp" the
executable that is produced is parallel. I am sure of this because I can
see the threads at run time and also because the binary references symbols
from the IOMP runtime.

$ clang -O3 -g -fopenmp toy13.cpp -o toy13 -lm

$ nm toy13 | grep kmpc
U __kmpc_cancel_barrier@@VERSION
U __kmpc_end_single@@VERSION
U __kmpc_fork_call@@VERSION
U __kmpc_omp_task_alloc@@VERSION
U __kmpc_omp_task_with_deps@@VERSION
U __kmpc_single@@VERSION

$ ldd toy13
linux-vdso.so.1 =>  (0x00007fff9805d000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc00e3cc000)
libiomp5.so => /usr/lib/libiomp5.so (0x00007fc00e121000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc00df03000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc00db3e000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc00d939000)
/lib64/ld-linux-x86-64.so.2 (0x00007fc00e6fc000)





César.

On Tue, Aug 18, 2015 at 6:15 PM, <andreybokhanko at gmail.com> wrote:

> César,
>
> > - The version of Clang that I am using to compile the benchmarks: 3.5.0
>
> Clang 3.5 doesn't support OpenMP -- it simply ignores the pragmas.
>
> Please use a version from trunk or from the 3_7 release branch. Also,
> please supply the -fopenmp=libomp option.
>
> Yours,
> Andrey Bokhanko
> =============
> Software Engineer
> Intel Compiler Team
> Intel
>
> Sent from iPad
>
> On 18 Aug 2015, at 21:14, César via Openmp-dev <openmp-dev at lists.llvm.org>
> wrote:
>
> Hello,
>
> I don't know if this is the correct list for this question; I could not
> find a better place.
>
> I am doing performance experiments with a few OpenMP implementations
> (IOMP, GOMP and our private impl.) and I am seeing a severe slowdown when I
> use IOMP (GOMP and others are performing well).
>
> The benchmarks I am using are these ones:
> http://kastors.gforge.inria.fr/#!index.md
>
> Really, the slowdown is huge. For one of the programs
> (plasma/dpotrf_taskdep -n 8192 -b 64 -i 1 -c) the serial version executes
> in ~28s and the parallel one executes in ~110s. I did some profiling and
> found that most of the time is being spent on synchronization barriers and
> dependence tracking (see the attached images). Before digging deeper, I
> would like to hear from you whether I am doing something wrong in my setup:
>
> - I tested with the latest version of the repository:
> http://llvm.org/svn/llvm-project/openmp/trunk
> - I am using Ubuntu 14.10.
> - I have tested on more than one machine; the results above are from an
> Intel i7-3770.
> - The runtime itself is compiled using: make compiler=gcc os_omp=linux
> arch=32e
> - The version of GCC that I am using is: 4.9.1
> - The version of Clang that I am using to compile the benchmarks: 3.5.0
>
>
> César.
>
> <pic1.png>
>
> <pic2.png>
>
> <pic3.png>
>