[Openmp-dev] Performance slowdown

Finkel, Hal J. via Openmp-dev openmp-dev at lists.llvm.org
Wed Aug 19 13:39:57 PDT 2015


Sent from my Verizon Wireless 4G LTE DROID
On Aug 19, 2015 1:36 PM, César via Openmp-dev <openmp-dev at lists.llvm.org> wrote:
>
> On Wed, Aug 19, 2015 at 3:07 PM, Jack Howarth <howarth.mailing.lists at gmail.com> wrote:
>>
>> On Tue, Aug 18, 2015 at 2:14 PM, César via Openmp-dev
>> <openmp-dev at lists.llvm.org> wrote:
>> > Hello,
>> >
>> > I don't know if this is the correct list to talk about this - I did not find
>> > a better place.
>> >
>> > I am doing performance experiments with a few OpenMP implementations (IOMP,
>> > GOMP and our private impl.) and I am seeing a severe slowdown when I use
>> > IOMP (GOMP and others are performing well).
>> >
>> > The benchmarks I am using are these ones:
>> > http://kastors.gforge.inria.fr/#!index.md
>>
>> That web page claims the benchmarks use parts of the OpenMP 4.0 specification.
>>
>> "The KaStORS benchmark suite has been designed to evaluate the implementation of
>> the OpenMP dependent task paradigm, introduced as part of the OpenMP 4.0
>> specification."
>>
>> Currently openmp is only complete for the OpenMP 3.1 specification.
>>
>
> I am able to compile a few benchmarks that use task dependence annotations (from OMP 4.0), but for those that specify the range of the memory dependence I get a syntax error. So I should assume that this part is not implemented, right? Is there a list of the OMP 4.0 items that are currently supported?
>
> BTW, the Clang version from GitHub was able to parse these annotations. Was that support dropped from the current, newer version?
>

It is not there yet. You'll need to use the code from the GitHub clang_trunk (and llvm_trunk, etc.) repositories to get both recent Clang/LLVM and all of the OpenMP features.

-Hal

>
>>
>> >
>> > Really, the slowdown is huge. For one of the programs (plasma/dpotrf_taskdep
>> > -n 8192 -b 64 -i 1 -c) the serial version executes in ~28s and the parallel
>> > one executes in ~110s. I did some profiling and found that most of the time
>> > is being spent on synchronization barriers and dependence tracking (see
>> > attached image). Before digging deeper I would like to hear from you whether
>> > I am doing something wrong here:
>> >
>> > - I tested with the latest version of the repository:
>> > http://llvm.org/svn/llvm-project/openmp/trunk
>> > - I am using Ubuntu 14.10.
>> > - I have tested on more than one machine; the results above are from an Intel
>> > i7-3770
>> > - The runtime itself is compiled using: make compiler=gcc omp_os=linux
>> > arch=32e
>> > - The version of GCC that I am using is: 4.9.1
>> > - The version of Clang that I am using to compile the benchmarks: 3.5.0
>> >
>> >
>> > César.
>> >
>> > _______________________________________________
>> > Openmp-dev mailing list
>> > Openmp-dev at lists.llvm.org
>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>> >
>
>

