[llvm-bugs] [Bug 26454] New: [3.8.0] omp parallel for simd unexpected behaviour at different optimization levels

Wed Feb 3 08:01:17 PST 2016

https://llvm.org/bugs/show_bug.cgi?id=26454

            Bug ID: 26454
           Summary: [3.8.0] omp parallel for simd unexpected behaviour at
                    different optimization levels
           Product: OpenMP
           Version: unspecified
          Hardware: PC
                OS: FreeBSD
            Status: NEW
          Severity: normal
          Priority: P
         Component: Clang Compiler Support
          Assignee: unassignedclangbugs at nondot.org
          Reporter: bugzilla at hannes.hauswedell.net
                CC: llvm-bugs at lists.llvm.org
    Classification: Unclassified

Created attachment 15815
  --> https://llvm.org/bugs/attachment.cgi?id=15815&action=edit
small benchmark

I have created a little example to compare the vectorization support of clang
vs gcc and the possible benefits of combining omp parallel and simd.
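
For readers without the attachment, the four variants being compared can be
sketched roughly as below. This is not the attached test.cpp: the kernel (an
element-wise add), the names add_auto, add_parallel_for, add_simd,
add_parallel_for_simd and the seconds() timing helper are all assumptions made
for illustration.

```cpp
// Hypothetical sketch: the attached test.cpp is not shown in this report,
// so the kernel (element-wise add) and all names here are assumptions.
#include <cassert>
#include <chrono>
#include <vector>

using Vec = std::vector<float>;

// All variants require equally sized inputs and output.
static void check_sizes(const Vec& a, const Vec& b, const Vec& out)
{
    assert(a.size() == out.size() && b.size() == out.size());
}

// "auto": no pragma, the optimizer decides whether to vectorize.
void add_auto(const Vec& a, const Vec& b, Vec& out)
{
    check_sizes(a, b, out);
    const long n = static_cast<long>(out.size());
    for (long i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// "omp parallel for": iterations are split across threads.
void add_parallel_for(const Vec& a, const Vec& b, Vec& out)
{
    check_sizes(a, b, out);
    const long n = static_cast<long>(out.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// "omp simd": one thread, vectorization explicitly requested.
void add_simd(const Vec& a, const Vec& b, Vec& out)
{
    check_sizes(a, b, out);
    const long n = static_cast<long>(out.size());
    #pragma omp simd
    for (long i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// "omp parallel for simd": threads plus vectorization of each chunk.
void add_parallel_for_simd(const Vec& a, const Vec& b, Vec& out)
{
    check_sizes(a, b, out);
    const long n = static_cast<long>(out.size());
    #pragma omp parallel for simd
    for (long i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Wall-clock time of one call, in seconds.
template <typename F>
double seconds(F&& run)
{
    const auto start = std::chrono::steady_clock::now();
    run();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Each variant could then be timed with something like
seconds([&] { add_parallel_for_simd(a, b, out); }). Without -fopenmp the
pragmas are ignored and all four functions behave like add_auto.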

These are the results. g++ is 5.3.0 and clang++ is 3.7.1 / 3.8.d20150720_1 (I
know, not the most recent snapshot). I have limited OMP_NUM_THREADS to 2 so
that we still get a clear picture.

g++5 test.cpp -std=c++14 -fopenmp -O0

auto: 4.07434
omp parallel for: 2.03428
omp simd: 3.24567
omp parallel for simd: 1.85369

g++5 test.cpp -std=c++14 -fopenmp -O3

auto: 0.595322
omp parallel for: 0.410147
omp simd: 0.514423
omp parallel for simd: 0.383947

clang++37 test.cpp -std=c++14 -fopenmp -O0

auto: 2.91202
omp parallel for: 2.44816
omp simd: 2.95256
omp parallel for simd: 1.82498

clang++37 test.cpp -std=c++14 -fopenmp -O3

auto: 0.619024
omp parallel for: 0.412554
omp simd: 0.593244
omp parallel for simd: 0.403466

clang++-devel test.cpp -std=c++14 -fopenmp -O0

auto: 2.91251
omp parallel for: 1.72933
omp simd: 2.95548
omp parallel for simd: 2.14271

clang++-devel test.cpp -std=c++14 -fopenmp -O3

auto: 0.616876
omp parallel for: 0.289257
omp simd: 0.557144
omp parallel for simd: 0.289215

The first observation: clang38 is faster than or roughly as fast as GCC in auto,
omp parallel for and omp simd, both with and without optimization. With
optimization there is also a significant speed-up of clang38 over clang37 and
gcc for omp parallel for (0.289 vs. 0.413 and 0.410)! Congratulations :)

For "#pragma omp parallel for simd" it is different. I know this is an OpenMP 4
feature, and for -O3 clang37 and clang38 correctly warn me:

warning: loop not vectorized: failed explicitly specified loop vectorization
[-Wpass-failed]

Hence the speeds of "parallel for" and "parallel for simd" are the same at -O3
on clang37 and clang38. However, for -O0 there is no warning, which I would
consider a bug, BUT the runtime is also different. On clang38 it is better than
simd and worse than parallel for, which means it is doing neither and something
else instead. That could be another bug... but what is actually happening there?
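
One way to see which OpenMP version each compiler claims to support is to
inspect the standard _OPENMP macro, which holds the specification date as
yyyymm (201307 for OpenMP 4.0). A minimal sketch follows; the function names
are made up for illustration, and a compiler that claims a version does not
necessarily implement every construct of it.

```cpp
// Sketch: report which OpenMP specification date the compiler claims.
#include <cstdio>

// _OPENMP is defined by the compiler as yyyymm when OpenMP is enabled;
// return 0 when it is not (for example when built without -fopenmp).
long openmp_spec_date()
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// OpenMP 4.0 was published in July 2013, so its date value is 201307.
bool claims_openmp4(long spec_date)
{
    return spec_date >= 201307L;
}

// Print the claimed specification date and whether it covers OpenMP 4.0.
void print_openmp_support()
{
    const long date = openmp_spec_date();
    std::printf("_OPENMP = %ld, claims OpenMP 4.0 or later: %s\n",
                date, claims_openmp4(date) ? "yes" : "no");
}
```

Calling print_openmp_support() from a program built with each compiler and
-fopenmp would show whether the claimed version differs between them.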

Thank you for taking the time and providing this excellent compiler!

PS: Is there an ETA for #pragma omp parallel for simd ?
