[llvm-bugs] [Bug 26454] New: [3.8.0] omp parallel for simd unexpected behaviour at different optimization levels
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Feb 3 08:01:17 PST 2016
https://llvm.org/bugs/show_bug.cgi?id=26454
Bug ID: 26454
Summary: [3.8.0] omp parallel for simd unexpected behaviour at
different optimization levels
Product: OpenMP
Version: unspecified
Hardware: PC
OS: FreeBSD
Status: NEW
Severity: normal
Priority: P
Component: Clang Compiler Support
Assignee: unassignedclangbugs at nondot.org
Reporter: bugzilla at hannes.hauswedell.net
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
Created attachment 15815
--> https://llvm.org/bugs/attachment.cgi?id=15815&action=edit
small benchmark
I have created a little example to compare the vectorization support of clang
vs gcc and the possible benefits of combining omp parallel and simd.
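The attachment itself is not reproduced in this message. As a rough illustration only, the four loop variants being compared could look like the sketch below. The kernel, the function names (`run_auto`, `run_parallel_for`, `run_simd`, `run_parallel_for_simd`) and the timing helper `time_seconds` are all hypothetical and are not taken from the attached test.cpp. The timings reported in this message come from the attachment, not from this sketch.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical kernel (NOT the attached benchmark): c[i] += a[i] * b[i].
// Build with -fopenmp; without it the pragmas are ignored and all four
// variants run as plain serial loops.
using Vec = std::vector<float>;

// "auto": no pragma, vectorization is left to the optimizer.
void run_auto(const Vec& a, const Vec& b, Vec& c) {
    assert(a.size() == c.size() && b.size() == c.size());
    const float* pa = a.data();
    const float* pb = b.data();
    float* pc = c.data();
    const long n = static_cast<long>(c.size());
    for (long i = 0; i < n; ++i) pc[i] += pa[i] * pb[i];
}

// "omp parallel for": iterations are split across threads.
void run_parallel_for(const Vec& a, const Vec& b, Vec& c) {
    assert(a.size() == c.size() && b.size() == c.size());
    const float* pa = a.data();
    const float* pb = b.data();
    float* pc = c.data();
    const long n = static_cast<long>(c.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) pc[i] += pa[i] * pb[i];
}

// "omp simd": explicit vectorization request, single thread.
void run_simd(const Vec& a, const Vec& b, Vec& c) {
    assert(a.size() == c.size() && b.size() == c.size());
    const float* pa = a.data();
    const float* pb = b.data();
    float* pc = c.data();
    const long n = static_cast<long>(c.size());
    #pragma omp simd
    for (long i = 0; i < n; ++i) pc[i] += pa[i] * pb[i];
}

// "omp parallel for simd": threads plus vectorization of each thread's chunk.
void run_parallel_for_simd(const Vec& a, const Vec& b, Vec& c) {
    assert(a.size() == c.size() && b.size() == c.size());
    const float* pa = a.data();
    const float* pb = b.data();
    float* pc = c.data();
    const long n = static_cast<long>(c.size());
    #pragma omp parallel for simd
    for (long i = 0; i < n; ++i) pc[i] += pa[i] * pb[i];
}

// Hypothetical helper: wall-clock seconds taken by one call of f.
template <typename F>
double time_seconds(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Each variant could then be timed with something like `time_seconds([&] { run_parallel_for_simd(a, b, c); })` on suitably large vectors.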
These are the results with g++ 5.3.0 and clang++ 3.7.1 / 3.8.d20150720_1 (I
know, not the most recent snapshot). I have limited OMP_NUM_THREADS to 2 so
that we still get a clear picture.

g++5 test.cpp -std=c++14 -fopenmp -O0
    auto:                  4.07434
    omp parallel for:      2.03428
    omp simd:              3.24567
    omp parallel for simd: 1.85369

g++5 test.cpp -std=c++14 -fopenmp -O3
    auto:                  0.595322
    omp parallel for:      0.410147
    omp simd:              0.514423
    omp parallel for simd: 0.383947

clang++37 test.cpp -std=c++14 -fopenmp -O0
    auto:                  2.91202
    omp parallel for:      2.44816
    omp simd:              2.95256
    omp parallel for simd: 1.82498

clang++37 test.cpp -std=c++14 -fopenmp -O3
    auto:                  0.619024
    omp parallel for:      0.412554
    omp simd:              0.593244
    omp parallel for simd: 0.403466

clang++-devel test.cpp -std=c++14 -fopenmp -O0
    auto:                  2.91251
    omp parallel for:      1.72933
    omp simd:              2.95548
    omp parallel for simd: 2.14271

clang++-devel test.cpp -std=c++14 -fopenmp -O3
    auto:                  0.616876
    omp parallel for:      0.289257
    omp simd:              0.557144
    omp parallel for simd: 0.289215

The first observation: clang38 is faster than or the same speed as GCC in auto,
omp parallel for and omp simd, both with and without optimization. With
optimization there is also a significant speed-up of clang38 over clang37 and
gcc! Congratulations :)
For "#pragma omp parallel for simd" it is different. I know this is an OpenMP 4
feature, and for -O3 clang37 and clang38 correctly warn me:
warning: loop not vectorized: failed explicitly specified loop vectorization
[-Wpass-failed]
Hence the speed of "parallel for" and "parallel for simd" is the same on
clang37 and clang38. However, for -O0 there is no warning, which I would
consider a bug, BUT the runtime is also different. It is better than simd and
worse than parallel, which means it is doing neither and something else
instead. This could be another bug... but what is actually happening there?
Thank you for taking the time and providing this excellent compiler!
PS: Is there an ETA for #pragma omp parallel for simd ?