<div dir="ltr">Andrey,<div>     An initial attempt at benchmarking the performance for graphicsmagick 1.3.19 on x86_64-apple-darwin14 built at various optimization levels with openmp support enabled using gcc 5.1.0 or clang svn at r236592 with...<br><br><a href="http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150504/128555.html">http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150504/128555.html</a><br><a href="http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150504/128561.html">http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150504/128561.html</a><br><a href="http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150504/128567.html">http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150504/128567.html</a></div><div><br></div><div>produced the following results.</div><div><br></div><div><div>gcc 5.1 -O3</div><div><br></div><div>% gm benchmark -stepthreads 1 -duration 10 convert -size 2048x1080 pattern:granite -operator all Noise-Gaussian 30% null:</div><div>Results: 1 threads 14 iter 10.76s user 10.76s total 1.301 iter/s 1.301 iter/cpu 1.00 speedup 1.000 karp-flatt</div><div>Results: 2 threads 25 iter 19.75s user 10.27s total 2.434 iter/s 1.266 iter/cpu 1.87 speedup 0.069 karp-flatt</div><div>Results: 3 threads 36 iter 28.74s user 10.04s total 3.586 iter/s 1.253 iter/cpu 2.76 speedup 0.044 karp-flatt</div><div>Results: 4 threads 48 iter 38.54s user 10.21s total 4.701 iter/s 1.245 iter/cpu 3.61 speedup 0.036 karp-flatt</div><div>Results: 5 threads 58 iter 46.71s user 10.04s total 5.777 iter/s 1.242 iter/cpu 4.44 speedup 0.032 karp-flatt</div><div>Results: 6 threads 69 iter 55.76s user 10.14s total 6.805 iter/s 1.237 iter/cpu 5.23 speedup 0.029 karp-flatt</div><div>Results: 7 threads 78 iter 63.16s user 10.01s total 7.792 iter/s 1.235 iter/cpu 5.99 speedup 0.028 karp-flatt</div><div>Results: 8 threads 88 iter 71.33s user 10.02s total 8.782 iter/s 1.234 iter/cpu 6.75 speedup 0.026 
karp-flatt</div><div><br></div><div>clang 3.7svn -O3</div><div><br></div><div>% gm benchmark -stepthreads 1 -duration 10 convert -size 2048x1080 pattern:granite -operator all Noise-Gaussian 30% null:</div><div>Results: 1 threads 19 iter 10.42s user 10.41s total 1.825 iter/s 1.823 iter/cpu 1.00 speedup 1.000 karp-flatt</div><div>Results: 2 threads 36 iter 20.15s user 10.08s total 3.571 iter/s 1.787 iter/cpu 1.96 speedup 0.022 karp-flatt</div><div>Results: 3 threads 53 iter 30.45s user 10.15s total 5.222 iter/s 1.741 iter/cpu 2.86 speedup 0.024 karp-flatt</div><div>Results: 4 threads 68 iter 39.96s user 10.00s total 6.800 iter/s 1.702 iter/cpu 3.73 speedup 0.025 karp-flatt</div><div>Results: 5 threads 83 iter 50.18s user 10.04s total 8.267 iter/s 1.654 iter/cpu 4.53 speedup 0.026 karp-flatt</div><div>Results: 6 threads 97 iter 59.97s user 10.01s total 9.690 iter/s 1.617 iter/cpu 5.31 speedup 0.026 karp-flatt</div><div>Results: 7 threads 111 iter 70.37s user 10.06s total 11.034 iter/s 1.577 iter/cpu 6.05 speedup 0.026 karp-flatt</div><div>Results: 8 threads 124 iter 79.95s user 10.04s total 12.351 iter/s 1.551 iter/cpu 6.77 speedup 0.026 karp-flatt</div><div><br></div><div>gcc 5.1 -O2</div><div><br></div><div>% gm benchmark -stepthreads 1 -duration 10 convert -size 2048x1080 pattern:granite -operator all Noise-Gaussian 30% null:</div><div>Results: 1 threads 13 iter 10.04s user 10.04s total 1.295 iter/s 1.295 iter/cpu 1.00 speedup 1.000 karp-flatt</div><div>Results: 2 threads 25 iter 19.86s user 10.32s total 2.422 iter/s 1.259 iter/cpu 1.87 speedup 0.069 karp-flatt</div><div>Results: 3 threads 36 iter 28.87s user 10.08s total 3.571 iter/s 1.247 iter/cpu 2.76 speedup 0.044 karp-flatt</div><div>Results: 4 threads 47 iter 37.84s user 10.03s total 4.686 iter/s 1.242 iter/cpu 3.62 speedup 0.035 karp-flatt</div><div>Results: 5 threads 58 iter 46.84s user 10.09s total 5.748 iter/s 1.238 iter/cpu 4.44 speedup 0.032 karp-flatt</div><div>Results: 6 threads 68 iter 55.06s user 
10.02s total 6.786 iter/s 1.235 iter/cpu 5.24 speedup 0.029 karp-flatt</div><div>Results: 7 threads 78 iter 63.28s user 10.05s total 7.761 iter/s 1.233 iter/cpu 5.99 speedup 0.028 karp-flatt</div><div>Results: 8 threads 88 iter 71.48s user 10.02s total 8.782 iter/s 1.231 iter/cpu 6.78 speedup 0.026 karp-flatt</div><div><br></div><div>clang 3.7svn -O2</div><div><br></div><div>% gm benchmark -stepthreads 1 -duration 10 convert -size 2048x1080 pattern:granite -operator all Noise-Gaussian 30% null:</div><div>Results: 1 threads 19 iter 10.36s user 10.35s total 1.836 iter/s 1.834 iter/cpu 1.00 speedup 1.000 karp-flatt</div><div>Results: 2 threads 32 iter 20.63s user 10.31s total 3.104 iter/s 1.551 iter/cpu 1.69 speedup 0.183 karp-flatt</div><div>Results: 3 threads 46 iter 30.29s user 10.10s total 4.554 iter/s 1.519 iter/cpu 2.48 speedup 0.105 karp-flatt</div><div>Results: 4 threads 60 iter 40.36s user 10.09s total 5.946 iter/s 1.487 iter/cpu 3.24 speedup 0.078 karp-flatt</div><div>Results: 5 threads 73 iter 50.25s user 10.05s total 7.264 iter/s 1.453 iter/cpu 3.96 speedup 0.066 karp-flatt</div><div>Results: 6 threads 86 iter 60.44s user 10.08s total 8.532 iter/s 1.423 iter/cpu 4.65 speedup 0.058 karp-flatt</div><div>Results: 7 threads 98 iter 70.47s user 10.08s total 9.722 iter/s 1.391 iter/cpu 5.30 speedup 0.054 karp-flatt</div><div>Results: 8 threads 109 iter 79.59s user 10.02s total 10.878 iter/s 1.370 iter/cpu 5.93 speedup 0.050 karp-flatt</div><div><br></div><div>gcc 5.1 -Os</div><div><br></div><div>% gm benchmark -stepthreads 1 -duration 10 convert -size 2048x1080 pattern:granite -operator all Noise-Gaussian 30% null:</div><div>Results: 1 threads 12 iter 10.29s user 10.29s total 1.166 iter/s 1.166 iter/cpu 1.00 speedup 1.000 karp-flatt</div><div>Results: 2 threads 23 iter 19.56s user 10.00s total 2.300 iter/s 1.176 iter/cpu 1.97 speedup 0.014 karp-flatt</div><div>Results: 3 threads 35 iter 29.68s user 10.27s total 3.408 iter/s 1.179 iter/cpu 2.92 speedup 0.013 
karp-flatt</div><div>Results: 4 threads 45 iter 38.14s user 10.04s total 4.482 iter/s 1.180 iter/cpu 3.84 speedup 0.014 karp-flatt</div><div>Results: 5 threads 56 iter 47.43s user 10.11s total 5.539 iter/s 1.181 iter/cpu 4.75 speedup 0.013 karp-flatt</div><div>Results: 6 threads 66 iter 55.89s user 10.06s total 6.561 iter/s 1.181 iter/cpu 5.63 speedup 0.013 karp-flatt</div><div>Results: 7 threads 76 iter 64.39s user 10.11s total 7.517 iter/s 1.180 iter/cpu 6.45 speedup 0.014 karp-flatt</div><div>Results: 8 threads 86 iter 72.90s user 10.11s total 8.506 iter/s 1.180 iter/cpu 7.29 speedup 0.014 karp-flatt</div><div><br></div><div>clang 3.7svn -Os</div><div><br></div><div>% gm benchmark -stepthreads 1 -duration 10 convert -size 2048x1080 pattern:granite -operator all Noise-Gaussian 30% null:</div><div>Results: 1 threads 19 iter 10.36s user 10.36s total 1.834 iter/s 1.834 iter/cpu 1.00 speedup 1.000 karp-flatt</div><div>Results: 2 threads 36 iter 20.50s user 10.25s total 3.512 iter/s 1.756 iter/cpu 1.92 speedup 0.044 karp-flatt</div><div>Results: 3 threads 52 iter 30.30s user 10.11s total 5.143 iter/s 1.716 iter/cpu 2.80 speedup 0.035 karp-flatt</div><div>Results: 4 threads 67 iter 40.12s user 10.03s total 6.680 iter/s 1.670 iter/cpu 3.64 speedup 0.033 karp-flatt</div><div>Results: 5 threads 82 iter 50.25s user 10.06s total 8.151 iter/s 1.632 iter/cpu 4.44 speedup 0.031 karp-flatt</div><div>Results: 6 threads 96 iter 60.23s user 10.04s total 9.562 iter/s 1.594 iter/cpu 5.21 speedup 0.030 karp-flatt</div><div>Results: 7 threads 109 iter 70.12s user 10.03s total 10.867 iter/s 1.554 iter/cpu 5.93 speedup 0.030 karp-flatt</div><div>Results: 8 threads 122 iter 79.82s user 10.03s total 12.164 iter/s 1.528 iter/cpu 6.63 speedup 0.029 karp-flatt</div><div><br></div><div>as described in <a href="http://www.graphicsmagick.org/OpenMP.html">http://www.graphicsmagick.org/OpenMP.html</a>. 
Interpreting the results is not straightforward, since the best outcome would combine the highest iter/cpu with the highest speedup. At -O3, clang 3.7svn is ahead of gcc 5.1 on both metrics at every thread count, although the speedup margin is small (6.77 vs. 6.75 at 8 threads). At -O2 and -Os, iter/cpu is still always higher for clang 3.7svn, but gcc 5.1 shows the better speedup (6.78 vs. 5.93 at -O2 and 7.29 vs. 6.63 at -Os, both at 8 threads).</div><div>                Jack</div><div><br></div><div><br></div><div class="gmail_extra"><div class="gmail_quote">On Wed, May 6, 2015 at 5:41 AM, Andrey Bokhanko <span dir="ltr"><<a href="mailto:andreybokhanko@gmail.com" target="_blank">andreybokhanko@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Jack,<br>

<br>

Thank you for all the testing efforts! -- they are really appreciated<br>

and in my eyes one of the best contributions to the overall OMP<br>

development effort.<br>

<br>

Keep up the good work!<br>

<span class=""><font color="#888888"><br>

Andrey<br>

</font></span><div class=""><div class="h5"><br>

<br>

On Mon, May 4, 2015 at 1:02 AM, Jack Howarth<br>

<<a href="mailto:howarth.mailing.lists@gmail.com">howarth.mailing.lists@gmail.com</a>> wrote:<br>

> A couple more data points. Current llvm 3.7svn with the two outstanding<br>

> OPENMP patches can build the openmp support in gdl 0.9.5 (which completely<br>

> passes its test suite) and apbs 1.4.1's limited openmp support.<br>

><br>

> On Sat, May 2, 2015 at 11:11 PM, Jack Howarth<br>

> <<a href="mailto:howarth.mailing.lists@gmail.com">howarth.mailing.lists@gmail.com</a>> wrote:<br>

>><br>

>>     On a positive note, current llvm 3.7svn with the two outstanding<br>

>> OPENMP patches applied builds the openmp support in gromacs 5.0.4 and the<br>

>> resulting build fully passes the gromacs regression test suite. Tested on<br>

>> x86_64-apple-darwin14.<br>

><br>

><br>

</div></div></blockquote></div><br></div></div></div>
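<div>For anyone wanting to cross-check the derived columns in the benchmark output above, here is a minimal sketch. It assumes that iter/cpu is iterations divided by user time, that speedup is the iter/s at p threads divided by the single-thread iter/s, and that the karp-flatt column is the standard Karp-Flatt serial fraction e = (1/speedup - 1/p) / (1 - 1/p). The function names are made up for illustration and are not part of GraphicsMagick. These assumptions reproduce the gcc 5.1 -O3 rows to within rounding.</div>

```python
# Hypothetical helpers for re-deriving the columns printed by "gm benchmark".
# The formulas are assumptions checked against the reported numbers,
# not taken from the GraphicsMagick sources.

def iter_per_cpu(iterations, user_seconds):
    """Iterations completed per second of user CPU time."""
    return iterations / user_seconds


def speedup(rate_p, rate_1):
    """Ratio of iter/s with p threads to iter/s with one thread."""
    return rate_p / rate_1


def karp_flatt(psi, p):
    """Experimentally determined serial fraction for speedup psi on p threads."""
    if p < 2:
        raise ValueError("Karp-Flatt is only defined for p >= 2")
    return (1.0 / psi - 1.0 / p) / (1.0 - 1.0 / p)


if __name__ == "__main__":
    # gcc 5.1 -O3: 1 thread = 1.301 iter/s, 8 threads = 8.782 iter/s,
    # 88 iterations in 71.33s of user time.
    s = speedup(8.782, 1.301)
    print(round(s, 2), round(karp_flatt(s, 8), 3), round(iter_per_cpu(88, 71.33), 3))
```

<div>A karp-flatt value that stays flat as threads are added (as in the gcc 5.1 -Os run) suggests a fixed serial fraction, while a value that grows with the thread count points to parallel overhead.</div>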