<div dir="ltr">Andrey,<div>     FSF gcc does not exactly default to -O2, but its default is certainly higher than clang's default optimization level.</div><div><br></div><div>% touch t.cc</div><div>% g++-fsf-4.9 -fverbose-asm t.cc -S</div>
<div>% more t.s</div><div><div># GNU C++ (GCC) version 4.9.0 (x86_64-apple-darwin11.4.2)</div><div>#       compiled by GNU C version 4.9.0, GMP version 6.0.0, MPFR version 3.1.2, MPC version 1.0.2</div><div># GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072</div>
<div># options passed:  -D__DYNAMIC__ t.cc -fPIC -mmacosx-version-min=10.7.4</div><div># -mtune=core2 -fverbose-asm</div><div># options enabled:  -Wnonportable-cfstrings -fPIC</div><div># -faggressive-loop-optimizations -fasynchronous-unwind-tables</div>
<div># -fauto-inc-dec -fcommon -fdelete-null-pointer-checks -fearly-inlining</div><div># -feliminate-unused-debug-types -fexceptions -ffunction-cse -fgcse-lm</div><div># -fgnu-unique -fident -finline-atomics -fira-hoist-pressure</div>
<div># -fira-share-save-slots -fira-share-spill-slots -fivopts</div><div># -fkeep-static-consts -fleading-underscore -fmath-errno</div><div># -fmerge-debug-strings -fnext-runtime -fobjc-abi-version= -fpeephole</div><div># -fprefetch-loop-arrays -freg-struct-return</div>
<div># -fsched-critical-path-heuristic -fsched-dep-count-heuristic</div><div># -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic</div><div># -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic</div>
<div># -fsched-stalled-insns-dep -fshow-column -fsigned-zeros</div><div># -fsplit-ivs-in-unroller -fstrict-volatile-bitfields -fsync-libcalls</div><div># -ftrapping-math -ftree-coalesce-vars -ftree-cselim -ftree-forwprop</div>
<div># -ftree-loop-if-convert -ftree-loop-im -ftree-loop-ivcanon</div><div># -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop</div><div># -ftree-reassoc -ftree-scev-cprop -funit-at-a-time -funwind-tables</div>
<div># -fverbose-asm -fzero-initialized-in-bss -gstrict-dwarf</div><div># -m128bit-long-double -m64 -m80387 -malign-stringops -matt-stubs</div><div># -mconstant-cfstrings -mfancy-math-387 -mfp-ret-in-387 -mfxsr -mieee-fp</div>
<div># -mlong-double-80 -mmmx -mno-sse4 -mpush-args -mred-zone -msse -msse2</div><div># -msse3</div><div><br></div><div>        .constructor</div><div>        .destructor</div><div>        .align 1</div><div>        .subsections_via_symbols</div>
</div><div><br></div><div><div>% g++-fsf-4.9 -fverbose-asm -O2 t.cc -S</div><div>% more t.s</div></div><div><div># GNU C++ (GCC) version 4.9.0 (x86_64-apple-darwin11.4.2)</div><div>#       compiled by GNU C version 4.9.0, GMP version 6.0.0, MPFR version 3.1.2, MPC version 1.0.2</div>
<div># GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072</div><div># options passed:  -D__DYNAMIC__ t.cc -fPIC -mmacosx-version-min=10.7.4</div><div># -mtune=core2 -O2 -fverbose-asm</div><div># options enabled:  -Wnonportable-cfstrings -fPIC</div>
<div># -faggressive-loop-optimizations -fasynchronous-unwind-tables</div><div># -fauto-inc-dec -fbranch-count-reg -fcaller-saves</div><div># -fcombine-stack-adjustments -fcommon -fcompare-elim -fcprop-registers</div><div>
# -fcrossjumping -fcse-follow-jumps -fdefer-pop</div><div># -fdelete-null-pointer-checks -fdevirtualize -fdevirtualize-speculatively</div><div># -fearly-inlining -feliminate-unused-debug-types -fexceptions</div><div># -fexpensive-optimizations -fforward-propagate -ffunction-cse -fgcse</div>
<div># -fgcse-lm -fgnu-unique -fguess-branch-probability -fhoist-adjacent-loads</div><div># -fident -fif-conversion -fif-conversion2 -findirect-inlining -finline</div><div># -finline-atomics -finline-functions-called-once -finline-small-functions</div>
<div># -fipa-cp -fipa-profile -fipa-pure-const -fipa-reference -fipa-sra</div><div># -fira-hoist-pressure -fira-share-save-slots -fira-share-spill-slots</div><div># -fisolate-erroneous-paths-dereference -fivopts -fkeep-static-consts</div>
<div># -fleading-underscore -fmath-errno -fmerge-constants -fmerge-debug-strings</div><div># -fmove-loop-invariants -fnext-runtime -fobjc-abi-version=</div><div># -fomit-frame-pointer -foptimize-sibling-calls -foptimize-strlen</div>
<div># -fpartial-inlining -fpeephole -fpeephole2 -fprefetch-loop-arrays -free</div><div># -freg-struct-return -freorder-blocks -freorder-blocks-and-partition</div><div># -freorder-functions -frerun-cse-after-loop</div><div>
# -fsched-critical-path-heuristic -fsched-dep-count-heuristic</div><div># -fsched-group-heuristic -fsched-interblock -fsched-last-insn-heuristic</div><div># -fsched-rank-heuristic -fsched-spec -fsched-spec-insn-heuristic</div>
<div># -fsched-stalled-insns-dep -fschedule-insns2 -fshow-column -fshrink-wrap</div><div># -fsigned-zeros -fsplit-ivs-in-unroller -fsplit-wide-types</div><div># -fstrict-aliasing -fstrict-overflow -fstrict-volatile-bitfields</div>
<div># -fsync-libcalls -fthread-jumps -ftoplevel-reorder -ftrapping-math</div><div># -ftree-bit-ccp -ftree-builtin-call-dce -ftree-ccp -ftree-ch</div><div># -ftree-coalesce-vars -ftree-copy-prop -ftree-copyrename -ftree-cselim</div>
<div># -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-forwprop -ftree-fre</div><div># -ftree-loop-if-convert -ftree-loop-im -ftree-loop-ivcanon</div><div># -ftree-loop-optimize -ftree-parallelize-loops= -ftree-phiprop -ftree-pre</div>
<div># -ftree-pta -ftree-reassoc -ftree-scev-cprop -ftree-sink -ftree-slsr</div><div># -ftree-sra -ftree-switch-conversion -ftree-tail-merge -ftree-ter</div><div># -ftree-vrp -funit-at-a-time -funwind-tables -fverbose-asm</div>
<div># -fzero-initialized-in-bss -gstrict-dwarf -m128bit-long-double -m64</div><div># -m80387 -malign-stringops -matt-stubs -mconstant-cfstrings</div><div># -mfancy-math-387 -mfp-ret-in-387 -mfxsr -mieee-fp -mlong-double-80 -mmmx</div>
<div># -mno-sse4 -mpush-args -mred-zone -msse -msse2 -msse3 -mvzeroupper</div><div><br></div><div>        .constructor</div><div>        .destructor</div><div>        .align 1</div><div>        .subsections_via_symbols</div>
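<div><br></div><div>A minimal sketch for comparing two such dumps mechanically rather than by eye. It assumes the comment layout shown above (the list begins at "# options enabled:" and runs until the first line that does not start with "#"). The function names enabled_flags and added_flags, and the file names in the usage line, are hypothetical and not part of gcc.</div>

```shell
#!/bin/sh
# Print the flags listed under "# options enabled:" in a -fverbose-asm
# assembly file, one per line, sorted and de-duplicated.
# Assumption: the list ends at the first line that does not begin with "#".
enabled_flags() {
    awk '
        /^# options enabled:/ { on = 1; sub(/^# options enabled:/, "#") }
        on && !/^#/           { exit }
        on                    { sub(/^#/, ""); print }
    ' "$1" | tr -s ' \t' '\n\n' | grep -e '^-' | LC_ALL=C sort -u
}

# Print the flags that appear in the second file but not in the first.
added_flags() {
    a=$(mktemp) && b=$(mktemp) || return 1
    enabled_flags "$1" > "$a"
    enabled_flags "$2" > "$b"
    LC_ALL=C comm -13 "$a" "$b"
    rm -f "$a" "$b"
}
```

<div>Usage, assuming the two outputs were saved under separate (hypothetical) names instead of both as t.s: added_flags t-O0.s t-O2.s</div>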
</div><div><br></div><div>You might consider filing an enhancement request for clang 3.5 asking that the default, when no optimization flag is given, be raised to -O1.</div><div>         Jack</div><div><br></div></div><div class="gmail_extra">
<br><br><div class="gmail_quote">On Wed, Jun 4, 2014 at 11:02 AM, Andrey Bokhanko <span dir="ltr">&lt;<a href="mailto:andreybokhanko@gmail.com" target="_blank">andreybokhanko@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div><div><div>Hi All,<br><br></div>Some of you probably saw the performance benchmarks of the OpenMP-enabled clang compiler published on the Phoronix website: <a href="http://www.phoronix.com/scan.php?page=article&item=llvm_clang_openmp&num=1" target="_blank">http://www.phoronix.com/scan.php?page=article&item=llvm_clang_openmp&num=1</a><br>

<br></div><div>As you can see, clang-omp is behind gcc in most of the benchmarks. The reason is quite simple: no -O options were supplied to either compiler, so gcc assumed -O2 by default, while clang assumed -O0. No surprise gcc got ahead!<br>

<br></div><div>We added -O2 to clang and re-measured, and the results changed quite significantly. My Intel colleague, Alexey Bataev, informed Michael Larabel of this omission.<br></div><div><br></div>Yours,<br>Andrey<br>=====<br>

</div>Software Engineer<br>Intel Compiler Team<br>Intel<br><br></div>
<br>_______________________________________________<br>
Openmp-dev mailing list<br>
<a href="mailto:Openmp-dev@dcs-maillist2.engr.illinois.edu">Openmp-dev@dcs-maillist2.engr.illinois.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/openmp-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/openmp-dev</a><br>
<br></blockquote></div><br></div>