<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Chandler,<div class=""><br class=""></div><div class="">I observed some improvements and regressions with the new lowering.</div><div class=""><br class=""></div><div class="">Here are the numbers for an Ivy Bridge machine fixed at 2900 MHz.</div><div class=""><br class=""></div><div class="">I’ll look into the regressions to provide test cases.</div><div class=""><br class=""></div><div class="">** Numbers **</div><div class=""><br class=""></div><div class="">Smaller is better. Only tests that run for at least one second are reported.</div><div class="">Reference is the default lowering, Test is the new lowering, and Expansion is Test / Reference.</div><div class="">The Os numbers are overall neutral, but the O3 numbers mainly expose regressions.</div><div class=""><br class=""></div><div class="">Note: I can attach the raw numbers if you want.</div><div class=""><br class=""></div><div class="">* Os *</div><div class=""><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Benchmark_ID <span class="Apple-tab-span" style="white-space:pre"> </span>Reference<span class="Apple-tab-span" style="white-space:pre"> </span>Test <span class="Apple-tab-span" style="white-space:pre"> </span>Expansion <span class="Apple-tab-span" style="white-space:pre"> </span>Percent</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/Nurbs/nurbs <span class="Apple-tab-span" style="white-space:pre"> </span> 2.3302<span class="Apple-tab-span" style="white-space:pre"> </span> 2.3122<span class="Apple-tab-span" style="white-space:pre"> </span> 0.99<span class="Apple-tab-span" style="white-space:pre"> </span> -1%</div><div 
style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CFP2000/183.equake/183.eq<span class="Apple-tab-span" style="white-space:pre"> </span> 3.2606<span class="Apple-tab-span" style="white-space:pre"> </span> 3.2419<span class="Apple-tab-span" style="white-space:pre"> </span> 0.99<span class="Apple-tab-span" style="white-space:pre"> </span> -1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CFP2006/447.dealII/<a href="http://447.de" class="">447.de</a><span class="Apple-tab-span" style="white-space:pre"> </span> 16.4638<span class="Apple-tab-span" style="white-space:pre"> </span> 16.1313<span class="Apple-tab-span" style="white-space:pre"> </span> 0.98<span class="Apple-tab-span" style="white-space:pre"> </span> -2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CFP2006/470.lbm/470.lbm <span class="Apple-tab-span" style="white-space:pre"> </span> 2.0159<span class="Apple-tab-span" style="white-space:pre"> </span> 1.9931<span class="Apple-tab-span" style="white-space:pre"> </span> 0.99<span class="Apple-tab-span" style="white-space:pre"> </span> -1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2000/164.gzip/164.gzi<span class="Apple-tab-span" style="white-space:pre"> </span> 8.7611<span class="Apple-tab-span" style="white-space:pre"> </span> 8.6981<span class="Apple-tab-span" style="white-space:pre"> </span> 0.99<span class="Apple-tab-span" style="white-space:pre"> </span> -1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2006/456.hmmer/456.hm<span class="Apple-tab-span" style="white-space:pre"> </span> 2.5674<span class="Apple-tab-span" style="white-space:pre"> </span> 2.5819<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; 
font-family: Menlo;" class="">External/SPEC/CINT2006/462.libquantum/4<span class="Apple-tab-span" style="white-space:pre"> </span> 1.2924<span class="Apple-tab-span" style="white-space:pre"> </span> 1.347<span class="Apple-tab-span" style="white-space:pre"> </span> 1.04<span class="Apple-tab-span" style="white-space:pre"> </span> +4%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/TSVC/CrossingThr<span class="Apple-tab-span" style="white-space:pre"> </span> 2.4703<span class="Apple-tab-span" style="white-space:pre"> </span> 2.4852<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/TSVC/LoopRerolli<span class="Apple-tab-span" style="white-space:pre"> </span> 2.6611<span class="Apple-tab-span" style="white-space:pre"> </span> 2.5668<span class="Apple-tab-span" style="white-space:pre"> </span> 0.96<span class="Apple-tab-span" style="white-space:pre"> </span> -4%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/mafft/pairlocala<span class="Apple-tab-span" style="white-space:pre"> </span> 24.676<span class="Apple-tab-span" style="white-space:pre"> </span> 24.5372<span class="Apple-tab-span" style="white-space:pre"> </span> 0.99<span class="Apple-tab-span" style="white-space:pre"> </span> -1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">SingleSource/Benchmarks/Adobe-C++/simpl<span class="Apple-tab-span" style="white-space:pre"> </span> 1.0579<span class="Apple-tab-span" style="white-space:pre"> </span> 1.1048<span class="Apple-tab-span" style="white-space:pre"> </span> 1.04<span class="Apple-tab-span" style="white-space:pre"> </span> +4%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">SingleSource/Benchmarks/Linpack/linpack<span 
class="Apple-tab-span" style="white-space:pre"> </span> 4.2817<span class="Apple-tab-span" style="white-space:pre"> </span> 4.3298<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">SingleSource/Benchmarks/Misc-C++/stepan<span class="Apple-tab-span" style="white-space:pre"> </span> 4.1821<span class="Apple-tab-span" style="white-space:pre"> </span> 4.226<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">SingleSource/Benchmarks/Misc/oourafft <span class="Apple-tab-span" style="white-space:pre"> </span> 3.0305<span class="Apple-tab-span" style="white-space:pre"> </span> 3.1777<span class="Apple-tab-span" style="white-space:pre"> </span> 1.05<span class="Apple-tab-span" style="white-space:pre"> </span> +5%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Min (14) <span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> 0.96<span class="Apple-tab-span" style="white-space:pre"> </span> -</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Max (14) <span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> 1.05<span class="Apple-tab-span" 
style="white-space:pre"> </span> -</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Sum (14) <span class="Apple-tab-span" style="white-space:pre"> </span> 79<span class="Apple-tab-span" style="white-space:pre"> </span> 79<span class="Apple-tab-span" style="white-space:pre"> </span> 1<span class="Apple-tab-span" style="white-space:pre"> </span> +0%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">A.Mean (14) <span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">G.Mean 2 (14) <span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div></div><div class=""><br class=""></div><div class="">* O3 *</div><div class=""><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Benchmark_ID <span class="Apple-tab-span" style="white-space:pre"> </span>Reference<span class="Apple-tab-span" style="white-space:pre"> 
</span>Test <span class="Apple-tab-span" style="white-space:pre"> </span>Expansion <span class="Apple-tab-span" style="white-space:pre"> </span>Percent</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/Nurbs/nurbs <span class="Apple-tab-span" style="white-space:pre"> </span> 2.2322<span class="Apple-tab-span" style="white-space:pre"> </span> 2.2131<span class="Apple-tab-span" style="white-space:pre"> </span> 0.99<span class="Apple-tab-span" style="white-space:pre"> </span> -1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/Povray/povray <span class="Apple-tab-span" style="white-space:pre"> </span> 2.2638<span class="Apple-tab-span" style="white-space:pre"> </span> 2.2762<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CFP2000/177.mesa/177.mesa<span class="Apple-tab-span" style="white-space:pre"> </span> 1.6675<span class="Apple-tab-span" style="white-space:pre"> </span> 1.6828<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CFP2000/188.ammp/188.ammp<span class="Apple-tab-span" style="white-space:pre"> </span> 10.9309<span class="Apple-tab-span" style="white-space:pre"> </span> 11.1191<span class="Apple-tab-span" style="white-space:pre"> </span> 1.02<span class="Apple-tab-span" style="white-space:pre"> </span> +2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CFP2006/433.milc/433.milc<span class="Apple-tab-span" style="white-space:pre"> </span> 
6.9214<span class="Apple-tab-span" style="white-space:pre"> </span> 7.1696<span class="Apple-tab-span" style="white-space:pre"> </span> 1.04<span class="Apple-tab-span" style="white-space:pre"> </span> +4%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2000/164.gzip/164.gzi<span class="Apple-tab-span" style="white-space:pre"> </span> 8.5327<span class="Apple-tab-span" style="white-space:pre"> </span> 8.8114<span class="Apple-tab-span" style="white-space:pre"> </span> 1.03<span class="Apple-tab-span" style="white-space:pre"> </span> +3%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2000/186.crafty/186.c<span class="Apple-tab-span" style="white-space:pre"> </span> 4.1266<span class="Apple-tab-span" style="white-space:pre"> </span> 4.16<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2000/253.perlbmk/253.<span class="Apple-tab-span" style="white-space:pre"> </span> 5.6991<span class="Apple-tab-span" style="white-space:pre"> </span> 5.7309<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2000/256.bzip2/256.bz<span class="Apple-tab-span" style="white-space:pre"> </span> 6.7917<span class="Apple-tab-span" style="white-space:pre"> </span> 6.8763<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2006/400.perlbench/40<span class="Apple-tab-span" style="white-space:pre"> </span> 6.243<span class="Apple-tab-span" style="white-space:pre"> </span> 6.1464<span 
class="Apple-tab-span" style="white-space:pre"> </span> 0.98<span class="Apple-tab-span" style="white-space:pre"> </span> -2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2006/401.bzip2/401.bz<span class="Apple-tab-span" style="white-space:pre"> </span> 2.095<span class="Apple-tab-span" style="white-space:pre"> </span> 2.0588<span class="Apple-tab-span" style="white-space:pre"> </span> 0.98<span class="Apple-tab-span" style="white-space:pre"> </span> -2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2006/462.libquantum/4<span class="Apple-tab-span" style="white-space:pre"> </span> 1.2<span class="Apple-tab-span" style="white-space:pre"> </span> 1.2108<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Applications/SIBsim4/SIBsim<span class="Apple-tab-span" style="white-space:pre"> </span> 2.4547<span class="Apple-tab-span" style="white-space:pre"> </span> 2.5129<span class="Apple-tab-span" style="white-space:pre"> </span> 1.02<span class="Apple-tab-span" style="white-space:pre"> </span> +2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/Bullet/bullet <span class="Apple-tab-span" style="white-space:pre"> </span> 4.1687<span class="Apple-tab-span" style="white-space:pre"> </span> 4.0882<span class="Apple-tab-span" style="white-space:pre"> </span> 0.98<span class="Apple-tab-span" style="white-space:pre"> </span> -2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/TSVC/LinearDepen<span class="Apple-tab-span" style="white-space:pre"> </span> 3.0389<span class="Apple-tab-span" style="white-space:pre"> </span> 3.0566<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span 
class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/TSVC/LinearDepen<span class="Apple-tab-span" style="white-space:pre"> </span> 2.1298<span class="Apple-tab-span" style="white-space:pre"> </span> 2.1997<span class="Apple-tab-span" style="white-space:pre"> </span> 1.03<span class="Apple-tab-span" style="white-space:pre"> </span> +3%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/TSVC/LoopRerolli<span class="Apple-tab-span" style="white-space:pre"> </span> 2.6458<span class="Apple-tab-span" style="white-space:pre"> </span> 2.5552<span class="Apple-tab-span" style="white-space:pre"> </span> 0.97<span class="Apple-tab-span" style="white-space:pre"> </span> -3%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/TSVC/Symbolics-f<span class="Apple-tab-span" style="white-space:pre"> </span> 1.6243<span class="Apple-tab-span" style="white-space:pre"> </span> 1.6612<span class="Apple-tab-span" style="white-space:pre"> </span> 1.02<span class="Apple-tab-span" style="white-space:pre"> </span> +2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/mafft/pairlocala<span class="Apple-tab-span" style="white-space:pre"> </span> 23.8979<span class="Apple-tab-span" style="white-space:pre"> </span> 24.0547<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">SingleSource/Benchmarks/Misc/oourafft <span class="Apple-tab-span" style="white-space:pre"> </span> 3.0374<span class="Apple-tab-span" style="white-space:pre"> </span> 3.1846<span class="Apple-tab-span" style="white-space:pre"> </span> 1.05<span class="Apple-tab-span" style="white-space:pre"> </span> +5%</div><div 
style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">SingleSource/Benchmarks/SmallPT/smallpt<span class="Apple-tab-span" style="white-space:pre"> </span> 6.5533<span class="Apple-tab-span" style="white-space:pre"> </span> 6.6683<span class="Apple-tab-span" style="white-space:pre"> </span> 1.02<span class="Apple-tab-span" style="white-space:pre"> </span> +2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Min (21) <span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> 0.97<span class="Apple-tab-span" style="white-space:pre"> </span> -</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Max (21) <span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> 1.05<span class="Apple-tab-span" style="white-space:pre"> </span> -</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Sum (21) <span class="Apple-tab-span" style="white-space:pre"> </span> 108<span class="Apple-tab-span" style="white-space:pre"> </span> 109<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" 
class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">A.Mean (21) <span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">G.Mean 2 (21) <span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> -<span class="Apple-tab-span" style="white-space:pre"> </span> 1.01<span class="Apple-tab-span" style="white-space:pre"> </span> +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div></div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">-Quentin<br class=""><div><blockquote type="cite" class=""><div class="">On Sep 9, 2014, at 6:13 AM, Andrea Di Biagio <<a href="mailto:andrea.dibiagio@gmail.com" class="">andrea.dibiagio@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">Hi Chandler,<br class=""><br class="">Thanks for fixing the problem with the insertps mask.<br class=""><br class="">Generally the new shuffle lowering looks promising, however there are<br class="">some cases where the codegen is now worse causing runtime performance<br class="">regressions in some of our internal codebase.<br class=""><br class="">You have already mentioned how the new shuffle lowering is missing<br class="">some features; for example, you explicitly said that we currently lack<br class="">of SSE4.1 blend support. 
Unfortunately, this seems to be one of the<br class="">main reasons for the slowdown we are seeing.<br class=""><br class="">Here is a list of what we found so far that we think is causing most<br class="">of the slowdown:<br class="">1) shufps is always emitted in cases where we could emit a single<br class="">blendps; in these cases, blendps is preferable because it has better<br class="">reciprocal throughput (this is true on all modern Intel and AMD cpus).<br class=""><br class="">Things get worse when it comes to lowering shuffles where the shuffle<br class="">mask indices refer to elements from both input vectors in each lane.<br class="">For example, a shuffle mask of <0,5,2,7> could be easily lowered into<br class="">a single blendps; instead it gets lowered into two shufps<br class="">instructions.<br class=""><br class="">Example:<br class="">;;;<br class="">define <4 x float> @foo(<4 x float> %A, <4 x float> %B) {<br class=""> %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 0,<br class="">i32 5, i32 2, i32 7><br class=""> ret <4 x float> %1<br class="">}<br class="">;;;<br class=""><br class="">llc (-mcpu=corei7-avx):<br class=""> vblendps $10, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[5],xmm0[2],xmm1[7]<br class=""><br class="">llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):<br class=""> vshufps $-40, %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0,2],xmm0[1,3]<br class=""> vshufps $-40, %xmm0, %xmm0, %xmm0 # xmm0[0,2,1,3]<br class=""><br class=""><br class="">2) On SSE4.1, we should try not to emit an insertps if the shuffle<br class="">mask identifies a blend. 
At the moment the new lowering logic is very<br class="">aggressively emitting insertps instead of cheaper blendps.<br class=""><br class="">Example:<br class="">;;;<br class="">define <4 x float> @bar(<4 x float> %A, <4 x float> %B) {<br class=""> %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,<br class="">i32 5, i32 2, i32 7><br class=""> ret <4 x float> %1<br class="">}<br class="">;;;<br class=""><br class="">llc (-mcpu=corei7-avx):<br class=""> vblendps $11, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]<br class=""><br class="">llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):<br class=""> vinsertps $-96, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]<br class=""><br class=""><br class="">3) When a shuffle performs an insert at index 0 we always generate an<br class="">insertps, while a movss would do a better job.<br class="">;;;<br class="">define <4 x float> @baz(<4 x float> %A, <4 x float> %B) {<br class=""> %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,<br class="">i32 1, i32 2, i32 3><br class=""> ret <4 x float> %1<br class="">}<br class="">;;;<br class=""><br class="">llc (-mcpu=corei7-avx):<br class=""> vmovss %xmm1, %xmm0, %xmm0<br class=""><br class="">llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):<br class=""> vinsertps $0, %xmm1, %xmm0, %xmm0 # xmm0 = xmm1[0],xmm0[1,2,3]<br class=""><br class="">I hope this is useful. We would be happy to contribute patches to<br class="">improve some of the above cases, but we obviously know that this is<br class="">still a work in progress, so we don't want to introduce conflicts with<br class="">your work. 
Please let us know what you think.<br class=""><br class="">We will keep looking at this and follow up with any further findings.<br class=""><br class="">Thanks,<br class="">Andrea Di Biagio<br class="">SN Systems - Sony Computer Entertainment Inc.<br class=""><br class="">On Mon, Sep 8, 2014 at 6:08 PM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" class="">qcolombet@apple.com</a>> wrote:<br class=""><blockquote type="cite" class="">Hi Chandler,<br class=""><br class="">Forget about what I said.<br class="">It seems I have some weird dependencies in my build system.<br class="">My binaries are out of sync.<br class=""><br class="">Let me sort that out; the problem is likely already fixed, and I can<br class="">resume the measurements.<br class=""><br class="">Sorry for the noise.<br class=""><br class="">Q.<br class=""><br class="">On Sep 8, 2014, at 9:32 AM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" class="">qcolombet@apple.com</a>> wrote:<br class=""><br class=""><br class="">On Sep 7, 2014, at 8:49 PM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" class="">qcolombet@apple.com</a>> wrote:<br class=""><br class="">Sure,<br class=""><br class="">Here is the command line:<br class="">clang -cc1 -triple x86_64-apple-macosx -S -disable-free<br class="">-disable-llvm-verifier -main-file-name tmp.i -mrelocation-model pic<br class="">-pic-level 2 -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu<br class="">core-avx-i -O3 -ferror-limit 19 -fmessage-length 114 -stack-protector 1<br class="">-mstackrealign -fblocks -fencode-extended-block-signature<br class="">-fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics<br class="">-vectorize-loops -vectorize-slp -mllvm<br class="">-x86-experimental-vector-shuffle-lowering=true -o tmp.s -x cpp-output tmp.i<br class=""><br class="">This was with trunk 215249.<br class=""><br class="">I meant, r217281.<br class=""><br class=""><br class="">Thanks,<br 
class="">-Quentin<br class=""><br class=""><tmp.i><br class=""><br class="">On Sep 6, 2014, at 4:27 PM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" class="">chandlerc@gmail.com</a>> wrote:<br class=""><br class="">I've run the SingleSource test suite for core-avx-i and have no failures<br class="">here so a preprocessed file + commandline would be very useful if this<br class="">reproduces for you still.<br class=""><br class="">On Sat, Sep 6, 2014 at 4:07 PM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" class="">chandlerc@gmail.com</a>><br class="">wrote:<br class=""><blockquote type="cite" class=""><br class="">I'm having trouble reproducing this. I'm trying to get LNT to actually<br class="">run, but manually compiling the given source file didn't reproduce it for<br class="">me.<br class=""><br class="">It might have been fixed recently (although I'd be surprised if so), but<br class="">it would help to get the actual command line for which compiling this file<br class="">in the test suite failed.<br class=""><br class="">-Chandler<br class=""><br class="">On Fri, Sep 5, 2014 at 4:36 PM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" class="">qcolombet@apple.com</a>><br class="">wrote:<br class=""><blockquote type="cite" class=""><br class="">Hi Chandler,<br class=""><br class="">While doing the performance measurement on a Ivy Bridge, I ran into<br class="">compile time errors.<br class=""><br class="">I saw a bunch of “cannot select" in the LLVM test suite with<br class="">-march=core-avx-i.<br class="">E.g., SingleSource/UnitTests/Vector/SSE/sse.isamax.c is failing at O3<br class="">-march=core-avx-i with:<br class="">fatal error: error in backend: Cannot select: 0x7f91b99a6420: v4i32 =<br class="">bitcast 0x7f91b99b0e10 [ORD=3] [ID=27]<br class=""> 0x7f91b99b0e10: v4i64 = insert_subvector 0x7f91b99a7210,<br class="">0x7f91b99a6d68, 0x7f91b99ace70 [ORD=2] [ID=25]<br class=""> 0x7f91b99a7210: v4i64 = undef [ID=15]<br class=""> 
0x7f91b99a6d68: v2i64 = scalar_to_vector 0x7f91b99ab840 [ORD=2]<br class="">[ID=23]<br class=""> 0x7f91b99ab840: i64 = AssertZext 0x7f91b99acc60, 0x7f91b99ac738<br class="">[ORD=2] [ID=20]<br class=""> 0x7f91b99acc60: i64,ch = CopyFromReg 0x7f91b8d52820,<br class="">0x7f91b99a3a10 [ORD=2] [ID=16]<br class=""> 0x7f91b99a3a10: i64 = Register %vreg68 [ID=1]<br class=""> 0x7f91b99ace70: i64 = Constant<0> [ID=3]<br class="">In function: isamax0<br class="">clang: error: clang frontend command failed with exit code 70 (use -v to<br class="">see invocation)<br class="">clang version 3.6.0 (215249)<br class="">Target: x86_64-apple-darwin14.0.0<br class=""><br class="">For some reason, I cannot reproduce the problem with the test case that<br class="">clang gives me using -emit-llvm. Since the source is public, I guess you can<br class="">try to reproduce on your side.<br class="">Indeed, if you run the test-suite with -march=core-avx-i you’ll likely<br class="">see all those failures.<br class=""><br class="">Let me know if you cannot and I’ll try harder to produce a test case.<br class=""><br class="">Note: This is the same failure all over the place, i.e., cannot select a<br class="">bit cast from various types to v4i32 or v4i64.<br class=""><br class="">Thanks,<br class="">-Quentin<br class=""><br class=""><br class="">On Sep 5, 2014, at 11:09 AM, Robert Lougher <<a href="mailto:rob.lougher@gmail.com" class="">rob.lougher@gmail.com</a>> wrote:<br class=""><br class="">Hi Chandler,<br class=""><br class="">On 5 September 2014 17:38, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" class="">chandlerc@gmail.com</a>> wrote:<br class=""><br class=""><br class="">On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <<a href="mailto:rob.lougher@gmail.com" class="">rob.lougher@gmail.com</a>><br class="">wrote:<br class=""><br class=""><br class="">Unfortunately, another team, while doing internal testing has seen the<br class="">new path generating illegal 
insertps masks. A sample here:<br class=""><br class=""> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]<br class=""> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]<br class=""> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]<br class=""> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]<br class=""> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]<br class=""> vinsertps $416, %xmm0, %xmm7, %xmm0 # xmm0 = xmm7[0,1],xmm0[2],xmm7[3]<br class=""><br class="">We'll continue to look into this and do additional testing.<br class=""><br class=""><br class=""><br class="">Interesting. Let me know if you get a test case. The insertps code path<br class="">was added recently though and has been much less well tested. I'll start fuzz<br class="">testing it and should hopefully uncover the bug.<br class=""><br class=""><br class="">Here are two small test cases. 
Hope they are of use.<br class=""><br class="">Thanks,<br class="">Rob.<br class=""><br class="">------<br class="">define <4 x float> @test(<4 x float> %xyzw, <4 x float> %abcd) {<br class=""> %1 = extractelement <4 x float> %xyzw, i32 0<br class=""> %2 = insertelement <4 x float> undef, float %1, i32 0<br class=""> %3 = insertelement <4 x float> %2, float 0.000000e+00, i32 1<br class=""> %4 = shufflevector <4 x float> %3, <4 x float> %xyzw, <4 x i32> <i32 0, i32 1, i32 6, i32 undef><br class=""> %5 = shufflevector <4 x float> %4, <4 x float> %abcd, <4 x i32> <i32 0, i32 1, i32 2, i32 4><br class=""> ret <4 x float> %5<br class="">}<br class=""><br class="">define <4 x float> @test2(<4 x float> %xyzw, <4 x float> %abcd) {<br class=""> %1 = shufflevector <4 x float> %xyzw, <4 x float> %abcd, <4 x i32> <i32 0, i32 undef, i32 2, i32 4><br class=""> %2 = shufflevector <4 x float> <float undef, float 0.000000e+00, float undef, float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1, i32 6, i32 7><br class=""> ret <4 x float> %2<br class="">}<br class=""><br class=""><br class="">llc -march=x86-64 -mattr=+avx test.ll -o -<br class=""><br class="">test: # @test<br class=""> vxorps %xmm2, %xmm2, %xmm2<br class=""> vmovss %xmm0, %xmm2, %xmm2<br class=""> vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]<br class=""> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]<br class=""> retl<br class=""><br class="">test2: # @test2<br class=""> vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]<br class=""> vxorps %xmm1, %xmm1, %xmm1<br class=""> vblendps $13, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]<br class=""> retl<br class=""><br class="">llc -march=x86-64 -mattr=+avx -x86-experimental-vector-shuffle-lowering test.ll -o -<br class=""><br class="">test: # @test<br class=""> vinsertps $270, %xmm0, %xmm0, %xmm2 # xmm2 = 
xmm0[0],zero,zero,zero<br class=""> vinsertps $416, %xmm0, %xmm2, %xmm0 # xmm0 =<br class="">xmm2[0,1],xmm0[2],xmm2[3]<br class=""> vinsertps $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]<br class=""> retl<br class=""><br class="">test2: # @test2<br class=""> vinsertps $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]<br class=""> vxorps %xmm1, %xmm1, %xmm1<br class=""> vinsertps $336, %xmm1, %xmm0, %xmm0 # xmm0 =<br class="">xmm0[0],xmm1[1],xmm0[2,3]<br class=""> retl<br class="">_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:LLVMdev@cs.uiuc.edu" class="">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" class="">http://llvm.cs.uiuc.edu</a><br class=""><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" class="">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br class=""><br class=""></blockquote><br class=""></blockquote><br class=""></blockquote></div></blockquote></div><br class=""></div></body></html>