<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Chandler,<div class=""><br class=""></div><div class="">I observed some improvements and regressions with the new lowering.</div><div class=""><br class=""></div><div class="">Here are the numbers for an Ivy Bridge machine fixed at 2900 MHz.</div><div class=""><br class=""></div><div class="">I’ll look into the regressions to provide test cases.</div><div class=""><br class=""></div><div class="">** Numbers **</div><div class=""><br class=""></div><div class="">Smaller is better. Only tests that run for at least one second are reported.</div><div class="">Reference is the default lowering; Test is the new lowering.</div><div class="">The Os numbers are overall neutral, but the O3 numbers mainly expose regressions.</div><div class=""><br class=""></div><div class="">Note: I can attach the raw numbers if you want.</div><div class=""><br class=""></div><div class="">* Os *</div><div class=""><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Benchmark_ID    <span class="Apple-tab-span" style="white-space:pre"> </span>Reference<span class="Apple-tab-span" style="white-space:pre">   </span>Test    <span class="Apple-tab-span" style="white-space:pre">  </span>Expansion <span class="Apple-tab-span" style="white-space:pre">  </span>Percent</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/Nurbs/nurbs                   <span class="Apple-tab-span" style="white-space:pre"> </span>       2.3302<span class="Apple-tab-span" style="white-space:pre">        </span>       2.3122<span class="Apple-tab-span" style="white-space:pre">        </span>    0.99<span 
class="Apple-tab-span" style="white-space:pre">  </span>    -1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CFP2000/183.equake/183.eq<span class="Apple-tab-span" style="white-space:pre">      </span>       3.2606<span class="Apple-tab-span" style="white-space:pre">        </span>       3.2419<span class="Apple-tab-span" style="white-space:pre">        </span>    0.99<span class="Apple-tab-span" style="white-space:pre">  </span>    -1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CFP2006/447.dealII/447.de<span class="Apple-tab-span" style="white-space:pre"> </span>      16.4638<span class="Apple-tab-span" style="white-space:pre">        </span>      16.1313<span class="Apple-tab-span" style="white-space:pre">        </span>    0.98<span class="Apple-tab-span" style="white-space:pre">  </span>    -2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CFP2006/470.lbm/470.lbm  <span class="Apple-tab-span" style="white-space:pre"> </span>       2.0159<span class="Apple-tab-span" style="white-space:pre">        </span>       1.9931<span class="Apple-tab-span" style="white-space:pre">        </span>    0.99<span class="Apple-tab-span" style="white-space:pre">  </span>    -1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2000/164.gzip/164.gzi<span class="Apple-tab-span" style="white-space:pre">      </span>       8.7611<span class="Apple-tab-span" style="white-space:pre">        </span>       8.6981<span class="Apple-tab-span" style="white-space:pre">        </span>    0.99<span class="Apple-tab-span" style="white-space:pre">  </span>    -1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2006/456.hmmer/456.hm<span class="Apple-tab-span" style="white-space:pre">      </span>       2.5674<span 
class="Apple-tab-span" style="white-space:pre">        </span>       2.5819<span class="Apple-tab-span" style="white-space:pre">        </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2006/462.libquantum/4<span class="Apple-tab-span" style="white-space:pre">      </span>       1.2924<span class="Apple-tab-span" style="white-space:pre">        </span>        1.347<span class="Apple-tab-span" style="white-space:pre">   </span>    1.04<span class="Apple-tab-span" style="white-space:pre">  </span>    +4%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/TSVC/CrossingThr<span class="Apple-tab-span" style="white-space:pre">      </span>       2.4703<span class="Apple-tab-span" style="white-space:pre">        </span>       2.4852<span class="Apple-tab-span" style="white-space:pre">        </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/TSVC/LoopRerolli<span class="Apple-tab-span" style="white-space:pre">      </span>       2.6611<span class="Apple-tab-span" style="white-space:pre">        </span>       2.5668<span class="Apple-tab-span" style="white-space:pre">        </span>    0.96<span class="Apple-tab-span" style="white-space:pre">  </span>    -4%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/mafft/pairlocala<span class="Apple-tab-span" style="white-space:pre">      </span>       24.676<span class="Apple-tab-span" style="white-space:pre">        </span>      24.5372<span class="Apple-tab-span" style="white-space:pre">        </span>    0.99<span class="Apple-tab-span" style="white-space:pre">  </span>    -1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" 
class="">SingleSource/Benchmarks/Adobe-C++/simpl<span class="Apple-tab-span" style="white-space:pre">      </span>       1.0579<span class="Apple-tab-span" style="white-space:pre">        </span>       1.1048<span class="Apple-tab-span" style="white-space:pre">        </span>    1.04<span class="Apple-tab-span" style="white-space:pre">  </span>    +4%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">SingleSource/Benchmarks/Linpack/linpack<span class="Apple-tab-span" style="white-space:pre">      </span>       4.2817<span class="Apple-tab-span" style="white-space:pre">        </span>       4.3298<span class="Apple-tab-span" style="white-space:pre">        </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">SingleSource/Benchmarks/Misc-C++/stepan<span class="Apple-tab-span" style="white-space:pre">      </span>       4.1821<span class="Apple-tab-span" style="white-space:pre">        </span>        4.226<span class="Apple-tab-span" style="white-space:pre">   </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">SingleSource/Benchmarks/Misc/oourafft  <span class="Apple-tab-span" style="white-space:pre"> </span>       3.0305<span class="Apple-tab-span" style="white-space:pre">        </span>       3.1777<span class="Apple-tab-span" style="white-space:pre">        </span>    1.05<span class="Apple-tab-span" style="white-space:pre">  </span>    +5%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Min (14)                               <span class="Apple-tab-span" style="white-space:pre">      </span>            -<span class="Apple-tab-span" 
style="white-space:pre"> </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>    0.96<span class="Apple-tab-span" style="white-space:pre">  </span>      -</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Max (14)                               <span class="Apple-tab-span" style="white-space:pre"> </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>    1.05<span class="Apple-tab-span" style="white-space:pre">  </span>      -</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Sum (14)                               <span class="Apple-tab-span" style="white-space:pre">    </span>           79<span class="Apple-tab-span" style="white-space:pre">      </span>           79<span class="Apple-tab-span" style="white-space:pre">      </span>       1<span class="Apple-tab-span" style="white-space:pre">     </span>    +0%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">A.Mean (14)                            <span class="Apple-tab-span" style="white-space:pre">   </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" 
class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">G.Mean 2 (14)                          <span class="Apple-tab-span" style="white-space:pre">        </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div></div><div class=""><br class=""></div><div class="">* O3 *</div><div class=""><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Benchmark_ID    <span class="Apple-tab-span" style="white-space:pre">     </span>Reference<span class="Apple-tab-span" style="white-space:pre">   </span>Test    <span class="Apple-tab-span" style="white-space:pre">  </span>Expansion <span class="Apple-tab-span" style="white-space:pre">  </span>Percent</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/Nurbs/nurbs                   <span class="Apple-tab-span" style="white-space:pre"> </span>       2.2322<span class="Apple-tab-span" style="white-space:pre">        </span>       2.2131<span class="Apple-tab-span" style="white-space:pre">        </span>    0.99<span class="Apple-tab-span" style="white-space:pre">  </span>    -1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/Povray/povray                 <span class="Apple-tab-span" style="white-space:pre"> </span>       2.2638<span class="Apple-tab-span" style="white-space:pre">        </span>       2.2762<span class="Apple-tab-span" 
style="white-space:pre">        </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CFP2000/177.mesa/177.mesa<span class="Apple-tab-span" style="white-space:pre">      </span>       1.6675<span class="Apple-tab-span" style="white-space:pre">        </span>       1.6828<span class="Apple-tab-span" style="white-space:pre">        </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CFP2000/188.ammp/188.ammp<span class="Apple-tab-span" style="white-space:pre">      </span>      10.9309<span class="Apple-tab-span" style="white-space:pre">        </span>      11.1191<span class="Apple-tab-span" style="white-space:pre">        </span>    1.02<span class="Apple-tab-span" style="white-space:pre">  </span>    +2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CFP2006/433.milc/433.milc<span class="Apple-tab-span" style="white-space:pre">      </span>       6.9214<span class="Apple-tab-span" style="white-space:pre">        </span>       7.1696<span class="Apple-tab-span" style="white-space:pre">        </span>    1.04<span class="Apple-tab-span" style="white-space:pre">  </span>    +4%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2000/164.gzip/164.gzi<span class="Apple-tab-span" style="white-space:pre">      </span>       8.5327<span class="Apple-tab-span" style="white-space:pre">        </span>       8.8114<span class="Apple-tab-span" style="white-space:pre">        </span>    1.03<span class="Apple-tab-span" style="white-space:pre">  </span>    +3%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2000/186.crafty/186.c<span class="Apple-tab-span" style="white-space:pre">      </span>       
4.1266<span class="Apple-tab-span" style="white-space:pre">        </span>         4.16<span class="Apple-tab-span" style="white-space:pre">   </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2000/253.perlbmk/253.<span class="Apple-tab-span" style="white-space:pre">      </span>       5.6991<span class="Apple-tab-span" style="white-space:pre">        </span>       5.7309<span class="Apple-tab-span" style="white-space:pre">        </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2000/256.bzip2/256.bz<span class="Apple-tab-span" style="white-space:pre">      </span>       6.7917<span class="Apple-tab-span" style="white-space:pre">        </span>       6.8763<span class="Apple-tab-span" style="white-space:pre">        </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2006/400.perlbench/40<span class="Apple-tab-span" style="white-space:pre">      </span>        6.243<span class="Apple-tab-span" style="white-space:pre">   </span>       6.1464<span class="Apple-tab-span" style="white-space:pre">        </span>    0.98<span class="Apple-tab-span" style="white-space:pre">  </span>    -2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">External/SPEC/CINT2006/401.bzip2/401.bz<span class="Apple-tab-span" style="white-space:pre">      </span>        2.095<span class="Apple-tab-span" style="white-space:pre">   </span>       2.0588<span class="Apple-tab-span" style="white-space:pre">        </span>    0.98<span class="Apple-tab-span" style="white-space:pre">  </span>    -2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" 
class="">External/SPEC/CINT2006/462.libquantum/4<span class="Apple-tab-span" style="white-space:pre">      </span>          1.2<span class="Apple-tab-span" style="white-space:pre">      </span>       1.2108<span class="Apple-tab-span" style="white-space:pre">        </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Applications/SIBsim4/SIBsim<span class="Apple-tab-span" style="white-space:pre">      </span>       2.4547<span class="Apple-tab-span" style="white-space:pre">        </span>       2.5129<span class="Apple-tab-span" style="white-space:pre">        </span>    1.02<span class="Apple-tab-span" style="white-space:pre">  </span>    +2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/Bullet/bullet   <span class="Apple-tab-span" style="white-space:pre">    </span>       4.1687<span class="Apple-tab-span" style="white-space:pre">        </span>       4.0882<span class="Apple-tab-span" style="white-space:pre">        </span>    0.98<span class="Apple-tab-span" style="white-space:pre">  </span>    -2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/TSVC/LinearDepen<span class="Apple-tab-span" style="white-space:pre">      </span>       3.0389<span class="Apple-tab-span" style="white-space:pre">        </span>       3.0566<span class="Apple-tab-span" style="white-space:pre">        </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/TSVC/LinearDepen<span class="Apple-tab-span" style="white-space:pre">      </span>       2.1298<span class="Apple-tab-span" style="white-space:pre">        </span>       2.1997<span class="Apple-tab-span" style="white-space:pre">        </span>    1.03<span class="Apple-tab-span" 
style="white-space:pre">  </span>    +3%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/TSVC/LoopRerolli<span class="Apple-tab-span" style="white-space:pre">      </span>       2.6458<span class="Apple-tab-span" style="white-space:pre">        </span>       2.5552<span class="Apple-tab-span" style="white-space:pre">        </span>    0.97<span class="Apple-tab-span" style="white-space:pre">  </span>    -3%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/TSVC/Symbolics-f<span class="Apple-tab-span" style="white-space:pre">      </span>       1.6243<span class="Apple-tab-span" style="white-space:pre">        </span>       1.6612<span class="Apple-tab-span" style="white-space:pre">        </span>    1.02<span class="Apple-tab-span" style="white-space:pre">  </span>    +2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">MultiSource/Benchmarks/mafft/pairlocala<span class="Apple-tab-span" style="white-space:pre">      </span>      23.8979<span class="Apple-tab-span" style="white-space:pre">        </span>      24.0547<span class="Apple-tab-span" style="white-space:pre">        </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">SingleSource/Benchmarks/Misc/oourafft  <span class="Apple-tab-span" style="white-space:pre"> </span>       3.0374<span class="Apple-tab-span" style="white-space:pre">        </span>       3.1846<span class="Apple-tab-span" style="white-space:pre">        </span>    1.05<span class="Apple-tab-span" style="white-space:pre">  </span>    +5%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">SingleSource/Benchmarks/SmallPT/smallpt<span class="Apple-tab-span" style="white-space:pre">      </span>       6.5533<span class="Apple-tab-span" style="white-space:pre">        </span>       
6.6683<span class="Apple-tab-span" style="white-space:pre">        </span>    1.02<span class="Apple-tab-span" style="white-space:pre">  </span>    +2%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Min (21)                               <span class="Apple-tab-span" style="white-space:pre">      </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>    0.97<span class="Apple-tab-span" style="white-space:pre">  </span>      -</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Max (21)                               <span class="Apple-tab-span" style="white-space:pre"> </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>    1.05<span class="Apple-tab-span" style="white-space:pre">  </span>      -</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">Sum (21)                               <span class="Apple-tab-span" style="white-space:pre"> </span>          108<span class="Apple-tab-span" style="white-space:pre">      </span>          109<span class="Apple-tab-span" style="white-space:pre">      </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" 
class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">A.Mean (21)                            <span class="Apple-tab-span" style="white-space:pre">   </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">G.Mean 2 (21)                          <span class="Apple-tab-span" style="white-space:pre">        </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>            -<span class="Apple-tab-span" style="white-space:pre"> </span>    1.01<span class="Apple-tab-span" style="white-space:pre">  </span>    +1%</div><div style="margin: 0px; font-size: 11px; font-family: Menlo;" class="">-------------------------------------------------------------------------------</div></div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">-Quentin<br class=""><div><blockquote type="cite" class=""><div class="">On Sep 9, 2014, at 6:13 AM, Andrea Di Biagio <<a href="mailto:andrea.dibiagio@gmail.com" class="">andrea.dibiagio@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class="">Hi Chandler,<br class=""><br class="">Thanks for fixing the problem with the insertps mask.<br class=""><br class="">Generally the new shuffle lowering looks promising, however there are<br class="">some cases where the codegen is now worse causing runtime performance<br class="">regressions in some of our internal codebase.<br class=""><br class="">You have already mentioned how the new shuffle lowering is missing<br 
class="">some features; for example, you explicitly said that we currently lack<br class="">SSE4.1 blend support. Unfortunately, this seems to be one of the<br class="">main reasons for the slowdown we are seeing.<br class=""><br class="">Here is a list of what we found so far that we think is causing most<br class="">of the slowdown:<br class="">1) shufps is always emitted in cases where we could emit a single<br class="">blendps; in these cases, blendps is preferable because it has better<br class="">reciprocal throughput (this is true on all modern Intel and AMD CPUs).<br class=""><br class="">Things get worse when it comes to lowering shuffles where the shuffle<br class="">mask indices refer to elements from both input vectors in each lane.<br class="">For example, a shuffle mask of <0,5,2,7> could be easily lowered into<br class="">a single blendps; instead it gets lowered into two shufps<br class="">instructions.<br class=""><br class="">Example:<br class="">;;;<br class="">define <4 x float> @foo(<4 x float> %A, <4 x float> %B) {<br class="">  %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 0,<br class="">i32 5, i32 2, i32 7><br class="">  ret <4 x float> %1<br class="">}<br class="">;;;<br class=""><br class="">llc (-mcpu=corei7-avx):<br class="">  vblendps  $10, %xmm1, %xmm0, %xmm0   # xmm0 = xmm0[0],xmm1[5],xmm0[2],xmm1[7]<br class=""><br class="">llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):<br class="">  vshufps $-40, %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0,2],xmm0[1,3]<br class="">  vshufps $-40, %xmm0, %xmm0, %xmm0 # xmm0[0,2,1,3]<br class=""><br class=""><br class="">2) On SSE4.1, we should try not to emit an insertps if the shuffle<br class="">mask identifies a blend. 
At the moment the new lowering logic is very<br class="">aggressively emitting insertps instead of cheaper blendps.<br class=""><br class="">Example:<br class="">;;;<br class="">define <4 x float> @bar(<4 x float> %A, <4 x float> %B) {<br class="">  %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,<br class="">i32 5, i32 2, i32 7><br class="">  ret <4 x float> %1<br class="">}<br class="">;;;<br class=""><br class="">llc (-mcpu=corei7-avx):<br class="">  vblendps  $11, %xmm0, %xmm1, %xmm0   # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]<br class=""><br class="">llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):<br class="">  vinsertps $-96, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]<br class=""><br class=""><br class="">3) When a shuffle performs an insert at index 0 we always generate an<br class="">insertps, while a movss would do a better job.<br class="">;;;<br class="">define <4 x float> @baz(<4 x float> %A, <4 x float> %B) {<br class="">  %1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,<br class="">i32 1, i32 2, i32 3><br class="">  ret <4 x float> %1<br class="">}<br class="">;;;<br class=""><br class="">llc (-mcpu=corei7-avx):<br class="">  vmovss %xmm1, %xmm0, %xmm0<br class=""><br class="">llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):<br class="">  vinsertps $0, %xmm1, %xmm0, %xmm0 # xmm0 = xmm1[0],xmm0[1,2,3]<br class=""><br class="">I hope this is useful. We would be happy to contribute patches to<br class="">improve some of the above cases, but we obviously know that this is<br class="">still a work in progress, so we don't want to introduce conflicts with<br class="">your work. 
Please let us know what you think.<br class=""><br class="">We will keep looking at this and follow up with any further findings.<br class=""><br class="">Thanks,<br class="">Andrea Di Biagio<br class="">SN Systems - Sony Computer Entertainment Inc.<br class=""><br class="">On Mon, Sep 8, 2014 at 6:08 PM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" class="">qcolombet@apple.com</a>> wrote:<br class=""><blockquote type="cite" class="">Hi Chandler,<br class=""><br class="">Forget about what I said.<br class="">It seems I have some weird dependencies in my build system.<br class="">My binaries are out-of-sync.<br class=""><br class="">Let me sort that out; it is likely the problem is already fixed, and I can<br class="">resume the measurements.<br class=""><br class="">Sorry for the noise.<br class=""><br class="">Q.<br class=""><br class="">On Sep 8, 2014, at 9:32 AM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" class="">qcolombet@apple.com</a>> wrote:<br class=""><br class=""><br class="">On Sep 7, 2014, at 8:49 PM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" class="">qcolombet@apple.com</a>> wrote:<br class=""><br class="">Sure,<br class=""><br class="">Here is the command line:<br class="">clang -cc1 -triple x86_64-apple-macosx -S -disable-free<br class="">-disable-llvm-verifier -main-file-name tmp.i -mrelocation-model pic<br class="">-pic-level 2 -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu<br class="">core-avx-i  -O3  -ferror-limit 19 -fmessage-length 114 -stack-protector 1<br class="">-mstackrealign -fblocks  -fencode-extended-block-signature<br class="">-fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics<br class="">-vectorize-loops -vectorize-slp -mllvm<br class="">-x86-experimental-vector-shuffle-lowering=true -o tmp.s -x cpp-output tmp.i<br class=""><br class="">This was with trunk 215249.<br class=""><br class="">I meant r217281.<br class=""><br class=""><br class="">Thanks,<br 
class="">-Quentin<br class=""><br class=""><tmp.i><br class=""><br class="">On Sep 6, 2014, at 4:27 PM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" class="">chandlerc@gmail.com</a>> wrote:<br class=""><br class="">I've run the SingleSource test suite for core-avx-i and have no failures<br class="">here so a preprocessed file + command line would be very useful if this<br class="">reproduces for you still.<br class=""><br class="">On Sat, Sep 6, 2014 at 4:07 PM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" class="">chandlerc@gmail.com</a>><br class="">wrote:<br class=""><blockquote type="cite" class=""><br class="">I'm having trouble reproducing this. I'm trying to get LNT to actually<br class="">run, but manually compiling the given source file didn't reproduce it for<br class="">me.<br class=""><br class="">It might have been fixed recently (although I'd be surprised if so), but<br class="">it would help to get the actual command line for which compiling this file<br class="">in the test suite failed.<br class=""><br class="">-Chandler<br class=""><br class="">On Fri, Sep 5, 2014 at 4:36 PM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" class="">qcolombet@apple.com</a>><br class="">wrote:<br class=""><blockquote type="cite" class=""><br class="">Hi Chandler,<br class=""><br class="">While doing the performance measurement on an Ivy Bridge, I ran into<br class="">compile-time errors.<br class=""><br class="">I saw a bunch of “cannot select” errors in the LLVM test suite with<br class="">-march=core-avx-i.<br class="">E.g., SingleSource/UnitTests/Vector/SSE/sse.isamax.c is failing at O3<br class="">-march=core-avx-i with:<br class="">fatal error: error in backend: Cannot select: 0x7f91b99a6420: v4i32 =<br class="">bitcast 0x7f91b99b0e10 [ORD=3] [ID=27]<br class="">  0x7f91b99b0e10: v4i64 = insert_subvector 0x7f91b99a7210,<br class="">0x7f91b99a6d68, 0x7f91b99ace70 [ORD=2] [ID=25]<br class="">    0x7f91b99a7210: v4i64 = undef [ID=15]<br class=""> 
   0x7f91b99a6d68: v2i64 = scalar_to_vector 0x7f91b99ab840 [ORD=2]<br class="">[ID=23]<br class="">      0x7f91b99ab840: i64 = AssertZext 0x7f91b99acc60, 0x7f91b99ac738<br class="">[ORD=2] [ID=20]<br class="">        0x7f91b99acc60: i64,ch = CopyFromReg 0x7f91b8d52820,<br class="">0x7f91b99a3a10 [ORD=2] [ID=16]<br class="">          0x7f91b99a3a10: i64 = Register %vreg68 [ID=1]<br class="">    0x7f91b99ace70: i64 = Constant<0> [ID=3]<br class="">In function: isamax0<br class="">clang: error: clang frontend command failed with exit code 70 (use -v to<br class="">see invocation)<br class="">clang version 3.6.0 (215249)<br class="">Target: x86_64-apple-darwin14.0.0<br class=""><br class="">For some reason, I cannot reproduce the problem with the test case that<br class="">clang gives me using -emit-llvm. Since the source is public, I guess you can<br class="">try to reproduce on your side.<br class="">Indeed, if you run the test-suite with -march=core-avx-i you’ll likely<br class="">see all those failures.<br class=""><br class="">Let me know if you cannot and I’ll try harder to produce a test case.<br class=""><br class="">Note: This is the same failure all over the place, i.e., cannot select a<br class="">bitcast from various types to v4i32 or v4i64.<br class=""><br class="">Thanks,<br class="">-Quentin<br class=""><br class=""><br class="">On Sep 5, 2014, at 11:09 AM, Robert Lougher <<a href="mailto:rob.lougher@gmail.com" class="">rob.lougher@gmail.com</a>> wrote:<br class=""><br class="">Hi Chandler,<br class=""><br class="">On 5 September 2014 17:38, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" class="">chandlerc@gmail.com</a>> wrote:<br class=""><br class=""><br class="">On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <<a href="mailto:rob.lougher@gmail.com" class="">rob.lougher@gmail.com</a>><br class="">wrote:<br class=""><br class=""><br class="">Unfortunately, another team, while doing internal testing, has seen the<br class="">new path 
generating illegal insertps masks.  A sample here:<br class=""><br class="">   vinsertps    $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]<br class="">   vinsertps    $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]<br class="">   vinsertps    $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]<br class="">   vinsertps    $416, %xmm1, %xmm4, %xmm14 # xmm14 =<br class="">xmm4[0,1],xmm1[2],xmm4[3]<br class="">   vinsertps    $416, %xmm13, %xmm6, %xmm13 # xmm13 =<br class="">xmm6[0,1],xmm13[2],xmm6[3]<br class="">   vinsertps    $416, %xmm0, %xmm7, %xmm0 # xmm0 =<br class="">xmm7[0,1],xmm0[2],xmm7[3]<br class=""><br class="">We'll continue to look into this and do additional testing.<br class=""><br class=""><br class=""><br class="">Interesting. Let me know if you get a test case. The insertps code path was<br class="">added recently though and has been much less well tested. I'll start fuzz<br class="">testing it and should hopefully uncover the bug.<br class=""><br class=""><br class="">Here are two small test cases.  
Hope they are of use.<br class=""><br class="">Thanks,<br class="">Rob.<br class=""><br class="">------<br class="">define <4 x float> @test(<4 x float> %xyzw, <4 x float> %abcd) {<br class=""> %1 = extractelement <4 x float> %xyzw, i32 0<br class=""> %2 = insertelement <4 x float> undef, float %1, i32 0<br class=""> %3 = insertelement <4 x float> %2, float 0.000000e+00, i32 1<br class=""> %4 = shufflevector <4 x float> %3, <4 x float> %xyzw, <4 x i32> <i32<br class="">0, i32 1, i32 6, i32 undef><br class=""> %5 = shufflevector <4 x float> %4, <4 x float> %abcd, <4 x i32> <i32<br class="">0, i32 1, i32 2, i32 4><br class=""> ret <4 x float> %5<br class="">}<br class=""><br class="">define <4 x float> @test2(<4 x float> %xyzw, <4 x float> %abcd) {<br class=""> %1 = shufflevector <4 x float> %xyzw, <4 x float> %abcd, <4 x i32><br class=""><i32 0, i32 undef, i32 2, i32 4><br class=""> %2 = shufflevector <4 x float> <float undef, float 0.000000e+00,<br class="">float undef, float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1,<br class="">i32 6, i32 7><br class=""> ret <4 x float> %2<br class="">}<br class=""><br class=""><br class="">llc -march=x86-64 -mattr=+avx test.ll -o -<br class=""><br class="">test:                                   # @test<br class="">   vxorps    %xmm2, %xmm2, %xmm2<br class="">   vmovss    %xmm0, %xmm2, %xmm2<br class="">   vblendps    $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]<br class="">   vinsertps    $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]<br class="">   retl<br class=""><br class="">test2:                                  # @test2<br class="">   vinsertps    $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]<br class="">   vxorps    %xmm1, %xmm1, %xmm1<br class="">   vblendps    $13, %xmm0, %xmm1, %xmm0 # xmm0 =<br class="">xmm0[0],xmm1[1],xmm0[2,3]<br class="">   retl<br class=""><br class="">llc -march=x86-64 -mattr=+avx<br class="">-x86-experimental-vector-shuffle-lowering test.ll -o -<br 
class=""><br class="">test:                                   # @test<br class="">   vinsertps    $270, %xmm0, %xmm0, %xmm2 # xmm2 = xmm0[0],zero,zero,zero<br class="">   vinsertps    $416, %xmm0, %xmm2, %xmm0 # xmm0 =<br class="">xmm2[0,1],xmm0[2],xmm2[3]<br class="">   vinsertps    $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]<br class="">   retl<br class=""><br class="">test2:                                  # @test2<br class="">   vinsertps    $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]<br class="">   vxorps    %xmm1, %xmm1, %xmm1<br class="">   vinsertps    $336, %xmm1, %xmm0, %xmm0 # xmm0 =<br class="">xmm0[0],xmm1[1],xmm0[2,3]<br class="">   retl<br class="">_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:LLVMdev@cs.uiuc.edu" class="">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" class="">http://llvm.cs.uiuc.edu</a><br class=""><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" class="">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br class=""><br class=""></blockquote><br class=""></blockquote><br class=""></blockquote></div></blockquote></div><br class=""></div></body></html>