<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Sep 9, 2014 at 12:53 PM, Quentin Colombet <span dir="ltr"><<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Hi Chandler,<div><br></div><div>I had observed some improvements and regressions with the new lowering.</div><div><br></div><div>Here are the numbers for an Ivy Bridge machine fixed at 2900MHz.</div><div><br></div><div>I’ll look into the regressions to provide test cases.</div><div><br></div><div>** Numbers **</div><div><br></div><div>Smaller is better. Only reported tests that run for at least one second.</div><div>Reference is the default lowering, Test is the new lowering.</div><div>The Os numbers are overall neutral, but the O3 numbers mainly expose regressions.</div><div><br></div><div>Note: I can attach the raw numbers if you want.</div></div></blockquote><div><br></div><div>That would be great. 
Please do.</div><div><br></div><div>-- Sean Silva</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>* Os *</div><div><div style="margin:0px;font-size:11px;font-family:Menlo">Benchmark_ID    <span style="white-space:pre-wrap">       </span>Reference<span style="white-space:pre-wrap">       </span>Test    <span style="white-space:pre-wrap">      </span>Expansion <span style="white-space:pre-wrap">      </span>Percent</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/Nurbs/nurbs                   <span style="white-space:pre-wrap">       </span>       2.3302<span style="white-space:pre-wrap">        </span>       2.3122<span style="white-space:pre-wrap">        </span>    0.99<span style="white-space:pre-wrap">      </span>    -1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CFP2000/183.equake/183.eq<span style="white-space:pre-wrap">   </span>       3.2606<span style="white-space:pre-wrap">        </span>       3.2419<span style="white-space:pre-wrap">        </span>    0.99<span style="white-space:pre-wrap">      </span>    -1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CFP2006/447.dealII/447.de<span style="white-space:pre-wrap">       </span>      16.4638<span style="white-space:pre-wrap">        </span>      16.1313<span style="white-space:pre-wrap">        </span>    0.98<span style="white-space:pre-wrap">      </span>    -2%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CFP2006/470.lbm/470.lbm  <span style="white-space:pre-wrap">  </span>       2.0159<span style="white-space:pre-wrap">        </span>       
1.9931<span style="white-space:pre-wrap">        </span>    0.99<span style="white-space:pre-wrap">      </span>    -1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CINT2000/164.gzip/164.gzi<span style="white-space:pre-wrap">   </span>       8.7611<span style="white-space:pre-wrap">        </span>       8.6981<span style="white-space:pre-wrap">        </span>    0.99<span style="white-space:pre-wrap">      </span>    -1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CINT2006/456.hmmer/456.hm<span style="white-space:pre-wrap">       </span>       2.5674<span style="white-space:pre-wrap">        </span>       2.5819<span style="white-space:pre-wrap">        </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CINT2006/462.libquantum/4<span style="white-space:pre-wrap">   </span>       1.2924<span style="white-space:pre-wrap">        </span>        1.347<span style="white-space:pre-wrap">       </span>    1.04<span style="white-space:pre-wrap">      </span>    +4%</div><div style="margin:0px;font-size:11px;font-family:Menlo">MultiSource/Benchmarks/TSVC/CrossingThr<span style="white-space:pre-wrap">   </span>       2.4703<span style="white-space:pre-wrap">        </span>       2.4852<span style="white-space:pre-wrap">        </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">MultiSource/Benchmarks/TSVC/LoopRerolli<span style="white-space:pre-wrap">   </span>       2.6611<span style="white-space:pre-wrap">        </span>       2.5668<span style="white-space:pre-wrap">        </span>    0.96<span style="white-space:pre-wrap">      </span>    -4%</div><div style="margin:0px;font-size:11px;font-family:Menlo">MultiSource/Benchmarks/mafft/pairlocala<span 
style="white-space:pre-wrap">   </span>       24.676<span style="white-space:pre-wrap">        </span>      24.5372<span style="white-space:pre-wrap">        </span>    0.99<span style="white-space:pre-wrap">      </span>    -1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">SingleSource/Benchmarks/Adobe-C++/simpl<span style="white-space:pre-wrap">   </span>       1.0579<span style="white-space:pre-wrap">        </span>       1.1048<span style="white-space:pre-wrap">        </span>    1.04<span style="white-space:pre-wrap">      </span>    +4%</div><div style="margin:0px;font-size:11px;font-family:Menlo">SingleSource/Benchmarks/Linpack/linpack<span style="white-space:pre-wrap">   </span>       4.2817<span style="white-space:pre-wrap">        </span>       4.3298<span style="white-space:pre-wrap">        </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">SingleSource/Benchmarks/Misc-C++/stepan<span style="white-space:pre-wrap">   </span>       4.1821<span style="white-space:pre-wrap">        </span>        4.226<span style="white-space:pre-wrap">       </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">SingleSource/Benchmarks/Misc/oourafft  <span style="white-space:pre-wrap">  </span>       3.0305<span style="white-space:pre-wrap">        </span>       3.1777<span style="white-space:pre-wrap">        </span>    1.05<span style="white-space:pre-wrap">      </span>    +5%</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div><div style="margin:0px;font-size:11px;font-family:Menlo">Min (14)                               <span style="white-space:pre-wrap">        </span>            -<span style="white-space:pre-wrap">     </span>           
 -<span style="white-space:pre-wrap">     </span>    0.96<span style="white-space:pre-wrap">      </span>      -</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div><div style="margin:0px;font-size:11px;font-family:Menlo">Max (14)                               <span style="white-space:pre-wrap">       </span>            -<span style="white-space:pre-wrap">     </span>            -<span style="white-space:pre-wrap">     </span>    1.05<span style="white-space:pre-wrap">      </span>      -</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div><div style="margin:0px;font-size:11px;font-family:Menlo">Sum (14)                               <span style="white-space:pre-wrap">      </span>           79<span style="white-space:pre-wrap">      </span>           79<span style="white-space:pre-wrap">      </span>       1<span style="white-space:pre-wrap">     </span>    +0%</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div><div style="margin:0px;font-size:11px;font-family:Menlo">A.Mean (14)                            <span style="white-space:pre-wrap"> </span>            -<span style="white-space:pre-wrap">     </span>            -<span style="white-space:pre-wrap">     </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div><div style="margin:0px;font-size:11px;font-family:Menlo">G.Mean 2 (14)                          <span style="white-space:pre-wrap">  </span>            -<span style="white-space:pre-wrap">     </span>           
 -<span style="white-space:pre-wrap">     </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div></div><div><br></div><div>* O3 *</div><div><div style="margin:0px;font-size:11px;font-family:Menlo">Benchmark_ID    <span style="white-space:pre-wrap">       </span>Reference<span style="white-space:pre-wrap">       </span>Test    <span style="white-space:pre-wrap">      </span>Expansion <span style="white-space:pre-wrap">      </span>Percent</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/Nurbs/nurbs                   <span style="white-space:pre-wrap">       </span>       2.2322<span style="white-space:pre-wrap">        </span>       2.2131<span style="white-space:pre-wrap">        </span>    0.99<span style="white-space:pre-wrap">      </span>    -1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/Povray/povray                 <span style="white-space:pre-wrap">  </span>       2.2638<span style="white-space:pre-wrap">        </span>       2.2762<span style="white-space:pre-wrap">        </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CFP2000/177.mesa/177.mesa<span style="white-space:pre-wrap">   </span>       1.6675<span style="white-space:pre-wrap">        </span>       1.6828<span style="white-space:pre-wrap">        </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CFP2000/188.ammp/188.ammp<span style="white-space:pre-wrap">   </span>      10.9309<span style="white-space:pre-wrap"> 
       </span>      11.1191<span style="white-space:pre-wrap">        </span>    1.02<span style="white-space:pre-wrap">      </span>    +2%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CFP2006/433.milc/433.milc<span style="white-space:pre-wrap">   </span>       6.9214<span style="white-space:pre-wrap">        </span>       7.1696<span style="white-space:pre-wrap">        </span>    1.04<span style="white-space:pre-wrap">      </span>    +4%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CINT2000/164.gzip/164.gzi<span style="white-space:pre-wrap">   </span>       8.5327<span style="white-space:pre-wrap">        </span>       8.8114<span style="white-space:pre-wrap">        </span>    1.03<span style="white-space:pre-wrap">      </span>    +3%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CINT2000/186.crafty/186.c<span style="white-space:pre-wrap">   </span>       4.1266<span style="white-space:pre-wrap">        </span>         4.16<span style="white-space:pre-wrap">       </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CINT2000/253.perlbmk/253.<span style="white-space:pre-wrap">   </span>       5.6991<span style="white-space:pre-wrap">        </span>       5.7309<span style="white-space:pre-wrap">        </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CINT2000/256.bzip2/256.bz<span style="white-space:pre-wrap">       </span>       6.7917<span style="white-space:pre-wrap">        </span>       6.8763<span style="white-space:pre-wrap">        </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div 
style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CINT2006/400.perlbench/40<span style="white-space:pre-wrap">   </span>        6.243<span style="white-space:pre-wrap">       </span>       6.1464<span style="white-space:pre-wrap">        </span>    0.98<span style="white-space:pre-wrap">      </span>    -2%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CINT2006/401.bzip2/401.bz<span style="white-space:pre-wrap">       </span>        2.095<span style="white-space:pre-wrap">       </span>       2.0588<span style="white-space:pre-wrap">        </span>    0.98<span style="white-space:pre-wrap">      </span>    -2%</div><div style="margin:0px;font-size:11px;font-family:Menlo">External/SPEC/CINT2006/462.libquantum/4<span style="white-space:pre-wrap">   </span>          1.2<span style="white-space:pre-wrap">      </span>       1.2108<span style="white-space:pre-wrap">        </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">MultiSource/Applications/SIBsim4/SIBsim<span style="white-space:pre-wrap">   </span>       2.4547<span style="white-space:pre-wrap">        </span>       2.5129<span style="white-space:pre-wrap">        </span>    1.02<span style="white-space:pre-wrap">      </span>    +2%</div><div style="margin:0px;font-size:11px;font-family:Menlo">MultiSource/Benchmarks/Bullet/bullet   <span style="white-space:pre-wrap"> </span>       4.1687<span style="white-space:pre-wrap">        </span>       4.0882<span style="white-space:pre-wrap">        </span>    0.98<span style="white-space:pre-wrap">      </span>    -2%</div><div style="margin:0px;font-size:11px;font-family:Menlo">MultiSource/Benchmarks/TSVC/LinearDepen<span style="white-space:pre-wrap">   </span>       3.0389<span style="white-space:pre-wrap">        </span>       3.0566<span 
style="white-space:pre-wrap">        </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">MultiSource/Benchmarks/TSVC/LinearDepen<span style="white-space:pre-wrap">   </span>       2.1298<span style="white-space:pre-wrap">        </span>       2.1997<span style="white-space:pre-wrap">        </span>    1.03<span style="white-space:pre-wrap">      </span>    +3%</div><div style="margin:0px;font-size:11px;font-family:Menlo">MultiSource/Benchmarks/TSVC/LoopRerolli<span style="white-space:pre-wrap">   </span>       2.6458<span style="white-space:pre-wrap">        </span>       2.5552<span style="white-space:pre-wrap">        </span>    0.97<span style="white-space:pre-wrap">      </span>    -3%</div><div style="margin:0px;font-size:11px;font-family:Menlo">MultiSource/Benchmarks/TSVC/Symbolics-f<span style="white-space:pre-wrap">   </span>       1.6243<span style="white-space:pre-wrap">        </span>       1.6612<span style="white-space:pre-wrap">        </span>    1.02<span style="white-space:pre-wrap">      </span>    +2%</div><div style="margin:0px;font-size:11px;font-family:Menlo">MultiSource/Benchmarks/mafft/pairlocala<span style="white-space:pre-wrap">   </span>      23.8979<span style="white-space:pre-wrap">        </span>      24.0547<span style="white-space:pre-wrap">        </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">SingleSource/Benchmarks/Misc/oourafft  <span style="white-space:pre-wrap">  </span>       3.0374<span style="white-space:pre-wrap">        </span>       3.1846<span style="white-space:pre-wrap">        </span>    1.05<span style="white-space:pre-wrap">      </span>    +5%</div><div style="margin:0px;font-size:11px;font-family:Menlo">SingleSource/Benchmarks/SmallPT/smallpt<span style="white-space:pre-wrap">   </span>       6.5533<span 
style="white-space:pre-wrap">        </span>       6.6683<span style="white-space:pre-wrap">        </span>    1.02<span style="white-space:pre-wrap">      </span>    +2%</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div><div style="margin:0px;font-size:11px;font-family:Menlo">Min (21)                               <span style="white-space:pre-wrap">        </span>            -<span style="white-space:pre-wrap">     </span>            -<span style="white-space:pre-wrap">     </span>    0.97<span style="white-space:pre-wrap">      </span>      -</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div><div style="margin:0px;font-size:11px;font-family:Menlo">Max (21)                               <span style="white-space:pre-wrap">       </span>            -<span style="white-space:pre-wrap">     </span>            -<span style="white-space:pre-wrap">     </span>    1.05<span style="white-space:pre-wrap">      </span>      -</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div><div style="margin:0px;font-size:11px;font-family:Menlo">Sum (21)                               <span style="white-space:pre-wrap">       </span>          108<span style="white-space:pre-wrap">      </span>          109<span style="white-space:pre-wrap">      </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div><div style="margin:0px;font-size:11px;font-family:Menlo">A.Mean (21)                            <span style="white-space:pre-wrap"> </span>        
    -<span style="white-space:pre-wrap">     </span>            -<span style="white-space:pre-wrap">     </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div><div style="margin:0px;font-size:11px;font-family:Menlo">G.Mean 2 (21)                          <span style="white-space:pre-wrap">  </span>            -<span style="white-space:pre-wrap">     </span>            -<span style="white-space:pre-wrap">     </span>    1.01<span style="white-space:pre-wrap">      </span>    +1%</div><div style="margin:0px;font-size:11px;font-family:Menlo">-------------------------------------------------------------------------------</div></div><div><br></div><div>Thanks,</div><div>-Quentin<div><div class="h5"><br><div><blockquote type="cite"><div>On Sep 9, 2014, at 6:13 AM, Andrea Di Biagio <<a href="mailto:andrea.dibiagio@gmail.com" target="_blank">andrea.dibiagio@gmail.com</a>> wrote:</div><br><div>Hi Chandler,<br><br>Thanks for fixing the problem with the insertps mask.<br><br>Generally, the new shuffle lowering looks promising; however, there are<br>some cases where the codegen is now worse, causing runtime performance<br>regressions in some of our internal codebases.<br><br>You have already mentioned how the new shuffle lowering is missing<br>some features; for example, you explicitly said that we currently lack<br>SSE4.1 blend support. 
Unfortunately, this seems to be one of the<br>main reasons for the slowdown we are seeing.<br><br>Here is a list of what we found so far that we think is causing most<br>of the slowdown:<br>1) shufps is always emitted in cases where we could emit a single<br>blendps; in these cases, blendps is preferable because it has better<br>reciprocal throughput (this is true on all modern Intel and AMD CPUs).<br><br>Things get worse when it comes to lowering shuffles where the shuffle<br>mask indices refer to elements from both input vectors in each lane.<br>For example, a shuffle mask of <0,5,2,7> could easily be lowered into<br>a single blendps; instead, it gets lowered into two shufps<br>instructions.<br><br>Example:<br>;;;<br>define <4 x float> @foo(<4 x float> %A, <4 x float> %B) {<br>&nbsp;&nbsp;%1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 0,<br>i32 5, i32 2, i32 7><br>&nbsp;&nbsp;ret <4 x float> %1<br>}<br>;;;<br><br>llc (-mcpu=corei7-avx):<br>&nbsp;&nbsp;vblendps $10, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[5],xmm0[2],xmm1[7]<br><br>llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):<br>&nbsp;&nbsp;vshufps $-40, %xmm0, %xmm1, %xmm0 # xmm0 = xmm1[0,2],xmm0[1,3]<br>&nbsp;&nbsp;vshufps $-40, %xmm0, %xmm0, %xmm0 # xmm0[0,2,1,3]<br><br><br>2) On SSE4.1, we should try not to emit an insertps if the shuffle<br>mask identifies a blend. 
At the moment, the new lowering logic very<br>aggressively emits insertps instead of the cheaper blendps.<br><br>Example:<br>;;;<br>define <4 x float> @bar(<4 x float> %A, <4 x float> %B) {<br>&nbsp;&nbsp;%1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,<br>i32 5, i32 2, i32 7><br>&nbsp;&nbsp;ret <4 x float> %1<br>}<br>;;;<br><br>llc (-mcpu=corei7-avx):<br>&nbsp;&nbsp;vblendps $11, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]<br><br>llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):<br>&nbsp;&nbsp;vinsertps $-96, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1],xmm1[2],xmm0[3]<br><br><br>3) When a shuffle performs an insert at index 0, we always generate an<br>insertps, while a movss would do a better job.<br>;;;<br>define <4 x float> @baz(<4 x float> %A, <4 x float> %B) {<br>&nbsp;&nbsp;%1 = shufflevector <4 x float> %A, <4 x float> %B, <4 x i32> <i32 4,<br>i32 1, i32 2, i32 3><br>&nbsp;&nbsp;ret <4 x float> %1<br>}<br>;;;<br><br>llc (-mcpu=corei7-avx):<br>&nbsp;&nbsp;vmovss %xmm1, %xmm0, %xmm0<br><br>llc -x86-experimental-vector-shuffle-lowering (-mcpu=corei7-avx):<br>&nbsp;&nbsp;vinsertps $0, %xmm1, %xmm0, %xmm0 # xmm0 = xmm1[0],xmm0[1,2,3]<br><br>I hope this is useful. We would be happy to contribute patches to<br>improve some of the above cases, but we know that this is<br>still a work in progress, so we don't want to introduce conflicts with<br>your work. 
Please let us know what you think.<br><br>We will keep looking at this and follow up with any further findings.<br><br>Thanks,<br>Andrea Di Biagio<br>SN Systems - Sony Computer Entertainment Inc.<br><br>On Mon, Sep 8, 2014 at 6:08 PM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>> wrote:<br><blockquote type="cite">Hi Chandler,<br><br>Forget about what I said.<br>It seems I have some weird dependencies in my build system.<br>My binaries are out of sync.<br><br>Let me sort that out; the problem is likely already fixed, and I can<br>resume the measurements.<br><br>Sorry for the noise.<br><br>Q.<br><br>On Sep 8, 2014, at 9:32 AM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>> wrote:<br><br><br>On Sep 7, 2014, at 8:49 PM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>> wrote:<br><br>Sure,<br><br>Here is the command line:<br>clang -cc1 -triple x86_64-apple-macosx -S -disable-free<br>-disable-llvm-verifier -main-file-name tmp.i -mrelocation-model pic<br>-pic-level 2 -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu<br>core-avx-i -O3 -ferror-limit 19 -fmessage-length 114 -stack-protector 1<br>-mstackrealign -fblocks -fencode-extended-block-signature<br>-fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics<br>-vectorize-loops -vectorize-slp -mllvm<br>-x86-experimental-vector-shuffle-lowering=true -o tmp.s -x cpp-output tmp.i<br><br>This was with trunk 215249.<br><br>I meant r217281.<br><br><br>Thanks,<br>-Quentin<br><br>&lt;tmp.i&gt;<br><br>On Sep 6, 2014, at 4:27 PM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>> wrote:<br><br>I've run the SingleSource test suite for core-avx-i and have no failures<br>here, so a preprocessed file + command line would be very useful if this<br>still reproduces for you.<br><br>On Sat, Sep 6, 2014 at 4:07 PM, Chandler 
Carruth <<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>><br>wrote:<br><blockquote type="cite"><br>I'm having trouble reproducing this. I'm trying to get LNT to actually<br>run, but manually compiling the given source file didn't reproduce it for<br>me.<br><br>It might have been fixed recently (although I'd be surprised if so), but<br>it would help to get the actual command line for which compiling this file<br>in the test suite failed.<br><br>-Chandler<br><br>On Fri, Sep 5, 2014 at 4:36 PM, Quentin Colombet <<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>><br>wrote:<br><blockquote type="cite"><br>Hi Chandler,<br><br>While doing the performance measurement on an Ivy Bridge, I ran into<br>compile-time errors.<br><br>I saw a bunch of "cannot select" in the LLVM test suite with<br>-march=core-avx-i.<br>E.g., SingleSource/UnitTests/Vector/SSE/sse.isamax.c is failing at O3<br>-march=core-avx-i with:<br>fatal error: error in backend: Cannot select: 0x7f91b99a6420: v4i32 =<br>bitcast 0x7f91b99b0e10 [ORD=3] [ID=27]<br>&nbsp;&nbsp;0x7f91b99b0e10: v4i64 = insert_subvector 0x7f91b99a7210,<br>0x7f91b99a6d68, 0x7f91b99ace70 [ORD=2] [ID=25]<br>&nbsp;&nbsp;&nbsp;&nbsp;0x7f91b99a7210: v4i64 = undef [ID=15]<br>&nbsp;&nbsp;&nbsp;&nbsp;0x7f91b99a6d68: v2i64 = scalar_to_vector 0x7f91b99ab840 [ORD=2]<br>[ID=23]<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0x7f91b99ab840: i64 = AssertZext 0x7f91b99acc60, 0x7f91b99ac738<br>[ORD=2] [ID=20]<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0x7f91b99acc60: i64,ch = CopyFromReg 0x7f91b8d52820,<br>0x7f91b99a3a10 [ORD=2] [ID=16]<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0x7f91b99a3a10: i64 = Register %vreg68 [ID=1]<br>&nbsp;&nbsp;&nbsp;&nbsp;0x7f91b99ace70: i64 = Constant<0> [ID=3]<br>In function: isamax0<br>clang: error: clang frontend command failed with exit code 70 (use -v to<br>see invocation)<br>clang version 3.6.0 (215249)<br>Target: x86_64-apple-darwin14.0.0<br><br>For some reason, I cannot reproduce the problem with the test case that<br>clang gives me using -emit-llvm. 
Since the source is public, I guess you can<br>try to reproduce on your side.<br>Indeed, if you run the test-suite with -march=core-avx-i you’ll likely<br>see all those failures.<br><br>Let me know if you cannot and I’ll try harder to produce a test case.<br><br>Note: This is the same failure all over the place, i.e., cannot select a<br>bitcast from various types to v4i32 or v4i64.<br><br>Thanks,<br>-Quentin<br><br><br>On Sep 5, 2014, at 11:09 AM, Robert Lougher <<a href="mailto:rob.lougher@gmail.com" target="_blank">rob.lougher@gmail.com</a>> wrote:<br><br>Hi Chandler,<br><br>On 5 September 2014 17:38, Chandler Carruth <<a href="mailto:chandlerc@gmail.com" target="_blank">chandlerc@gmail.com</a>> wrote:<br><br><br>On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <<a href="mailto:rob.lougher@gmail.com" target="_blank">rob.lougher@gmail.com</a>><br>wrote:<br><br><br>Unfortunately, another team, while doing internal testing, has seen the<br>new path generating illegal insertps masks. A sample here:<br><br>&nbsp;&nbsp;&nbsp;vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]<br>&nbsp;&nbsp;&nbsp;vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]<br>&nbsp;&nbsp;&nbsp;vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]<br>&nbsp;&nbsp;&nbsp;vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 =<br>xmm4[0,1],xmm1[2],xmm4[3]<br>&nbsp;&nbsp;&nbsp;vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 =<br>xmm6[0,1],xmm13[2],xmm6[3]<br>&nbsp;&nbsp;&nbsp;vinsertps $416, %xmm0, %xmm7, %xmm0 # xmm0 =<br>xmm7[0,1],xmm0[2],xmm7[3]<br><br>We'll continue to look into this and do additional testing.<br><br><br><br>Interesting. Let me know if you get a test case. The insertps code path was<br>added recently, though, and has been much less well tested. I'll start fuzz<br>testing it and should hopefully uncover the bug.<br><br><br>Here are two small test cases. 
Hope they are of use.<br><br>Thanks,<br>Rob.<br><br>------<br>define <4 x float> @test(<4 x float> %xyzw, <4 x float> %abcd) {<br> %1 = extractelement <4 x float> %xyzw, i32 0<br> %2 = insertelement <4 x float> undef, float %1, i32 0<br> %3 = insertelement <4 x float> %2, float 0.000000e+00, i32 1<br> %4 = shufflevector <4 x float> %3, <4 x float> %xyzw, <4 x i32> <i32<br>0, i32 1, i32 6, i32 undef><br> %5 = shufflevector <4 x float> %4, <4 x float> %abcd, <4 x i32> <i32<br>0, i32 1, i32 2, i32 4><br> ret <4 x float> %5<br>}<br><br>define <4 x float> @test2(<4 x float> %xyzw, <4 x float> %abcd) {<br> %1 = shufflevector <4 x float> %xyzw, <4 x float> %abcd, <4 x i32><br><i32 0, i32 undef, i32 2, i32 4><br> %2 = shufflevector <4 x float> <float undef, float 0.000000e+00,<br>float undef, float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1,<br>i32 6, i32 7><br> ret <4 x float> %2<br>}<br><br><br>llc -march=x86-64 -mattr=+avx test.ll -o -<br><br>test: # @test<br>&nbsp;&nbsp;&nbsp;vxorps %xmm2, %xmm2, %xmm2<br>&nbsp;&nbsp;&nbsp;vmovss %xmm0, %xmm2, %xmm2<br>&nbsp;&nbsp;&nbsp;vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]<br>&nbsp;&nbsp;&nbsp;vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]<br>&nbsp;&nbsp;&nbsp;retl<br><br>test2: # @test2<br>&nbsp;&nbsp;&nbsp;vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]<br>&nbsp;&nbsp;&nbsp;vxorps %xmm1, %xmm1, %xmm1<br>&nbsp;&nbsp;&nbsp;vblendps $13, %xmm0, %xmm1, %xmm0 # xmm0 =<br>xmm0[0],xmm1[1],xmm0[2,3]<br>&nbsp;&nbsp;&nbsp;retl<br><br>llc -march=x86-64 -mattr=+avx<br>-x86-experimental-vector-shuffle-lowering test.ll -o -<br><br>test: # @test<br>&nbsp;&nbsp;&nbsp;vinsertps $270, %xmm0, %xmm0, %xmm2 # xmm2 = xmm0[0],zero,zero,zero<br>&nbsp;&nbsp;&nbsp;vinsertps $416, %xmm0, %xmm2, %xmm0 # xmm0 =<br>xmm2[0,1],xmm0[2],xmm2[3]<br>&nbsp;&nbsp;&nbsp;vinsertps $304, %xmm1, 
%xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]<br>&nbsp;&nbsp;&nbsp;retl<br><br>test2: # @test2<br>&nbsp;&nbsp;&nbsp;vinsertps $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]<br>&nbsp;&nbsp;&nbsp;vxorps %xmm1, %xmm1, %xmm1<br>&nbsp;&nbsp;&nbsp;vinsertps $336, %xmm1, %xmm0, %xmm0 # xmm0 =<br>xmm0[0],xmm1[1],xmm0[2,3]<br>&nbsp;&nbsp;&nbsp;retl<br>_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br><br></blockquote><br></blockquote><br><br></blockquote></div></blockquote></div><br></div></div></div></div>
<br></blockquote></div><br></div></div>
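<div><br></div><div>A minimal sketch, in Python, of the two immediate encodings that come up in the thread above: the blendps immediate implied by a 4-element shuffle mask, and the three fields packed into an insertps immediate. The helper names blend_immediate and decode_insertps are hypothetical and are not part of LLVM; the bit layout is assumed from the Intel instruction reference (a set blendps bit selects the second source; insertps uses bits 7:6 for the source element, bits 5:4 for the destination element and bits 3:0 as the zero mask), not taken from the lowering code itself.</div>

```python
from typing import Optional, Sequence


def blend_immediate(mask: Sequence[Optional[int]]) -> Optional[int]:
    """Return the blendps immediate for a shuffle mask, or None if not a blend.

    As in shufflevector, indices 0..n-1 select from the first vector and
    n..2n-1 from the second. None stands for an undef element.
    """
    n = len(mask)
    imm = 0
    for lane, idx in enumerate(mask):
        if idx is None:        # undef lane: either input is acceptable
            continue
        if idx == lane:        # same lane of the first vector
            continue
        if idx == lane + n:    # same lane of the second vector
            imm |= 1 << lane
            continue
        return None            # element moves across lanes: not a blend
    return imm


def decode_insertps(imm: int) -> dict:
    """Split an insertps immediate into its three fields.

    Raises ValueError if the value does not fit in 8 bits, read either as
    unsigned (0..255) or as signed (-128..-1), the way llc prints it.
    """
    if not -128 <= imm <= 255:
        raise ValueError(f"insertps immediate {imm} does not fit in 8 bits")
    imm &= 0xFF
    return {
        "src_elt": (imm >> 6) & 0x3,   # element read from the source register
        "dst_elt": (imm >> 4) & 0x3,   # element written in the destination
        "zero_mask": imm & 0xF,        # destination elements forced to zero
    }
```

<div>Under these assumptions, the mask &lt;0,5,2,7&gt; from example 1 yields 10 and &lt;4,5,2,7&gt; from example 2 yields 11, matching the vblendps immediates quoted above, while values such as 304 or 416 from the reported vinsertps output are rejected because they do not fit in an 8-bit immediate.</div>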