<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>Hi Hal, </div><div><br></div><div>I agree that &lt;3 x float&gt; is really important for some code, such as graphics. We disable non-power-of-two widths at the moment because the cost model is less accurate for these types. I think the next step in the development of the SLP vectorizer should be to work on non-power-of-two types. In the meantime, we may want to add a command-line flag that enables non-power-of-two widths for the brave people who are willing to experiment. </div><div><br></div><div>Thanks,</div><div>Nadav</div><br><div><div>On Aug 31, 2013, at 8:09 AM, Hal Finkel &lt;<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br>----- Yi Jiang &lt;<a href="mailto:yjiang@apple.com">yjiang@apple.com</a>&gt; wrote:<br><blockquote type="cite">Hi,<span class="Apple-converted-space"> </span><br><br>In this patch we are trying to do two things:<br>1) If the width of a vectorization list candidate is bigger than the vector register width, we break it down to fit the vector register.<br>2) We do not vectorize widths that are not a power of two.<span class="Apple-converted-space"> </span><br></blockquote><br>Can you please explain the motivation for the power-of-two restriction? Vectorizing into 3xsomething is an important use case. 
If you must disable it because of poor codegen on some platforms, please provide a TTI function so that it can be enabled on targets that do a better job.<br><br>Thanks,<br>Hal<br><br><br><br><blockquote type="cite"><br>Here are the performance results of the change, measured with the ref input. Any comments and suggestions are appreciated.<span class="Apple-converted-space"> </span><br><br><br><br>Performance Regressions - Execution Time<span class="Apple-tab-span" style="white-space: pre;">    </span>Δ<span class="Apple-tab-span" style="white-space: pre;">        </span>Previous<span class="Apple-tab-span" style="white-space: pre;">  </span>Current<span class="Apple-tab-span" style="white-space: pre;">   </span>σ<span class="Apple-tab-span" style="white-space: pre;">        </span>Δ (B)<span class="Apple-tab-span" style="white-space: pre;">    </span>σ (B)<br>External/SPEC/CINT95/134_perl/134_perl<span class="Apple-tab-span" style="white-space: pre;">    </span>2.76%<span class="Apple-tab-span" style="white-space: pre;">     </span>2.9533<span class="Apple-tab-span" style="white-space: pre;">    </span>3.0348<span class="Apple-tab-span" style="white-space: pre;">    </span>0.0024<span class="Apple-tab-span" style="white-space: pre;">    </span>0.00%<span class="Apple-tab-span" style="white-space: pre;">     </span>0.0024<br><br><br>Performance Improvements - Execution Time<span class="Apple-tab-span" style="white-space: pre;">     </span>Δ<span class="Apple-tab-span" style="white-space: pre;">        </span>Previous<span class="Apple-tab-span" style="white-space: pre;">  </span>Current<span class="Apple-tab-span" style="white-space: pre;">   </span>σ<span class="Apple-tab-span" style="white-space: pre;">        </span>Δ (B)<span class="Apple-tab-span" style="white-space: pre;">    </span>σ (B)<br>External/SPEC/CFP2000/177_mesa/177_mesa<span class="Apple-tab-span" style="white-space: pre;">   </span>-6.97%<span class="Apple-tab-span" style="white-space: pre;">    
</span>21.1910<span class="Apple-tab-span" style="white-space: pre;">   </span>19.7130<span class="Apple-tab-span" style="white-space: pre;">   </span>0.0223<span class="Apple-tab-span" style="white-space: pre;">    </span>0.00%<span class="Apple-tab-span" style="white-space: pre;">     </span>0.0223<br>SingleSource/Benchmarks/BenchmarkGame/partialsums<span class="Apple-tab-span" style="white-space: pre;"> </span>-5.19%<span class="Apple-tab-span" style="white-space: pre;">    </span>0.2969<span class="Apple-tab-span" style="white-space: pre;">    </span>0.2815<span class="Apple-tab-span" style="white-space: pre;">    </span>-<span class="Apple-tab-span" style="white-space: pre;"> </span>0.00%<span class="Apple-tab-span" style="white-space: pre;">     </span>-<br>External/SPEC/CFP2000/188_ammp/188_ammp<span class="Apple-tab-span" style="white-space: pre;">        </span>-1.54%<span class="Apple-tab-span" style="white-space: pre;">    </span>88.2623<span class="Apple-tab-span" style="white-space: pre;">   </span>86.9050<span class="Apple-tab-span" style="white-space: pre;">   </span>0.0823<span class="Apple-tab-span" style="white-space: pre;">    </span>0.00%<span class="Apple-tab-span" style="white-space: pre;">     </span>0.0823<br><br><br><br><br><blockquote type="cite"><br></blockquote><br></blockquote><br>--<span class="Apple-converted-space"> </span><br>Hal Finkel<br>Assistant Computational Scientist<br>Leadership Computing Facility<br>Argonne National Laboratory<br><br>_______________________________________________<br>llvm-commits mailing list<br><a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a></div></blockquote></div><br></body></html>