I have a very simple test case, a 4x4 matrix times a 4-vector, that is unrolled correctly but is not vectorized by -bb-vectorize (I used LLVM 3.1svn).<br>I have attached the test case so you can see what goes wrong.<br><br>
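For readers without the attachment, a kernel of the kind described might look like the sketch below. It is a stand-in, not the attached file: the function name, the row-major layout, and the use of float are all assumptions.<br>

```c
#include <stddef.h>

/* Hypothetical stand-in for the attached test case (names and layout are
   assumptions): y = A * x for a row-major 4x4 matrix A and a 4-vector x. */
void matvec4(const float A[16], const float x[4], float y[4])
{
    for (size_t row = 0; row < 4; ++row) {
        float sum = 0.0f;
        /* Four multiply-adds per row; once both loops are fully unrolled,
           all sixteen products sit in a single basic block. */
        for (size_t col = 0; col < 4; ++col)
            sum += A[4 * row + col] * x[col];
        y[row] = sum;
    }
}
```

After full unrolling, the four row sums are independent and have the same shape, which is the situation a basic-block vectorizer would be expected to exploit.<br>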

<div class="gmail_quote">2012/2/3 Hal Finkel <span dir="ltr"><<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

As some of you may know, I committed my basic-block autovectorization<br>

pass a few days ago. I encourage anyone interested to try it out (pass<br>

-vectorize to opt or -mllvm -vectorize to clang) and provide feedback.<br>

Especially in combination with -unroll-allow-partial, I have observed<br>

some significant benchmark speedups, but, I have also observed some<br>

significant slowdowns. I would like to share my thoughts, and hopefully<br>

get feedback, on next steps.<br>

<br>

1. "Target Data" for vectorization - I think that in order to improve<br>

the vectorization quality, the vectorizer will need more information<br>

about the target. This information could be provided in the form of a<br>

kind of extended target data. This extended target data might contain:<br>

 - What basic types can be vectorized, and how many of them will fit<br>

into (the largest) vector registers<br>

 - What classes of operations can be vectorized (division, conversions /<br>

sign extension, etc. are not always supported)<br>

 - What alignment is necessary for loads and stores<br>

 - Is scalar-to-vector free?<br>
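The four items above could be gathered into a single record that the vectorizer queries. The sketch below is only one possible shape for such a record; every type, field, and function name in it is hypothetical and does not correspond to any existing LLVM interface.<br>

```c
#include <stdbool.h>

/* Hypothetical operation classes a target may or may not vectorize. */
enum op_class { OP_ADD, OP_MUL, OP_DIV, OP_CONVERT, OP_CLASS_COUNT };

/* Hypothetical "extended target data" record; all names are illustrative. */
struct vec_target_info {
    unsigned vector_register_bits;          /* width of the largest vector register */
    bool     op_supported[OP_CLASS_COUNT];  /* operation classes with vector forms */
    unsigned load_store_align_bytes;        /* alignment required for vector loads/stores */
    bool     scalar_to_vector_free;         /* is a scalar-to-vector move free? */
};

/* Number of elements of the given bit width that fit into the largest
   vector register; 0 if the element is wider than the register. */
unsigned lanes_for(const struct vec_target_info *t, unsigned elem_bits)
{
    if (elem_bits == 0 || elem_bits > t->vector_register_bits)
        return 0;
    return t->vector_register_bits / elem_bits;
}

/* An operation is worth vectorizing only if the target supports its class
   and at least two elements fit into one register. */
bool can_vectorize(const struct vec_target_info *t, enum op_class op,
                   unsigned elem_bits)
{
    return t->op_supported[op] && lanes_for(t, elem_bits) >= 2;
}
```

For a target with 128-bit registers and no vector division, such a query would allow pairing 32-bit multiplies into four lanes while rejecting divisions outright.<br>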

<br>

2. Feedback between passes - We may need to implement a closer coupling<br>

between optimization passes than currently exists. Specifically, I have<br>

in mind two things:<br>

 - The vectorizer should communicate more closely with the loop<br>

unroller. First, the loop unroller should try to unroll to preserve<br>

maximal load/store alignments. Second, I think it would make a lot of<br>

sense to unroll speculatively and keep the unrolled version in<br>

preference to the original only if it helps vectorization. With basic<br>

block vectorization, it is often necessary to (partially) unroll in<br>

order to vectorize. Even when we also have real loop vectorization,<br>

however, I still think that it will be important for the loop unroller<br>

to communicate with the vectorizer.<br>

 - After vectorization, it would make sense for the vectorization pass<br>

to request further simplification, but only on those parts of the code<br>

that it modified.<br>
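To see why partial unrolling matters for basic-block vectorization, compare a rolled loop with a version unrolled by four. This is a hand-written sketch with assumed function names, not compiler output.<br>

```c
#include <stddef.h>

/* Rolled form: the loop body holds one multiply, so there is nothing
   inside the basic block to pair up. */
void mul_rolled(float *a, const float *b, const float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] * c[i];
}

/* Partially unrolled by 4: the main loop body now holds four independent,
   isomorphic multiplies on adjacent elements, which a basic-block
   vectorizer could fuse into one vector operation. */
void mul_unrolled4(float *a, const float *b, const float *c, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a[i + 0] = b[i + 0] * c[i + 0];
        a[i + 1] = b[i + 1] * c[i + 1];
        a[i + 2] = b[i + 2] * c[i + 2];
        a[i + 3] = b[i + 3] * c[i + 3];
    }
    /* Remainder loop for trip counts that are not a multiple of 4. */
    for (; i < n; ++i)
        a[i] = b[i] * c[i];
}
```

Unrolling by the vector width also keeps each group of four accesses at a fixed offset from the start of the arrays, so aligned arrays stay aligned in every unrolled iteration.<br>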

<br>

3. Loop vectorization - It would be nice to have, in addition to<br>

basic-block vectorization, a more-traditional loop vectorization pass. I<br>

think that we'll need a better loop analysis pass in order for this to<br>

happen. Some of this was started in LoopDependenceAnalysis, but that<br>

pass is not yet finished. We'll need something like this to recognize<br>

affine memory references, etc.<br>
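As a small illustration of the distinction such an analysis would need to draw, the sketch below contrasts an affine subscript with an indirect one. The function names are assumptions chosen for this example.<br>

```c
#include <stddef.h>

/* Affine reference: the subscript 2*i + 1 is a linear function of the
   induction variable, so the addresses touched are known in closed form. */
void affine_store(float *a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[2 * i + 1] = (float)i;
}

/* Non-affine reference: the subscript is loaded from memory, so the
   addresses touched cannot be described as a linear function of i. */
void indirect_store(float *a, const size_t *idx, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        a[idx[i]] = (float)i;
}
```

In the first loop, dependences between iterations can be decided from the subscript expression alone; in the second, they depend on the run-time contents of idx.<br>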

<br>

I look forward to hearing everyone's thoughts.<br>

<span class="HOEnZb"><font color="#888888"><br>

 -Hal<br>

<br>

--<br>

Hal Finkel<br>

Postdoctoral Appointee<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

<br>


</font></span></blockquote></div><br>