[llvm-commits] [LLVMdev] [PATCH] BasicBlock Autovectorization Pass

Tue Jan 17 12:10:08 PST 2012

On Tue, 2012-01-17 at 13:25 -0600, Sebastian Pop wrote:
> Hi,
> 
> On Fri, Dec 30, 2011 at 3:09 AM, Tobias Grosser <tobias at grosser.es> wrote:
> > As it seems my intuition is wrong, I am very eager to see and understand
> > an example where a search limit of 4000 is really needed.
> >
> 
> To make the ball roll again, 

I will post an updated patch shortly. I have been "stress-testing" the
patch to ensure correctness, and have corrected a few bugs. The most
significant issue that I've discovered was the possibility of generating
non-trivial (meaning longer than 2) pairing-induced dependency cycles.
To prevent this from happening, I've implemented a cycle check, but this
increases the algorithmic complexity of the pair-selection process and
makes the vectorizer quite slow on some blocks. I can see two ways to
proceed here:
 1. Improve the cycle-detection algorithm (for example, by adapting the
algorithm currently used by ScheduleDAGTopologicalSort::WillCreateCycle,
or something similar).
 2. Abort late, during fusion, on non-trivial cycles. This will make the
fusing process more expensive, but will not increase the algorithmic
complexity. It will, however, degrade the quality of the vectorization,
because pairs otherwise selected for vectorization might, at the very
end, not be fused into vector instructions [this seems relatively rare,
so it might not be a big deal].
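To illustrate what a pairing-induced cycle is, here is a minimal C++ sketch of a whole-block check. It assumes instructions are numbered 0..n-1, that `succ[u]` lists the instructions that depend on `u`, and that the selected pairs are disjoint; the function name `pairingCreatesCycle` and this representation are hypothetical, not taken from the patch. Each pair is collapsed into a single node and Kahn's topological sort is run on the resulting graph: if some node can never be emitted, fusing the pairs would create a cycle even though no pair is invalid on its own. This is a plain topological sort, not the ordering-bounded search that WillCreateCycle performs.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical representation (not the patch's data structures):
// instructions are numbered 0..n-1 and succ[u] lists every instruction
// that depends on u, i.e. there is a dependency edge u -> v.
// Each selected pair (a, b) will be fused into one vector instruction.
// Returns true if fusing all pairs at once would create a dependency cycle.
bool pairingCreatesCycle(int n, const std::vector<std::vector<int>>& succ,
                         const std::vector<std::pair<int, int>>& pairs) {
  // Map each instruction to the node that represents it after fusion.
  std::vector<int> rep(n);
  for (int i = 0; i < n; ++i) rep[i] = i;
  std::vector<bool> paired(n, false);
  for (const auto& [a, b] : pairs) {
    assert(a != b && !paired[a] && !paired[b]);  // pairs must be disjoint
    paired[a] = true;
    paired[b] = true;
    rep[b] = a;
  }

  // Build the condensed graph, counting in-degrees per fused node.
  std::vector<std::vector<int>> csucc(n);
  std::vector<int> indeg(n, 0);
  for (int u = 0; u < n; ++u) {
    for (int v : succ[u]) {
      int ru = rep[u], rv = rep[v];
      if (ru == rv) return true;  // one half of a pair depends on the other
      csucc[ru].push_back(rv);
      ++indeg[rv];
    }
  }

  // Kahn's algorithm: the condensed graph has a cycle iff some node
  // can never be emitted in topological order.
  std::queue<int> ready;
  int nodes = 0;
  for (int i = 0; i < n; ++i) {
    if (rep[i] != i) continue;  // absorbed into its partner
    ++nodes;
    if (indeg[i] == 0) ready.push(i);
  }
  int emitted = 0;
  while (!ready.empty()) {
    int u = ready.front();
    ready.pop();
    ++emitted;
    for (int v : csucc[u])
      if (--indeg[v] == 0) ready.push(v);
  }
  return emitted != nodes;
}
```

For example, with edges 0->2, 3->4 and 5->1 the scalar block is acyclic, yet fusing (0,1), (2,3) and (4,5) yields a three-node cycle; dropping any one of those pairs breaks it. The check is linear in the number of instructions plus dependency edges, so running it once at the end (as in option 2) is cheap, whereas rerunning a whole-graph check like this for every candidate pair multiplies that cost by the number of candidates.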

The best approach may be to implement both and give the user a choice
between a fast mode and a slower one (perhaps this can be done post-commit).

> I attached a testcase that can be tuned
> to understand the impact on compile time for different sizes of a
> basic block.  One can also set the number of iterations in the loop to
> 1 to test the vectorizer with no loops around.
> 

Thanks!

> Hal, could you please report the compile times with/without the
> vectorizer for different basic block sizes?

Absolutely!

> 
> Once this parameter is tuned, could we get this code committed to llvm?

Also, I've re-run the test suite: with my load/store reordering patch
also applied, a much smaller look-ahead value is optimal. In the
interest of solving one issue at a time, I would prefer to commit with
the smaller default.

Thanks again,
Hal

> 
> Thanks,
> Sebastian
> 
> PS: this testcase is also a compile time hog for GCC at -O3 when the
> loop vectorizer is running.

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory