[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Tue Jan 24 14:08:07 PST 2012

On Mon, Jan 23, 2012 at 10:13 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> On Tue, 2012-01-17 at 13:25 -0600, Sebastian Pop wrote:
>> Hi,
>>
>> On Fri, Dec 30, 2011 at 3:09 AM, Tobias Grosser <tobias at grosser.es> wrote:
>> > As it seems my intuition is wrong, I am very eager to see and understand
>> > an example where a search limit of 4000 is really needed.
>> >
>>
>> To get the ball rolling again, I attached a testcase that can be tuned
>> to understand the impact on compile time for different sizes of a
>> basic block.  One can also set the number of iterations in the loop to
>> 1 to test the vectorizer with no loops around.
>>
>> Hal, could you please report the compile times with/without the
>> vectorizer for different basic block sizes?
>
> I've looked at your test case, and I am pleased to report a negligible
> compile-time increase! Also, there is no vectorization of the main

Good!

> loop :) Here's why: (as you know) the main part of the loop is
> essentially one long dependency chain, and so there is nothing to
> vectorize there. The only vectorization opportunities come from
> unrolling the loop. Using the default thresholds, the loop will not even
> partially unroll (because the body is too large). As a result,
> essentially nothing happens.
>
> I've prepared a reduced version of your test case (attached). Using
> -unroll-threshold=300 (along with -unroll-allow-partial), I can make the
> loop unroll partially (the reduced loop size is 110, so this allows
> unrolling 2 iterations). Once this is done, the vectorizer finds
> candidate pairs and vectorizes [as a practical matter, you need -basicaa
> too].
>
> I think that even this is probably too big for a regression test. I
> don't think that the basic structure really adds anything over existing
> tests (although I need to make sure that alias-analysis use is otherwise
> covered), but I'll copy-and-paste a small portion into a regression test
> to cover the search limit logic (which is currently uncovered). We
> should probably discuss different situations that we'd like to see
> covered in the regression suite (perhaps post-commit).
>
> Thanks for working on this! I'll post an updated patch for review
> shortly.
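
To illustrate the dependency-chain point quoted above, here is a minimal
sketch in C. It is not the attached test case: the function names
(chain_rolled, chain_unrolled2) and the arithmetic are hypothetical and
chosen only for illustration. In the rolled form each statement depends
on the previous one, so a single iteration offers nothing to pair. After
unrolling by two (done by hand here, assuming an even trip count), the
two iterations form independent chains whose matching steps are the kind
of candidate pairs a basic-block vectorizer could consider.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled form: within one iteration every statement consumes the result
   of the previous one, so no two operations are independent. */
void chain_rolled(const double *a, double *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        double t = a[i] * 2.0;  /* step 1 */
        t = t + 1.0;            /* step 2 depends on step 1 */
        t = t * t;              /* step 3 depends on step 2 */
        out[i] = t;
    }
}

/* Hand-unrolled by two (assumption: n is even). The t0 and t1 chains do
   not depend on each other, so each pair of matching steps is
   independent and could in principle be fused into one 2-wide vector
   operation. */
void chain_unrolled2(const double *a, double *out, size_t n) {
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        double t0 = a[i] * 2.0;
        double t1 = a[i + 1] * 2.0;  /* pairs with the line above */
        t0 = t0 + 1.0;
        t1 = t1 + 1.0;               /* pairs with the line above */
        t0 = t0 * t0;
        t1 = t1 * t1;                /* pairs with the line above */
        out[i] = t0;
        out[i + 1] = t1;
    }
}
```

Both functions compute the same results; only the unrolled body exposes
independent operations inside a single basic block.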

Thanks for the new patch.

I will send you more comments on the patch as my testing progresses:
I found some interesting benchmarks in which enabling vectorization
reduces performance by 80% on ARM.
I will prepare a reduced testcase and try to find out the reason.
As a first guess, I would say that this comes from vectorizing code
in a loop and the overhead of transfers between scalar and vector
registers.

I do not want to block you from committing the patch just because
of performance issues: let's address any further improvements once
the patch is committed to ToT.

Thanks again,
Sebastian
--
Qualcomm Innovation Center, Inc is a member of Code Aurora Forum