[llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Tobias Grosser tobias at grosser.es
Tue Nov 8 03:12:24 PST 2011


On 11/08/2011 11:45 AM, Hal Finkel wrote:
> I've attached the latest version of my autovectorization patch.
>
> Working through the test suite has proved to be a productive
> experience ;) -- And almost all of the bugs that it revealed have now
> been fixed. There are still two programs that don't compile with
> vectorization turned on, and I'm working on those now, but in case
> anyone feels like playing with vectorization, this patch will probably
> work for you.

Hey Hal,

that is great news, especially as the numbers seem to show that 
vectorization has a significant performance impact. What exactly did you 
compare? 'clang -O3' against 'clang -O3 -mllvm -vectorize'?

> The largest three performance speedups are:
> SingleSource/Benchmarks/BenchmarkGame/puzzle - 59.2% speedup
> SingleSource/UnitTests/Vector/multiplies - 57.7% speedup
> SingleSource/Benchmarks/Misc/flops-7 - 50.75% speedup
>
> The largest three performance slowdowns are:
> MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael -
> 114% slowdown
> MultiSource/Benchmarks/MiBench/network-patricia/network-patricia - 66.6%
> slowdown
> SingleSource/Benchmarks/Misc/flops-8 - 64.2% slowdown
>
Interesting. Do you understand what causes these slowdowns? Can your 
heuristic be improved?

> Largest three compile-time slowdowns:
> MultiSource/Benchmarks/MiBench/security-rijndael/security-rijndael -
> 1276% slowdown
> SingleSource/Benchmarks/Misc/salsa20 - 1000% slowdown
> MultiSource/Benchmarks/Trimaran/enc-3des/enc-3des - 508% slowdown

Yes, that is a lot. Do you know whether this time is well invested 
(does it give significant speedups)?

If I understood correctly, your vectorizer has quadratic complexity, 
which may cause large slowdowns. Do you think it would be 
useful/possible to make it linear by introducing a constant upper bound 
somewhere? E.g. limiting it to 10/20/100 steps. Maybe we are lucky and 
most of the vectorization opportunities are close by (in some sense), 
such that we get most of the speedup by looking at a subset of the problem.
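
To make the idea concrete, here is a minimal sketch in C. It is not taken 
from the patch: the function names, the opcode array and the can_pair() 
test are all hypothetical stand-ins. It only shows how a constant window 
turns a pairwise search over n instructions from O(n^2) into at most 
n * window checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pairing test: a stand-in for whatever check a vectorizer
 * would use to decide that two instructions may be fused. Here two
 * "instructions" are candidates when they have the same opcode. */
static int can_pair(const int *opcodes, size_t i, size_t j)
{
    return opcodes[i] == opcodes[j];
}

/* Count candidate pairs (i, j) with i < j and j - i <= window.
 * With window >= n - 1 this is the full quadratic search; with a
 * constant window the inner loop runs at most `window` times, so the
 * total work is linear in n. */
size_t count_candidate_pairs(const int *opcodes, size_t n, size_t window)
{
    assert(opcodes != NULL || n == 0);
    size_t pairs = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n && j - i <= window; j++) {
            if (can_pair(opcodes, i, j))
                pairs++;
        }
    }
    return pairs;
}
```

A smaller window finds fewer candidates, so the bound trades missed 
opportunities for compile time; which value is a good compromise would 
have to be measured.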

> Not everything slows down, MultiSource/Benchmarks/Prolangs-C
> ++/city/city, for example, compiles 10% faster with vectorization
> enabled; but, for the most part, things certainly take longer to compile
> with vectorization enabled. The average slowdown over all tests was 29%,
> the median was 11%. On the other hand, the average speedup over all
> tests was 5.2%, the median was 1.3%.
Nice. I think this is a great start.

> Compared to previous patches, which had a minimum required chain length
> of 3 or 4, I've now made the default 6. While using a chain length of 4
> worked well for targeted benchmarks, it caused an overall slowdown on
> almost all test-suite programs. Using a minimum length of 6 causes, on
> average, a speedup; so I think that is a better default choice.

I am also trying to understand whether it is possible to use your 
vectorizer for Polly. My idea is to do some clever loop unrolling.

Starting from this loop:

for (int i = 0; i < 4; i++) {
    A[i] += 1;
    A[i] = B[i] + 3;
    C[i] = A[i];
}

The classical unroller would create this code:

    A[0] += 1;
    A[0] = B[0] + 3;
    C[0] = A[0];

    A[1] += 1;
    A[1] = B[1] + 3;
    C[1] = A[1];

    A[2] += 1;
    A[2] = B[2] + 3;
    C[2] = A[2];

    A[3] += 1;
    A[3] = B[3] + 3;
    C[3] = A[3];

However, if I can prove that this loop is parallel, I want to create 
this code instead:

    A[0] += 1;
    A[1] += 1;
    A[2] += 1;
    A[3] += 1;

    A[0] = B[0] + 3;
    A[1] = B[1] + 3;
    A[2] = B[2] + 3;
    A[3] = B[3] + 3;

    C[0] = A[0];
    C[1] = A[1];
    C[2] = A[2];
    C[3] = A[3];

I assume this will allow the vectorization of test cases that failed 
because of possible aliasing. However, I am more interested in whether 
the change in execution order could also improve the vectorization 
outcome or reduce the compile-time overhead of your vectorizer.
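
As a small sanity check of the reordering, the following C sketch runs 
the loop body in both orders and compares the results. The function 
names are made up for illustration, and it assumes that A, B and C are 
distinct, non-overlapping arrays; that is exactly the property that has 
to be proven before the statement-major order is legal.

```c
#include <assert.h>
#include <string.h>

#define N 4

/* Iteration-major order: all statements of iteration i run before
 * iteration i + 1 starts (what a classical unroller produces). */
void run_iteration_major(int A[N], const int B[N], int C[N])
{
    for (int i = 0; i < N; i++) {
        A[i] += 1;
        A[i] = B[i] + 3;
        C[i] = A[i];
    }
}

/* Statement-major order: each statement runs for all iterations before
 * the next statement starts. Only valid when the iterations are
 * independent, i.e. when A, B and C do not alias. */
void run_statement_major(int A[N], const int B[N], int C[N])
{
    for (int i = 0; i < N; i++) A[i] += 1;
    for (int i = 0; i < N; i++) A[i] = B[i] + 3;
    for (int i = 0; i < N; i++) C[i] = A[i];
}

/* Returns 1 when both orders leave A and C in the same state. */
int orders_agree(const int A0[N], const int B[N])
{
    assert(A0 != NULL && B != NULL);
    int A1[N], A2[N], C1[N], C2[N];
    memcpy(A1, A0, sizeof A1);
    memcpy(A2, A0, sizeof A2);
    run_iteration_major(A1, B, C1);
    run_statement_major(A2, B, C2);
    return memcmp(A1, A2, sizeof A1) == 0 &&
           memcmp(C1, C2, sizeof C1) == 0;
}
```

In the statement-major version, each inner loop applies one operation to 
four consecutive elements, which is the shape a basic-block vectorizer 
would hopefully recognize after full unrolling.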

Thanks for working on the vectorization.
Cheers

Tobi
