[llvm-commits] [PATCH] BasicBlock Autovectorization Pass
Hal Finkel
hfinkel at anl.gov
Tue Nov 1 16:54:54 PDT 2011
On Tue, 2011-11-01 at 16:59 -0500, Hal Finkel wrote:
> On Tue, 2011-11-01 at 19:19 +0000, Tobias Grosser wrote:
> > On 11/01/2011 06:32 PM, Hal Finkel wrote:
> > > Any objections to me committing this? [And some relevant docs changes] I
> > > think that it is ready at this point.
> >
> > First of all, I think it is great to see work starting on an
> > autovectorizer for LLVM. Unfortunately, I did not have time to test your
> > vectorizer pass intensively, but here are my first comments:
> >
> > 1. This patch breaks the --enable-shared/BUILD_SHARED_LIBS build. The
> > following patch fixes this for cmake:
> > 0001-Add-vectorizer-to-libraries-used-by-Transforms-IPO.patch
> >
>
> Thanks!
>
> > Can you check the autoconf build with --enable-shared?
>
> I will check.
This appears to work as it should.
>
> >
> > 2. Did you run this pass on the llvm test-suite? Does your vectorizer
> > introduce any correctness regressions? What are the top 10 compile
> > time increases/decreases? How about run time?
> >
>
> I'll try to get this setup and post the results.
>
> > 3. I did not really test this intensively, but I had the feeling that the
> > compile-time increase for large basic blocks is quite large.
> > I still need to extract a test case. Any comments on the complexity
> > of your vectorizer?
>
> This may very well be true. As is, I would not recommend activating this
> pass by default (at -O3) because it is fairly slow and the resulting
> performance increase, while significant in many cases, is not, IMHO,
> large enough to justify the increase in baseline compile time. Ideally,
> this kind of vectorization should be the "vectorizer of last resort" --
> the pass that tries really hard to squeeze the last little bit of
> vectorization possible out of the code. At the moment, it is all that we
> have, but I hope that will change. I've not yet done any real profiling,
> so I'll hold off on commenting about future performance improvements.
>
> Assessing the base complexity is a bit difficult; there are certainly a
> few stages, including the initial one, that are O(n^2), where n is the
> number of instructions in the block. The "connection-finding" stage
> should also be O(n^2) in practice, but it really iterates over
> instruction-user pairs and so could be worse in pathological cases.
> Note, however, that in the later stages the n is not the number of
> instructions in the block, but rather the number of (unordered)
> candidate instruction pairs (which will be much less than the square of
> the number of instructions in the block). It should be possible to
> generate a compile-time scaling plot by taking a loop and compiling it
> with partial unrolling, looking at how the compile time changes with
> the unrolling limit; I'll try and do that.
So for this test, I ran:
time opt -S -O3 -unroll-allow-partial -vectorize -o /dev/null q.ll
where q.ll contains the output from clang -O3 of the vbor function from
the benchmarks I've been posting recently. Times are in seconds for a
release build:
  -unroll-threshold   with -vectorize   without -vectorize
  100                 0.030             0.000
  200                 0.130             0.030
  300                 0.770             0.030
  400                 1.240             0.040
  500                 1.280             0.050
  600                 9.450             0.060
  700                 29.300            0.060
I am not sure why the 400 and 500 times are so close. Obviously, it is
not linear ;) I am not sure whether enumerating the possible pairings
can be done in a sub-quadratic way, but I will do some profiling and see
if I can make things better. To be fair, this test creates a kind of
worst-case scenario: an increasingly large block of instructions, almost
all of which are potentially fusable.
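To give a rough idea of where the time goes, here is a standalone sketch
of the kind of scan the pairing step has to do. This is not the pass's
code; Inst and canPair are just stand-ins, and every pair is assumed
fusable to mimic the worst case above:

  // Schematic only: shows why candidate-pair enumeration over a single
  // basic block is O(n^2) in the number of instructions.
  #include <cstdio>
  #include <utility>
  #include <vector>

  struct Inst { int id; };  // stand-in for an instruction

  // Stand-in pairing test; pretend every pair is fusable (worst case).
  static bool canPair(const Inst &, const Inst &) { return true; }

  int main() {
    std::vector<Inst> block(700);            // a 700-instruction block
    for (int i = 0; i < 700; ++i) block[i].id = i;

    std::vector<std::pair<int, int> > pairs;
    // Every unordered pair (i, j) with i < j is examined once.
    for (size_t i = 0; i < block.size(); ++i)
      for (size_t j = i + 1; j < block.size(); ++j)
        if (canPair(block[i], block[j]))
          pairs.push_back(std::make_pair(block[i].id, block[j].id));

    std::printf("%zu candidate pairs from %zu instructions\n",
                pairs.size(), block.size());
    return 0;
  }

That is n*(n-1)/2 checks (roughly 245,000 unordered pairs at n = 700),
and the later stages then work over whatever fraction of those survive
as candidates, which is why an almost entirely fusable block is close to
the worst case for this pass.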
It may also be possible to design additional heuristics to help the
situation. For example, we might introduce a target chain length such
that if the vectorizer finds a chain of a given length, it selects it,
forgoing the remainder of the search for the selected starting
instruction. This kind of thing will require further research and
testing.
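To sketch what I mean by that cutoff (hypothetical names and a toy
connection test, nothing from the current patch):

  // Toy illustration of a chain-length cutoff: stop extending the chain
  // from a given starting pair once it is "long enough", instead of
  // searching the remaining candidates exhaustively.
  #include <cstdio>
  #include <vector>

  struct Pair { int a, b; };  // a candidate instruction pair

  // Stand-in connection test: pairs are "connected" if consecutive,
  // just so the example runs end to end.
  static bool connected(const Pair &X, const Pair &Y) { return Y.a == X.a + 1; }

  int main() {
    const unsigned TargetChainLength = 4;    // invented tuning parameter

    std::vector<Pair> candidates;
    for (int i = 0; i < 100; ++i) { Pair P = { i, i + 100 }; candidates.push_back(P); }

    // Grow a chain from candidates[0], stopping early at the target length.
    std::vector<Pair> chain;
    chain.push_back(candidates[0]);
    for (size_t i = 1; i < candidates.size() && chain.size() < TargetChainLength; ++i)
      if (connected(chain.back(), candidates[i]))
        chain.push_back(candidates[i]);

    std::printf("selected a chain of %zu pairs (cutoff %u)\n",
                chain.size(), TargetChainLength);
    return 0;
  }

The idea is just that a "good enough" chain ends the search for that
starting instruction early, trading a little vectorization quality for
compile time.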
-Hal
>
> I'm writing a paper on the vectorizer, so within a few weeks there will
> be a very good description (complete with diagrams) :)
>
> >
> > I plan to look into your vectorizer during the next couple of
> > days/weeks, but will most probably not have the time to do this tonight.
> > Sorry. :-(
>
> Not a problem; it seems that I have some homework to do first ;)
>
> Thanks,
> Hal
>
> >
> > Cheers
> > Tobi
>
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory