[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

Tue Jan 21 14:01:45 PST 2014

Just to add a few notes...

On Tue, Jan 21, 2014 at 1:31 PM, Andrew Trick <atrick at apple.com> wrote:

> Chandler suggested a way around the problem. I'll work on that first.
>
>
> It is very difficult to deal with the LoopPassManager. The concept doesn’t
> fit with typical loop passes, which may need to rerun function level
> analyses, and can affect code outside the current loop. One nice thing
> about it is that a pass can push a new loop onto the queue and rerun all
> managed loop passes just on that loop. In the past I’ve seen efforts to
> consolidate multiple loop passes into the same LoopPassManager, but don’t
> really understand the benefit other than reducing the one-time overhead of
> instantiating multiple pass managers and some minor IR access locality
> improvements. Maybe Chandler’s work will help with the overhead of
> instantiating pass managers.
>

Maybe, but currently I don't really understand the ideal state for loop
pass pipeline formation. Maybe I'll sit down with you, Owen, and/or others
that are more deeply involved in the loop passes to get a good design in my
head. But we're not quite there yet. =]

>
> Anyway, I see no reason that the vectorizer shouldn’t run in its own loop
> pass manager.
>

It turns out, this is all moot - the loop vectorizer *already* doesn't run
as part of these loop passes. It runs on its own during the late
optimization phase.

If it is a loop pass at all, I would just suggest making it a function pass
and being done with it. Then there will be no problems here. Does that
make sense to you guys as well?

> I’m sure we could be more aggressive than we are currently at O3. If the
> vectorizer unroller’s conditions are too strict, we could consider calling
> the standard partial unroller only on the loops that didn’t get vectorized.
>

I think we can be more aggressive across the board. I've got a pretty good
conservative benchmark suite and rig for increasing the aggressiveness of
these kinds of optimizations (it is very sensitive to code size and
consists of macro benchmarks that aren't usually bottlenecked on a single
loop). I'm going to try some basic relaxing of the unrolling thresholds in
the loop vectorizer to see if there is at least an *obvious* better
baseline level. Then we can see if there is more detailed tuning to do.

However, I'd like to do that after the if-conversion issue is fixed if
possible. I have bandwidth to do the evaluation of thresholds, but not
really to hack on the if-conversion stuff. Any guesses if you guys will
have time to look at that aspect?
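
As a rough illustration of the queue behavior Andrew describes above (a pass
pushing a new loop onto the queue so that the managed loop passes rerun on
just that loop), here is a toy Python model. It is only a sketch: the names
run_loop_passes, toy_unswitch, and toy_rotate are hypothetical, none of this
is LLVM API, and the real LoopPassManager is C++ and considerably more
involved.

```python
from collections import deque


def run_loop_passes(loops, passes):
    """Toy model: run every pass on every queued loop.

    Each pass is a (name, function) pair. The function takes a loop name
    and returns a list of newly created loops, which are pushed onto the
    queue so that all passes later run on them as well.
    """
    queue = deque(loops)
    log = []
    while queue:
        loop = queue.popleft()
        for name, run in passes:
            log.append((name, loop))
            # Any loops the pass created get queued for the full pipeline.
            queue.extend(run(loop))
    return log


def toy_unswitch(loop):
    # Hypothetical pass: transforming "L1" produces one new loop.
    return ["L1.clone"] if loop == "L1" else []


def toy_rotate(loop):
    # Hypothetical pass that never creates new loops.
    return []


log = run_loop_passes(
    ["L1", "L2"],
    [("unswitch", toy_unswitch), ("rotate", toy_rotate)],
)
for name, loop in log:
    print(name, loop)
```

The point of the model is that the new loop is visited by the same managed
passes without rerunning anything on the loops that were already processed.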