[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Tue Jan 24 13:46:17 PST 2017

On Tue, Jan 24, 2017 at 1:20 PM, Sanjay Patel via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
> I started looking at the log files that you attached, and I'm confused.
> The code that is supposedly causing the perf regression is created by the
> loop vectorizer, right? Except the bad code is not in the "vector.body", so
> is there something peculiar about this benchmark that the hot loop is not
> the vector loop? But there's another mystery: there are no vector ops in
> the "vector.body"!
>
>
I haven't looked at this particular example, but this isn't very rare.

1) The vectorizer actually doubles as an unroller under some circumstances
(for example, when it picks a vectorization factor of 1 but still
interleaves), so it's entirely possible to have a "vector.body" without any
vector instructions.

2) From what was pasted above, the "bad" code is in the runtime checks that
LV (the loop vectorizer) generates.
The general problem is that when we don't know a loop's trip count, we
assume it is high, and therefore that the cost of the runtime checks is
negligible. This assumption fails when a loop with a low (but statically
unknown) trip count is nested inside a hot loop with a high trip count. In
that case the runtime checks end up inside the outer loop and run on every
outer iteration, so their overhead may exceed what we gain from vectorizing
the inner loop.
To try to avoid this, we have a threshold that prevents us from generating
overly costly runtime checks, but I guess this particular check is not
considered costly enough to hit it.
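To make point 1 concrete, here is a rough source-level picture, in C, of what the vectorizer's output amounts to when it settles on a vectorization factor of 1 but an interleave count of 2. The loop is simply unrolled, with two independent scalar accumulators and no vector operations. The function name is hypothetical, and real LV output is LLVM IR with its own block structure (vector.body, middle block, scalar remainder loop), so treat this only as a sketch of the idea.

```c
#include <stddef.h>

/* Hypothetical illustration: what a sum reduction "vectorized" with
   VF = 1 and interleave count = 2 roughly corresponds to in C.
   The main loop plays the role of "vector.body" but contains only
   scalar operations. */
long sum_interleaved(const long *a, size_t n) {
    long acc0 = 0, acc1 = 0;   /* two independent partial sums */
    size_t i = 0;

    /* Main loop: two scalar iterations per trip, no vector ops. */
    for (; i + 1 < n; i += 2) {
        acc0 += a[i];
        acc1 += a[i + 1];
    }

    /* Combine the partial sums, then run the scalar remainder
       for the leftover iteration (if n is odd). */
    long sum = acc0 + acc1;
    for (; i < n; ++i)
        sum += a[i];
    return sum;
}
```

Splitting the sum across two accumulators breaks the dependence chain between consecutive additions, which is what makes interleaving worthwhile even without SIMD.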
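A minimal, hypothetical C sketch of the situation in point 2 follows. `hot_outer` has the loop shape described: because `dst` and `src` may overlap, vectorizing the inner loop requires a runtime overlap check, and since the range being checked depends on `i`, that check sits inside the outer loop. The two cost functions are a deliberately simplified model with made-up cost units (they are not LLVM's actual cost model) that charges the check once per entry into the inner loop.

```c
#include <stddef.h>

/* Hypothetical loop shape: hot outer loop (large n) around an inner
   loop with a small, statically unknown trip count m. dst and src may
   alias, so a vectorized inner loop needs a runtime overlap check,
   executed once per outer iteration. */
void hot_outer(float *dst, const float *src, size_t n, size_t m) {
    for (size_t i = 0; i < n; ++i)          /* high trip count */
        for (size_t j = 0; j < m; ++j)      /* low, unknown trip count */
            dst[i * m + j] += src[j];
}

/* Simplified cost of one entry into the inner loop, scalar version. */
double scalar_cost(unsigned trip, double scalar_iter) {
    return trip * scalar_iter;
}

/* Simplified cost of one entry into the vectorized inner loop:
   a fixed runtime-check cost, full vector iterations, and a scalar
   remainder for the iterations that don't fill a vector. */
double vector_cost(unsigned trip, unsigned vf, double check,
                   double vector_iter, double scalar_iter) {
    return check
         + (double)(trip / vf) * vector_iter
         + (double)(trip % vf) * scalar_iter;
}
```

With these made-up numbers (check cost 6, one unit per vector or scalar iteration, VF 4), a 1000-iteration inner loop costs 256 vectorized versus 1000 scalar, but a 5-iteration one costs 8 versus 5. Inside a hot outer loop, that loss is paid on every outer iteration.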