[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines
Michael Kuperstein via llvm-dev
llvm-dev at lists.llvm.org
Tue Jan 24 13:50:02 PST 2017
Also, the difference between X86 and AArch64 is that on X86, the vectorizer
decides not to unroll.
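The following is a rough, hand-written C sketch (not actual LLVM output) of what the vectorizer produces when it interleaves ("unrolls") a loop by 2 with a vector width of 1. It contains three parts: a runtime overlap check, a main body that runs two scalar copies of the loop body per trip with no vector operations, and a scalar loop for the remainder. The function names are hypothetical, and the pointer comparison via `uintptr_t` assumes a flat address space.

```c
#include <stddef.h>
#include <stdint.h>

/* Original scalar loop: a[i] = b[i] + c, where a and b may alias. */
void add_scalar(int *a, const int *b, int c, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c;
}

/* Hand-written approximation of interleave count 2, vector width 1. */
void add_interleaved2(int *a, const int *b, int c, size_t n) {
    size_t i = 0;
    /* Runtime check: take the fast path only if a[0..n) and b[0..n) do not
       overlap, because the fast path loads two elements before storing either.
       Comparing addresses as uintptr_t assumes a flat address space. */
    uintptr_t a_lo = (uintptr_t)a, a_hi = (uintptr_t)(a + n);
    uintptr_t b_lo = (uintptr_t)b, b_hi = (uintptr_t)(b + n);
    if (a_hi <= b_lo || b_hi <= a_lo) {
        /* "vector.body": two scalar copies per trip, no vector ops. */
        for (; i + 2 <= n; i += 2) {
            int x0 = b[i], x1 = b[i + 1];
            a[i] = x0 + c;
            a[i + 1] = x1 + c;
        }
    }
    /* Scalar loop: the remainder iterations, or everything if the check failed. */
    for (; i < n; i++)
        a[i] = b[i] + c;
}
```

When the arrays overlap (for example `a == b + 1`), the check fails and every iteration runs in the original order, which is what keeps the loop-carried dependence correct.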
On Tue, Jan 24, 2017 at 1:46 PM, Michael Kuperstein <mkuper at google.com> wrote:
> On Tue, Jan 24, 2017 at 1:20 PM, Sanjay Patel via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>> I started looking at the log files that you attached, and I'm confused.
>> The code that is supposedly causing the perf regression is created by the
>> loop vectorizer, right? Except the bad code is not in the "vector.body", so
>> is there something peculiar about this benchmark that the hot loop is not
>> the vector loop? But there's another mystery: there are no vector ops in
>> the "vector.body"!
> I haven't looked at this particular example, but this isn't very rare.
> 1) The vectorizer actually doubles as an unroller under some
> circumstances, so it's conceivable to have a vector.body without vector ops.
> 2) From what was pasted above, the "bad" code is in the runtime checks
> that LV generates.
> The general problem is that if we don't know what the trip count of a loop
> is going to be, we assume it's high, so the cost of the runtime checks is
> negligible. This fails if we have a loop with a low (but statically
> unknown) trip count nested inside a hot loop with a high trip count. In
> this case, the overhead from the runtime checks, which end up inside
> the outer loop, may cost more than what we gain from vectorization.
> To try to avoid this, we have a threshold that prevents us from generating
> runtime checks that are too costly, but I guess this particular check is not
> considered complex enough.
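The trade-off described above can be made concrete with a toy cost model in C. All costs, the vectorization factor, and `RUNTIME_CHECK_COST` are made-up illustrative numbers (hypothetical, not LLVM's actual cost model or thresholds). The runtime checks are paid once per entry into the inner loop. Inside a hot outer loop, that means they are paid on every outer iteration, and they only pay for themselves when the inner trip count is large enough.

```c
/* Toy cost model; every number here is an illustrative assumption. */
#define SCALAR_ITER_COST   4L  /* cost of one scalar iteration */
#define VECTOR_ITER_COST   5L  /* cost of one vector iteration covering VF elements */
#define VF                 4L  /* hypothetical vectorization factor */
#define RUNTIME_CHECK_COST 20L /* hypothetical cost of the runtime checks, paid per inner-loop entry */

/* Cost of one execution of the scalar inner loop with n iterations. */
long scalar_cost(long n) {
    return n * SCALAR_ITER_COST;
}

/* Cost of one execution of the vectorized inner loop: the runtime checks,
   n / VF vector iterations, and n % VF scalar remainder iterations. */
long vector_cost(long n) {
    return RUNTIME_CHECK_COST + (n / VF) * VECTOR_ITER_COST + (n % VF) * SCALAR_ITER_COST;
}

/* Total cost when the inner loop is entered once per outer iteration (m outer trips). */
long nested_cost(long m, long n, int vectorized) {
    return m * (vectorized ? vector_cost(n) : scalar_cost(n));
}

/* Smallest inner trip count for which the vectorized loop is strictly cheaper. */
long break_even_trip_count(void) {
    long n = 0;
    while (vector_cost(n) >= scalar_cost(n))
        n++;
    return n;
}
```

With these numbers, `break_even_trip_count()` returns 8. An inner trip count of 4 costs 16 units scalar versus 25 vectorized per entry, while a trip count of 16 costs 64 versus 40. Multiplying by a large outer trip count scales the gap but does not change which version wins. A model that assumes a high inner trip count therefore never sees the loss.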