[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Wed Jan 25 09:53:08 PST 2017

Hi Michael,

Thank you for the information. You've given me ideas about what to look at.
It looks like SCEV might reconstruct things that InstCombine has optimized away in order to create the runtime checks.

Thanks,
Evgeny

From: Michael Kuperstein [mailto:mkuper at google.com]
Sent: Tuesday, January 24, 2017 9:46 PM
To: Sanjay Patel
Cc: Evgeny Astigeevich; llvm-dev; nd
Subject: Re: [llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

On Tue, Jan 24, 2017 at 1:20 PM, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote:

I started looking at the log files that you attached, and I'm confused. The code that is supposedly causing the perf regression is created by the loop vectorizer, right? Except the bad code is not in the "vector.body", so is there something peculiar about this benchmark that the hot loop is not the vector loop? But there's another mystery: there are no vector ops in the "vector.body"!

I haven't looked at this particular example, but this isn't very rare.

1) The vectorizer actually doubles as an unroller, under some circumstances. So it's conceivable to have a vector.body without vector instructions.

2) From what was pasted above, the "bad" code is in the runtime checks that LV generates.
The general problem is that if we don't know what the trip count of a loop is going to be, we assume it's high, so that the cost of the runtime checks is negligible. This fails if we have a loop with a low (but statically unknown) trip count nested inside a hot loop with a high trip count. In this case, the runtime checks end up inside the outer loop, and their overhead may cost more than what we gain from vectorization.
To try to avoid this, we have a threshold that prevents us from generating too costly runtime checks, but I guess this particular check is not considered complex enough.
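
To make the nested-loop case concrete, here is a minimal sketch in C of the shape described above. It is hand-written for illustration and is not the code LV emits: ranges_disjoint, add_one, add_one_rows and the checks_executed counter are all hypothetical names, and the overlap test only stands in for the kind of runtime check that guards a vectorized loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter: how many times the guard below has run. */
static size_t checks_executed;

/* Stand-in for a runtime overlap check: returns nonzero when the
   ranges [a, a+n) and [b, b+n) do not overlap. */
static int ranges_disjoint(const int *a, const int *b, size_t n) {
    uintptr_t lo_a = (uintptr_t)a, hi_a = (uintptr_t)(a + n);
    uintptr_t lo_b = (uintptr_t)b, hi_b = (uintptr_t)(b + n);
    ++checks_executed;
    return hi_a <= lo_b || hi_b <= lo_a;
}

/* Inner loop: the trip count n is not known at compile time. */
static void add_one(int *dst, const int *src, size_t n) {
    assert(dst != NULL && src != NULL);
    if (ranges_disjoint(dst, src, n)) {
        /* Path a compiler could vectorize once the check has passed. */
        for (size_t i = 0; i < n; ++i)
            dst[i] = src[i] + 1;
    } else {
        /* Scalar fallback; same source-level semantics. */
        for (size_t i = 0; i < n; ++i)
            dst[i] = src[i] + 1;
    }
}

/* Hot outer loop: the guard runs once per row, so with a small `cols`
   it is paid almost as often as the useful work. */
void add_one_rows(int *dst, const int *src, size_t rows, size_t cols) {
    for (size_t r = 0; r < rows; ++r)
        add_one(dst + r * cols, src + r * cols, cols);
}
```

With rows = 1000 and cols = 2, the guard runs 1000 times to cover 2000 element updates, which is the situation where the check can outweigh the gain from the vector loop.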
