[llvm-dev] [InstCombine] rL292492 affected LoopVectorizer and caused 17.30%/11.37% perf regressions on Cortex-A53/Cortex-A15 LNT machines

Tue Jan 24 09:26:26 PST 2017

On Tue, Jan 24, 2017 at 8:56 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Jan 24, 2017, at 8:55 AM, Sanjay Patel <spatel at rotateright.com> wrote:
>
>
>
> On Tue, Jan 24, 2017 at 9:24 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>
>>
>> On Jan 24, 2017, at 7:18 AM, Sanjay Patel <spatel at rotateright.com> wrote:
>>
>>
>>
>> On Mon, Jan 23, 2017 at 10:53 PM, Mehdi Amini <mehdi.amini at apple.com>
>> wrote:
>>>
>>>
>>> On Jan 23, 2017, at 3:48 PM, Sanjay Patel via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>
>>> All targets are likely affected in some way by the icmp+shl fold
>>> introduced with r292492. It's a basic pattern that occurs in lots of code.
>>> Did you see any perf wins on your targets with this commit?
>>>
>>> Sadly, it is also likely that many (all?) targets are negatively impacted
>>> on the particular test (SingleSource/Benchmarks/Shootout/sieve) that you
>>> have pointed out here because the IR is now decidedly worse.
>>>
>>> IMO, we should not revert the commit because it exposed shortcomings in
>>> the optimizer. It's an "obvious" fold/canonicalization, and the related
>>> 'nuw' variant of this fold has existed in trunk since:
>>> https://reviews.llvm.org/rL285729
>>>
>>> We need to dissect what analysis/folds are missing to restore the IR to
>>> the better form that existed before, but this is probably going to be a long
>>> process because we treat min/max like an optimization fence.
>>>
>>>
>>> If this is gonna be a long process to recover, this looks like something
>>> to be reverted in the 4.0 branch (unless I missed that there is a
>>> correctness fix involved?).
>>>
>>
>> Nope - this is just about perf, not correctness. Of course, the intent was
>> that this transform should only improve perf, so I wonder if we can pin any
>> other perf changes from this commit.
>>
>> I'm new to using the LNT site, but this should be the full set of results
>> for the A53 machine in question with a baseline (r292491) before this patch
>> and current (r292522) :
>> http://llvm.org/perf/db_default/v4/nts/107364
>>
>> If these are reliable results, we have 2 perf wins (puzzle, gramschmidt)
>> on the A53 machine. How do we determine the importance of the sieve
>> benchmark vs. the rest of the suite?
>>
>> An x86 machine doesn't show any regressions from this change:
>> http://llvm.org/perf/db_default/v4/nts/107353
>>
>> Are there target-scope-based guidelines for when something is bad enough
>> to revert?
>>
>>
>> I don’t think we have any guidelines.
>>
>> I think my suggestion was more about other regression that we would
>> discover after the release, it was more of a “maturity” call: if we just
>> notice a problem with the commit right before the release, it may not have
>> been in tree long enough to get enough scrutiny.
>
>
> That makes sense. I have no stake in any particular branch, so I have no
> objection to revert from the release branch if that's what people would like
> to do.
>
>
> +Hans for advising on what to on clang-4.0.

The commit in question, r292492, occurred after the branch point
(r291814), so the branch is not affected by this.

Thanks,
Hans