regression on Adobe-C++/loop_unroll

Sat Feb 8 02:51:15 PST 2014

See http://llvm.org/PR18773 for details now that I have a concrete test
case.

It seems the real problem is in dead store elimination. For some reason,
that is sensitive to whether we sink the load out of the loop?! No idea
why yet; I will shift to using the bug to track this.
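
For anyone skimming the thread, here is a rough source-level sketch of the
two loop shapes described in the quoted message below: a load done once
before the loop and kept in a register, versus a load repeated on every
trip. This is purely illustrative and is not the reduced test case from
PR18773; the function names are hypothetical, and volatile is used only as
an assumption to force a per-iteration load in plain C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration only; not the test case attached to PR18773.

// Shape A: the loop-invariant value is loaded once before the loop and can
// stay in a register for every iteration.
int sum_scaled_hoisted(const int *data, std::size_t n, const int *scale) {
    const int s = *scale;  // single load, outside the loop
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += data[i] * s;
    return total;
}

// Shape B: the same value is loaded again on each trip. The volatile
// qualifier is an artifice to keep the load inside the loop at the source
// level; in the thread, the backend introduces the reload on its own.
int sum_scaled_reloaded(const int *data, std::size_t n,
                        const volatile int *scale) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += data[i] * (*scale);  // one load per iteration
    return total;
}

// Both shapes compute the same result; only the number of loads differs.
void self_check() {
    const int data[] = {1, 2, 3, 4};
    const int scale = 3;
    assert(sum_scaled_hoisted(data, 4, &scale) ==
           sum_scaled_reloaded(data, 4, &scale));
}
```

Comparing the assembly for the two functions at -O3 shows the extra memory
access in the loop body of the second one.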

On Fri Feb 07 2014 at 6:33:44 PM, Chandler Carruth <chandlerc at gmail.com>
wrote:

> So, fun story.
>
> LICM is actually getting *more* powerful with this revision. Which causes
> a load to be sunk out of a loop in the second run of the pass pipeline. So
> why does that slow everything down?
>
> Because the backend then duplicates the load back into the loop. Before
> LICM got better, there was a load before the loop and it just sat in a
> register. Afterward, the backend loaded it on each trip.
>
> I almost have this nicely reduced to a reasonably small IR program that
> *really* highlights the performance impact of this transformation in the
> backend. I'll then work on a minimal test case that triggers the bad
> behavior in the backend. All of this should get filed as a PR against the
> x86 backend, but this is not a bug in r200067, or even a bug "uncovered" by
> that commit. This badness has been happening in *many* places for a long
> time, we just never had a test case which showed us one case where it used
> to *not* happen (by accident) and now does with a simple A/B comparison.
>
> On Fri Feb 07 2014 at 2:06:52 PM, Gerolf Hoflehner <ghoflehner at apple.com>
> wrote:
>
>
>
> The key option is -flto. Just O3 does not show the regression.
>
> Compiler version:
> /Users/ghoflehner/dev/regress_builds/build/Debug+Asserts/bin/clang++ -v
> clang version 3.5 (trunk 199970) (llvm/trunk 200067)
> Target: x86_64-apple-darwin13.1.0
> Thread model: posix
>
> Options:
> $CCROOT/bin/clang++
> -I/Users/ghoflehner/dev/test_builds/test-2014-01-28_04-36-58/SingleSource/Benchmarks/Adobe-C++
> -I/Users/ghoflehner/dev/llvm-test-suite/SingleSource/Benchmarks/Adobe-C++
> -I/Users/ghoflehner/dev/llvm-test-suite/include -I../../../include
> -D_GNU_SOURCE -D__STDC_LIMIT_MACROS -DNDEBUG -O3 -Wno-implicit-function-declaration
> -Wno-incompatible-pointer-types -flto
> -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk
> -mavx -m64 -fomit-frame-pointer -mdynamic-no-pic -c
> /Users/ghoflehner/dev/llvm-test-suite/SingleSource/Benchmarks/Adobe-C++/loop_unroll.cpp -o loop_unroll.o
> env DYLD_LIBRARY_PATH=$CCROOT/lib $CCROOT/bin/clang++ -o
> loop_unroll.simple loop_unroll.o -lm -lstdc++ -Wno-implicit-function-declaration
> -Wno-incompatible-pointer-types
> -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.9.sdk
> -mavx -m64 -fomit-frame-pointer -mdynamic-no-pic
>
>
> On Feb 7, 2014, at 1:33 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
>
> I've diffed the IR at -O3 before and after r200067 and it is identical
> other than SSA value names that are minutely different.
>
> I think I'll need before/after IR or an exact command line that reproduces
> the result you're seeing to make much progress here.
>
> On Fri Feb 07 2014 at 1:16:35 AM, Chandler Carruth <chandlerc at gmail.com>
> wrote:
>
> So this doesn't reproduce for me at all when I just compile and run
> loop_unroll.cpp from the test suite for x86 (sandybridge). Not really sure
> what all is required to observe this slowdown.
>
> I've also checked and we have no interesting benchmark regressions on our
> internal benchmarks with that revision... Really weird.
>
> Can you provide before/after bitcode? File a bug as well to track it if
> it's this severe?
>
> On Fri Feb 07 2014 at 12:32:34 AM, Chandler Carruth <chandlerc at gmail.com>
> wrote:
>
> It's really weird because the public performance bots don't show this
> regression:
>
> http://llvm.org/perf/db_default/v4/nts/21277
>
> It would be really nice to get the stuff that people have hard
> requirements on posted to the LNT dashboard...
>
> It's also particularly weird because this shouldn't really have changed
> the way LICM works. Looking into it though...
>
> On Thu Feb 06 2014 at 11:19:21 AM, Gerolf Hoflehner <ghoflehner at apple.com>
> wrote:
>
> Hi Chandler
>
> your commit seems to cause a major regression (>20%) on loop_unroll and
> other benchmarks as well. This is (at least) on x86 under -O3 -flto and
> seems to be due to the loss of LICM. Please take a look.
>
> Cheers
> Gerolf
>
> ------------------------------------------------------------------------
> r200067 | chandlerc | 2014-01-24 20:07:24 -0800 (Fri, 24 Jan 2014) | 44 lines
>
> [LPM] Make LCSSA a utility with a FunctionPass that applies it to all
> the loops in a function, and teach LICM to work in the presence of
> LCSSA.
>
>
>