[llvm] r204076 - Use range metadata instead of introducing selects.

Tue Apr 8 12:07:12 PDT 2014

On Mon, Apr 7, 2014 at 5:09 PM, Andrew Trick <atrick at apple.com> wrote:

>
> On Mar 25, 2014, at 10:50 AM, Dan Gohman <dan433584 at gmail.com> wrote:
>
> On Tue, Mar 25, 2014 at 7:24 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>
>> On 25 March 2014 09:49, Dan Gohman <dan433584 at gmail.com> wrote:
>> > Hi Lang,
>> >
>> > I can reproduce the performance regression on fourinarow, at least. With
>> > the patch, the code size and static instruction count of the benchmark's
>> > one embarrassingly hot function are lower, the dynamic instruction count
>> > is lower, and the stack frame is smaller, but it still runs slower.
>> > Instruction selection is basically the same, except that there are fewer
>> > cmovs. There appears to be a minor difference in instruction scheduling
>> > in the hot function. The regression disappeared when I experimented with
>> > non-default values for -pre-RA-sched. However, I'm not prepared for the
>> > adventure of changing the instruction scheduler's heuristics at this
>> > time, so I'll just let this patch go for now.
>>
>> Do you have a small .ll testcase?
>>
>
> Not handy anymore, but it's just
> MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow with -O3 -flto on
> x86-64.
>
> fourinarow is jittery, sensitive to register pressure, and doesn't like
> codegen changes. Were there several other significant regressions and no
> significant improvements? Were the results overall bad on non -flto builds
> too? Or did we just have bad luck with LTO? Are there regressions on any
> real benchmarks?
>

There is a very significant improvement in one of my own benchmarks. Also,
it's an intuitively appealing patch because 0 selects is nicer than 1
select, and there are no apparent significant downsides. That said, in the
LLVM testsuite, there appear to have been several regressions and no
improvements.

> Is there any reason to believe this patch is chronically increasing
> register pressure?
>

No.

> The default SD scheduler should be simply preserving IR order. If the
> patch fundamentally makes sense, and the generated code before register
> coalescing looks better by simple metrics: dynamic instruction count and
> critical path, then the only way forward is to file a bug against the
> register coalescer and MI scheduler (which are often two sides of the same
> problem).
>

The dynamic instruction count was lower. The main difference is that a cmov
is removed. I'll make a note to myself to file a bug against the MI
scheduler.

Dan