[llvm] r204076 - Use range metadata instead of introducing selects.

Tue Apr 8 12:58:43 PDT 2014

On Apr 8, 2014, at 12:07 PM, Dan Gohman <dan433584 at gmail.com> wrote:

> On Mon, Apr 7, 2014 at 5:09 PM, Andrew Trick <atrick at apple.com> wrote:
> 
> On Mar 25, 2014, at 10:50 AM, Dan Gohman <dan433584 at gmail.com> wrote:
> 
>> On Tue, Mar 25, 2014 at 7:24 AM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>> On 25 March 2014 09:49, Dan Gohman <dan433584 at gmail.com> wrote:
>> > Hi Lang,
>> >
>> > I can reproduce the performance regression on fourinarow, at least. With the
>> > patch, the code size and static instruction count of the benchmark's one
>> > embarrassingly hot function is lower, the dynamic instruction count is lower,
>> > and the stack frame is smaller, but it still runs slower. Instruction
>> > selection is basically the same, except that there are fewer cmovs. There
>> > appears to be a minor difference in instruction scheduling in the hot
>> > function. The regression disappeared when I experimented with non-default
>> > values for -pre-RA-sched. However, I'm not prepared for the adventure of
>> > changing the instruction scheduler's heuristics at this time, so I'll just
>> > let this patch go for now.
>> 
>> Do you have a small .ll testcase?
>> 
>> Not handy anymore, but it's just MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow with -O3 -flto on x86-64.
> 
> fourinarow is jittery, sensitive to register pressure, and doesn’t like codegen changes. Were there several other significant regressions and no significant improvements? Were the results overall bad on non -flto builds too? Or did we just have bad luck with LTO? Are there regressions on any real benchmarks?
> 
> There is a very significant improvement in one of my own benchmarks. Also, it's an intuitively appealing patch because 0 selects is nicer than 1 select, and there are no apparent significant downsides. That said, in the LLVM testsuite, there appear to have been several regressions and no improvements.
>  
> Is there any reason to believe this patch is chronically increasing register pressure?
> 
> No.
>  
> The default SD scheduler should be simply preserving IR order. If the patch fundamentally makes sense, and the generated code before register coalescing looks better by simple metrics: dynamic instruction count and critical path, then the only way forward is to file a bug against the register coalescer and MI scheduler (which are often two sides of the same problem).
> 
> The dynamic instruction count was lower. The main difference is that a cmov is removed. I'll make a note to myself to file a bug against the MI scheduler.

Ok. If we know that the number of instructions before coalescing is less than or equal and the spill count is greater in each of the regressions, that's enough of a clue to pin it on coalescing/scheduling/regalloc.
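A minimal sketch of that triage rule, assuming the per-benchmark counts (instructions before register coalescing and spills, for a baseline build and a patched build) have already been collected somehow. The `CodegenCounts` record and the `blame_downstream_codegen` helper are hypothetical illustrations, not part of LLVM or the test-suite tooling:

```python
from dataclasses import dataclass


@dataclass
class CodegenCounts:
    """Hypothetical per-benchmark measurements for one build (baseline or patched)."""
    insts_before_coalescing: int  # instruction count measured before register coalescing
    spills: int                   # spills emitted by the register allocator


def blame_downstream_codegen(regressions):
    """Return True if every regression matches the pattern described above:
    no more instructions before coalescing, but more spills with the patch.

    `regressions` maps benchmark name -> (baseline, patched) CodegenCounts.
    """
    if not regressions:
        return False  # nothing to diagnose
    return all(
        patched.insts_before_coalescing <= base.insts_before_coalescing
        and patched.spills > base.spills
        for base, patched in regressions.values()
    )


if __name__ == "__main__":
    # Made-up numbers for a hypothetical benchmark, purely for illustration.
    regs = {"bench_a": (CodegenCounts(120, 4), CodegenCounts(118, 6))}
    print(blame_downstream_codegen(regs))  # True: fewer instructions, more spills
```

The check requires *every* regression to fit the pattern. A single regression with more pre-coalescing instructions points back at the IR-level change rather than at coalescing, scheduling, or register allocation.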

I don’t like to prevent the right thing at IR level just because downstream codegen happens to make bad decisions.

-Andy

> Dan
> 
