[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Thu May 25 13:53:41 PDT 2017

Hi Kristof,

> On May 25, 2017, at 2:09 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
> 
>> 
>> On 24 May 2017, at 22:01, Quentin Colombet <qcolombet at apple.com> wrote:
>> 
>> Hi Kristof,
>> 
>> Thanks for getting back so fast!
>> 
>>> On May 24, 2017, at 12:57 PM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>>> 
>>>> 
>>>> On 24 May 2017, at 19:31, Quentin Colombet <qcolombet at apple.com> wrote:
>>>> 
>>>> Hi Kristof,
>>>> 
>>>> Thanks for the measurements.
>>>> 
>>>>> On May 24, 2017, at 6:00 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>>>>> 
>>>>>> 
>>>>>>> - Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.
>>>>>>> 
>>>>>>> In summary, the measurements indicate some good improvements.
>>>>>>> I also haven't measured the impact on compile time.
>>>>>> 
>>>>>> Do you have a means to make this measurement?
>>>>>> Ahmed did a bunch of compile time measurements on our side and I wanted to see if I need to put him on the hook again :).
>>>>> 
>>>>> I did a quick setup with CTMark (part of the test-suite). I ran each of
>>>>> * '-O0 -g',
>>>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0', and
>>>>> * '-O0 -g -mllvm -global-isel=true -mllvm -global-isel-abort=0 -mllvm -optimize-regalloc -mllvm -regalloc=greedy'
>>>>> 5 times, cross-compiling from X86 to AArch64, and took the median measured compile times.
>>>>> In summary, I see GlobalISel having a compile time that's 3.5% higher than the current -O0 default.
>>>>> With enabling the greedy register allocator, this increases to 28%.
>>>>> 28% is probably too high?
>>>> 
>>>> I think it is, yes.
>>>> I have attached a quick hack to the greedy allocator to feature a fast mode.
>>>> Could you give it a try?
>>>> 
>>>> To enable the fast mode, please use (-mllvm) -regalloc-greedy-fast=true (default is false).
>>> 
>>> I'm afraid it doesn't seem to save much compile time. On geomean, I see about a 26% compile time increase against the current -O0 default (compared to a 28% increase for regalloc greedy without your patch).
>> 
>> Interesting, I guess a lot of time is spent in the coalescer. Could you give it a try with -join-liveintervals=false?
> 
> After adding -join-liveintervals=false, I see the compile time increase going up to 28% again.

Heh, I am mildly surprised we hand many more live-ranges to the allocator when we do that.

> 
>> 
>> Do you know where the time is spent (-time-passes)?
> 
> I'm afraid I won't have time to have a closer look in the next couple of days - I don't know where the time is spent at the moment.

Fair enough, will investigate later.

> 
>> 
>> Anyhow, fixing all of those will take time, although I think it is the right approach, so we can go with the localizer.
> 
> Right, I don't understand the register allocator well enough to know if that compile time overhead can be fixed, while still getting the necessary codegen benefits the greedy allocator gives.
> Is there any specific help you're looking for with getting the localizer to work well enough for production use?

I’ll clean up the WIP patch for the localizer, then you guys can fix the bug that you found.

I’ll do that tomorrow.

Cheers,
-Quentin

> 
> Thanks,
> 
> Kristof
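
For anyone wanting to reproduce the kind of summary quoted in this thread (median of 5 compile-time runs per benchmark, then a geomean of the increase across benchmarks), here is a minimal Python sketch. It is not the script used for the CTMark numbers above; the function names are hypothetical and the timings are made-up placeholders.

```python
import math
from statistics import median


def median_times(runs):
    """Map each benchmark name to the median of its repeated compile times."""
    return {name: median(times) for name, times in runs.items()}


def geomean_overhead(baseline, candidate):
    """Geometric mean of candidate/baseline ratios, as a percentage increase."""
    ratios = [candidate[name] / baseline[name] for name in baseline]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


if __name__ == "__main__":
    # Placeholder timings in seconds (5 runs each); NOT real measurements.
    o0_runs = {
        "bench_a": [10.0, 10.2, 9.9, 10.1, 10.0],
        "bench_b": [20.3, 20.0, 19.8, 20.1, 20.0],
    }
    gisel_runs = {
        "bench_a": [10.4, 10.3, 10.5, 10.2, 10.4],
        "bench_b": [20.6, 20.9, 20.7, 20.8, 20.7],
    }
    overhead = geomean_overhead(median_times(o0_runs), median_times(gisel_runs))
    print(f"geomean compile-time increase: {overhead:.1f}%")
```

Taking the median per benchmark damps one-off noisy runs, and the geomean keeps one long-running benchmark from dominating the overall ratio.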
