[llvm-dev] [GlobalISel][AArch64] Toward flipping the switch for O0: Please give it a try!

Kristof Beyls via llvm-dev llvm-dev at lists.llvm.org
Mon May 22 00:51:06 PDT 2017


> On 22 May 2017, at 09:09, Diana Picus <diana.picus at linaro.org> wrote:
> 
> Hi Quentin,
> 
> I actually did a run with -mllvm -optimize-regalloc -mllvm
> -regalloc=greedy over the weekend and the test does pass with that.
> Haven't measured the compile time though.
> 
> Cheers,
> Diana

I also did my usual benchmarking run with the same options as Diana did above:
- Comparing against -O0 without globalisel: 2.5% performance drop, 0.8% code size improvement.
- Comparing against -O0 without globalisel but with the above regalloc options: 5.6% performance drop, 1% code size drop.

In summary, the measurements indicate some good improvements.
I also haven't measure the impact on compile time.

Doing a few quick spot checks on the generated code, I still see some constants being created in the entry BB and stored on the stack.
I haven't tried to investigate if the entry BB seems like a good place to materialize those remaining constants.

Thanks,

Kristof

> 
> On 19 May 2017 at 19:06, Quentin Colombet <qcolombet at apple.com> wrote:
>> Hi Diana,
>> 
>> On May 19, 2017, at 1:33 AM, Diana Picus <diana.picus at linaro.org> wrote:
>> 
>> On 18 May 2017 at 19:09, Quentin Colombet <qcolombet at apple.com> wrote:
>> 
>> Hi Diana,
>> 
>> On May 18, 2017, at 1:15 AM, Diana Picus <diana.picus at linaro.org> wrote:
>> 
>> On 18 May 2017 at 09:06, Kristof Beyls <Kristof.Beyls at arm.com> wrote:
>> 
>> I think Diana found that when enabling r299283, the bootstrap failed with
>> llvm-tblgen segfaulting.
>> So there clearly is some work required there.
>> 
>> 
>> Indeed.
>> 
>> @Quentin, what is the status of that patch?
>> 
>> 
>> Initially it was meant as a prototype to investigate how we could address
>> those issues at O0. I didn’t mean to publish it.
>> 
>> Have you been working on
>> it since then?
>> 
>> 
>> No, I haven’t touched it and I honestly didn’t plan to do that.
>> 
>> Should I investigate the segfault more and send you a
>> reproducer?
>> 
>> 
>> Depends, do you guys want to pursue in that direction?
>> 
>> 
>> Not necessarily, if you think a better reg alloc scheme will do the
>> trick then I see no point in complicating the pass pipeline for now.
>> 
>> My though was that it is probably best to rely on a better reg allocator
>> (basic or greedy) scheme for O0. I believe the only concern may be compile
>> time but it may  just fly (+ I have ideas how to make it better pretty
>> easy).
>> 
>> Is this patch the way forward, or do you have other ideas
>> for reducing the stack usage?
>> 
>> 
>> Better reg alloc scheme for O0 :).
>> 
>> 
>> Ok, let's go with that then :)
>> 
>> 
>> Sounds good, let me push a patch that would allow to use the greedy
>> allocator at O0 and see if that’s sufficient. Then, we will look at the
>> compile time that change will imply.
>> 
>> Cheers,
>> -Quentin
>> 
>> 
>> 
>> 
>> Thanks,
>> Diana
>> 
>> 
>> Thanks,
>> 
>> Kristof
>> 
>> 
>> On 16 May 2017, at 14:06, Diana Picus via llvm-dev <llvm-dev at lists.llvm.org>
>> wrote:
>> 
>> Turns out it really is a GlobalISel issue - we eat up too much stack
>> space because all the constants used in the switches are stored on the
>> stack. We need to fix this somehow before changing the default. I'll
>> try to give it a run with Quentin's r299283 on top to see if it helps.
>> 
>> Cheers,
>> Diana
>> 
>> On 15 May 2017 at 09:38, Diana Picus <diana.picus at linaro.org> wrote:
>> 
>> Got another one: https://bugs.llvm.org/show_bug.cgi?id=33036
>> 
>> I'm still investigating whether this is an actual GlobalISel issue or
>> something else (I'll also start a build on a more recent revision to
>> see how that behaves).
>> 
>> On 12 May 2017 at 20:06, Eric Christopher <echristo at gmail.com> wrote:
>> 
>> Agreed. That was probably just luck before :)
>> 
>> -eric
>> 
>> On Fri, May 12, 2017 at 5:22 AM Diana Picus via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> 
>> It turns out that can be fixed by adding -lm to the link line, so I
>> will probably convert it into a test-suite bug.
>> 
>> I don't suppose it's crucial to handle the fabs intrinsic nicely at -O0.
>> 
>> On 12 May 2017 at 13:44, Diana Picus <diana.picus at linaro.org> wrote:
>> 
>> Hi,
>> 
>> I ran into a little snag on the test-suite:
>> https://bugs.llvm.org/show_bug.cgi?id=33021
>> It boils down to GlobalISel generating calls to fabs instead of using
>> FABSDr (so we get undefined references).
>> 
>> Cheers,
>> Diana
>> 
>> On 11 May 2017 at 18:40, Quentin Colombet <qcolombet at apple.com> wrote:
>> 
>> Hi Diana,
>> 
>> Thanks for the summary.
>> 
>> On May 11, 2017, at 4:01 AM, Diana Picus <diana.picus at linaro.org>
>> wrote:
>> 
>> Hi all,
>> 
>> I'm still running some validation on this, I'll send an email when
>> it's done. If that goes well I don't have anything against making the
>> switch.
>> 
>> For the record, here's a summary of issues that were deferred for
>> later on (some of which are optimization-ish and we might decide to
>> never do at -O0 at all):
>> * Crash in RegBankSelect for half fp types:
>> https://bugs.llvm.org/show_bug.cgi?id=32560
>> 
>> 
>> I’ll have a look.
>> 
>> * Improving constant placement:
>> http://bugs.llvm.org/show_bug.cgi?id=32561
>> 
>> 
>> I’ve commented in the PR to mention the localizer technic I was playing
>> with, if someone wants to give it a try.
>> 
>> * Fancy switch lowering
>> * Transforming division-by-constant-power-of-2 into right shift
>> 
>> 
>> AFAICT all the other issues that were brought up were fixed (yay!).
>> 
>> Cheers,
>> Diana
>> 
>> 
>> Cheers,
>> -Quentin
>> 
>> 
>> 
>> On 11 May 2017 at 08:44, Kristof Beyls via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>> 
>> On 10 May 2017, at 17:36, Quentin Colombet <qcolombet at apple.com> wrote:
>> 
>> 
>> MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode (46%):
>> Function
>> Reference_IDCT: Probably due to creating all constants in the entry BB
>> +
>> spilling floating point data through an X register:
>> 
>> FastISel:
>> fadd d0, d1, d0
>> str d0, [sp,#528]
>> GlobalISel:
>> fadd d0, d1, d0
>> fmov x9, d0
>> stur x9, [x29,#-48]
>> 
>> 
>> Good finding, I forgot to do stores in my previous fix. I’ll do them
>> shortly.
>> 
>> 
>> Should be fixed by r302679
>> 
>> 
>> Thanks Quentin,
>> 
>> That reduces the slow-down when enabling globalisel at -O0 from 13% (on
>> r302453)  to 9.5% (on r302679) in my experiments.
>> The code size increase also reduces from just over 3% to 2.8%.
>> 
>> Kristof
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 
>> 



More information about the llvm-dev mailing list