[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
Jim Grosbach
grosbach at apple.com
Fri Feb 27 12:42:14 PST 2015
Hi Kristof,
Our tests are on iOS, which definitely uses the LOH optimizations for ARM64.
-Jim
> On Feb 26, 2015, at 2:33 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>
> Hi Ahmed,
>
> Did you run these experiments on a platform with a linker that makes
> use of the AArch64CollectLOH-pass-produced information?
> I'm guessing that the AArch64CollectLOH-pass information and a linker
> that makes use of that information could affect the profitability of
> the GlobalMerge pass?
>
> Thanks,
>
> Kristof
>
>> -----Original Message-----
>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
>> On Behalf Of Ahmed Bougacha
>> Sent: 26 February 2015 01:13
>> To: LLVM Dev
>> Subject: Re: [LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
>>
>> With the numbers!
>> -Ahmed
>>
>>
>> On Wed, Feb 25, 2015 at 4:57 PM, Ahmed Bougacha
>> <ahmed.bougacha at gmail.com> wrote:
>>> Hi all,
>>>
>>> I've started looking at the GlobalMerge pass, enabled by default on
>>> ARM and AArch64. I think we should reconsider that, at least for
>>> AArch64.
>>>
>>> As is, the pass just merges all globals together, in groups of 4KB
>>> (AArch64, 128B on ARM).
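
To make the grouping described above concrete, here is a minimal sketch, not the actual LLVM implementation. It greedily packs globals into merged groups whose total size stays within a limit (4096 bytes, matching the AArch64 figure quoted above), so every member can be reached as one base address plus an offset. The function name `merge_globals`, the (name, size) input format, and the decision to ignore alignment and padding are all assumptions made for illustration.

```python
def merge_globals(globals_with_sizes, max_group_bytes=4096):
    """Greedily pack (name, size_in_bytes) pairs into merged groups.

    Hypothetical helper for illustration only: alignment and padding are
    ignored, and globals are taken in the order given, with no attempt to
    look at how they are used.
    Returns a list of groups; each group is a list of (name, offset).
    """
    groups = []
    current = []  # (name, offset) pairs of the group being filled
    used = 0      # bytes already occupied in the current group
    for name, size in globals_with_sizes:
        # Start a new group when this global would not fit in the current one.
        if current and used + size > max_group_bytes:
            groups.append(current)
            current, used = [], 0
        current.append((name, used))
        used += size
    if current:
        groups.append(current)
    return groups
```

For example, `merge_globals([("a", 4), ("b", 8), ("c", 4090)])` places `a` and `b` in one group at offsets 0 and 4, and `c` alone in a second group.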
>>>
>>> At the time it was enabled, the general thinking was "it's almost
>>> free, it doesn't affect performance much, we might as well use it".
>>> Now, it's preventing some link-time optimizations (as acknowledged in
>>> one of the FIXMEs).
>>>
>>>
>>> -- Performance impact
>>> Overall, it isn't that profitable on the test-suite, and actually
>>> degrades performance on a lot of other - "non-benchmark" - projects I
>>> tried (where the main reason to use a global is file- or function-
>>> static variables, only accessed through a single getter function).
>>>
>>> Across several runs on the entire test-suite, when disabling the pass,
>>> I measured:
>>> without LTO, a -0.19% geomean improvement
>>> with LTO, a +0.11% geomean regression.
>>>
>>> As for just SPEC2006, there are two big regressions: 400.perlbench
>>> (10.6% w/ LTO, 2.7% w/o) and 471.omnetpp (2.3% w/, 3.9% w/o).
>>>
>>> Numbers are attached.
>>>
>>>
>>> -- A way forward
>>> One obvious way to improve it is: look at uses of globals, and try to
>>> form sets of globals commonly used together. The tricky part is to
>>> define heuristics for "commonly". Also, the pass then becomes much
>>> more expensive. I'm currently looking into improving it, and will
>>> report if I come up with a good solution. But this shouldn't stop us
>>> from disabling it, for now.
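
One possible reading of "sets of globals commonly used together" is sketched below. This is an assumption-laden illustration, not the heuristic proposed in the message: it counts how many functions use each pair of globals, then joins pairs that are co-used in at least `min_co_uses` functions (a hypothetical threshold standing in for "commonly"). The function name `group_by_co_use` and the input mapping from function name to used globals are invented for this example.

```python
from collections import Counter
from itertools import combinations


def group_by_co_use(uses_by_function, min_co_uses=2):
    """Group globals that appear together in at least min_co_uses functions.

    uses_by_function maps a function name to the globals it references.
    Both the input shape and the threshold are hypothetical.
    Returns a sorted list of sorted groups of global names.
    """
    # Count, for every pair of globals, the number of functions using both.
    pair_counts = Counter()
    for used in uses_by_function.values():
        for pair in combinations(sorted(set(used)), 2):
            pair_counts[pair] += 1

    # Union-find over globals; each global starts in its own set.
    parent = {g: g for used in uses_by_function.values() for g in used}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Merge the sets of any pair that is co-used often enough.
    for (a, b), count in pair_counts.items():
        if count >= min_co_uses:
            parent[find(a)] = find(b)

    groups = {}
    for g in list(parent):
        groups.setdefault(find(g), set()).add(g)
    return sorted(sorted(members) for members in groups.values())
```

With `{"f": ["a", "b"], "g": ["a", "b", "c"], "h": ["c", "d"]}` and the default threshold, only `a` and `b` are co-used twice, so they form one group while `c` and `d` stay separate. Counting every pair is quadratic in the number of globals a function touches, which is one way the extra cost mentioned above could show up.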
>>>
>>> Also, the pass seems like a good candidate for
>>> -O3/CodeGenOpt::Aggressive. However, the latter is implied by LTO,
>>> which IMO shouldn't include these not-always-profitable optimizations.
>>> That's another problem though.
>>>
>>>
>>>
>>> Right now, I think we should disable the pass by default, until it's
>>> deemed profitable enough.
>>>
>>> -Ahmed