[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?

Fri Feb 27 13:26:34 PST 2015

On Thu, Feb 26, 2015 at 2:33 AM, Kristof Beyls <kristof.beyls at arm.com> wrote:
>
> Hi Ahmed,
>
> Did you run these experiments on a platform with a linker that makes
> use of the AArch64CollectLOH-pass-produced information?

As Jim says, I'm on iOS, so yes.  However, I'm mostly running tests
with the pass disabled.

>
> I'm guessing that the AArch64CollectLOH-pass information and a linker
> that makes use of that information could affect the profitability of
> the GlobalMerge pass?

It could, and does, from what I've seen (beware anecdata):
- reusing the adrp base prevents optimizing it (the various
Adrp*{ldr,str} LOHs).
- reusing the adrp+add MergedGlobal pointer, with indexed addressing,
doesn't prevent the AdrpAdd optimization.

All in all, whether or not GlobalMerge is profitable (it can hurt by
increasing register pressure, or by adding another indirection), the
LOH optimizations reduce its usefulness whenever they fire.
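
For readers who want the merged layout spelled out, here is a minimal
sketch in Python. It is not the pass's actual implementation: the greedy
in-order packing, the function name merge_globals, and the decision to
ignore alignment and padding are all simplifying assumptions. Only the
4KB group size comes from this thread.

```python
MAX_GROUP_BYTES = 4096  # AArch64 group size mentioned in this thread


def merge_globals(globals_, limit=MAX_GROUP_BYTES):
    """Greedily pack (name, size) pairs into groups of at most `limit` bytes.

    Returns a list of groups; each group maps a global's name to its byte
    offset from that group's base. Alignment and padding are ignored
    (an assumption made to keep the sketch short).
    """
    groups = []
    current, used = {}, 0
    for name, size in globals_:
        if size > limit:
            continue  # too large to merge: stays a standalone global
        if used + size > limit:
            # Current group is full: close it and start a new one.
            groups.append(current)
            current, used = {}, 0
        current[name] = used
        used += size
    if current:
        groups.append(current)
    return groups


layout = merge_globals([("a", 4), ("b", 8), ("c", 4092)])
```

Every global in one group is then addressed as base + offset, which is
why a single adrp+add of the base can be shared, and also why each
individual global loses its own symbol-relative addressing.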

AFAICT, the only case where LOHs help GlobalMerge is when the
MergedGlobal base is closer to the adrp sequence than the actual
global.  Given that we only merge 4KB of globals at a time, against a
1MB range, this doesn't happen very often.
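
The range argument can be checked with a little arithmetic. The sketch
below is only an illustration: it assumes the 1MB figure refers to the
reach of a PC-relative adr (a signed 21-bit byte offset), and the helper
names adr_reachable and base_helps are made up for this example.

```python
ADR_RANGE = 1 << 20      # adr reaches [-1MB, +1MB) around the instruction
MERGED_SIZE = 4096       # merged group size on AArch64, per this thread


def adr_reachable(pc, target):
    """True if an adr located at `pc` can address `target` directly."""
    delta = target - pc
    return -ADR_RANGE <= delta < ADR_RANGE


def base_helps(pc, base, offset):
    """True only if the merged base is in adr range but the global is not."""
    assert 0 <= offset < MERGED_SIZE
    return adr_reachable(pc, base) and not adr_reachable(pc, base + offset)


# Base just below the upper edge of the window: the base is reachable,
# the global 16 bytes further is not, so merging helps here.
edge_case = base_helps(0, ADR_RANGE - 8, 16)

# Base well inside the window: both are reachable, merging gains nothing.
common_case = base_helps(0, 0x1000, 16)
```

Under these assumptions, the base can only help when it sits within the
last 4KB below the upper edge of a 2MB window, which is a small slice
of the addresses an adr can reach.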

Which brings us to my fallback proposal:  what about disabling the
pass on darwin only?  Various darwin-enabled features (e.g., LOHs)
help mitigate the adrp problem, and global usage is usually frowned
upon in those circles (except for singletons, class-/function-statics
and whatnot, which I'm trying to address in an upcoming patch).

As for other targets, as a first step, making the pass run under -O3
rather than -O1 is hopefully agreeable to everyone?  After all, it is
"aggressive", and isn't always profitable.  That's pretty much the
description of -O3.
We can still run into problematic cases under LTO, though.

-Ahmed

>
> Thanks,
>
> Kristof
>
> > -----Original Message-----
> > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> > On Behalf Of Ahmed Bougacha
> > Sent: 26 February 2015 01:13
> > To: LLVM Dev
> > Subject: Re: [LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
> >
> > With the numbers!
> > -Ahmed
> >
> >
> > On Wed, Feb 25, 2015 at 4:57 PM, Ahmed Bougacha
> > <ahmed.bougacha at gmail.com> wrote:
> > > Hi all,
> > >
> > > I've started looking at the GlobalMerge pass, enabled by default on
> > > ARM and AArch64.  I think we should reconsider that, at least for
> > > AArch64.
> > >
> > > As is, the pass just merges all globals together, in groups of 4KB
> > > (AArch64, 128B on ARM).
> > >
> > > At the time it was enabled, the general thinking was "it's almost
> > > free, it doesn't affect performance much, we might as well use it".
> > > Now, it's preventing some link-time optimizations (as acknowledged in
> > > one of the FIXMEs).
> > >
> > >
> > > -- Performance impact
> > > Overall, it isn't that profitable on the test-suite, and actually
> > > degrades performance on a lot of other - "non-benchmark" - projects I
> > > tried (where the main reason to use a global is file- or function-
> > > static variables, only accessed through a single getter function).
> > >
> > > Across several runs on the entire test-suite, when disabling the pass,
> > > I measured:
> > > - without LTO, a -0.19% geomean improvement
> > > - with LTO, a +0.11% geomean regression.
> > >
> > > As for just SPEC2006, there are two big regressions: 400.perlbench
> > > (10.6% w/ LTO, 2.7% w/o) and 471.omnetpp (2.3% w/, 3.9% w/o).
> > >
> > > Numbers are attached.
> > >
> > >
> > > -- A way forward
> > > One obvious way to improve it is: look at uses of globals, and try to
> > > form sets of globals commonly used together.  The tricky part is to
> > > define heuristics for "commonly".  Also, the pass then becomes much
> > > more expensive.  I'm currently looking into improving it, and will
> > > report if I come up with a good solution.  But this shouldn't stop us
> > > from disabling it, for now.
> > >
> > > Also, the pass seems like a good candidate for
> > > -O3/CodeGenOpt::Aggressive.  However, the latter is implied by LTO,
> > > which IMO shouldn't include these not-always-profitable optimizations.
> > > That's another problem though.
> > >
> > >
> > >
> > > Right now, I think we should disable the pass by default, until it's
> > > deemed profitable enough.
> > >
> > > -Ahmed
>
>
>