[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?

Eric Christopher echristo at gmail.com
Fri Feb 27 13:42:56 PST 2015


On Fri, Feb 27, 2015 at 1:38 PM Ahmed Bougacha <ahmed.bougacha at gmail.com>
wrote:

> On Thu, Feb 26, 2015 at 2:33 AM, Kristof Beyls <kristof.beyls at arm.com>
> wrote:
> >
> > Hi Ahmed,
> >
> > Did you run these experiments on a platform with a linker that makes
> > use of the AArch64CollectLOH-pass-produced information?
>
> As Jim says, I'm on iOS, so yes.  However, I'm mostly running tests
> with the pass disabled.
>
> >
> > I'm guessing that the AArch64CollectLOH-pass information and a linker
> > that makes use of that information could affect the profitability of
> > the GlobalMerge pass?
>
> It could, and does, from what I've seen (beware anecdata):
> - reusing the adrp base prevents optimizing it (the various
> Adrp*{ldr,str} LOHs).
> - reusing the adrp+add MergedGlobal pointer, with indexed addressing,
> doesn't prevent the AdrpAdd optimization.
>
> All in all, whether GlobalMerge is profitable or not (by increasing
> register pressure, or adding another indirection), whenever the LOH
> optimizations fire, they reduce its usefulness.
>
> AFAICT, the only case where LOHs help GlobalMerge is when the
> MergedGlobal base is closer to the adrp sequence than the actual
> global.  Given that we only merge 4k of globals, on a 1MB range this
> doesn't happen very often.
>
>
>
> Which brings us to my fallback proposal:  what about disabling the
> pass on darwin only?  Various darwin-enabled features (e.g., LOHs)
> help mitigate the adrp problem, and global usage is usually frowned
> upon in those circles (except for singletons, class-/function-statics
> and whatnot, which I'm trying to address in an upcoming patch).
>
>
Before disabling the pass on darwin only, I'd like to see some analysis of
the regressions/improvements. Has anyone looked at the code for those yet?



> As for other targets, as a first step, making the pass run under -O3
> rather than -O1 is hopefully agreeable to everyone?  After all, it is
> "aggressive", and isn't always profitable.  That's pretty much the
> description of -O3.
> We can still run into problematic cases under LTO, though.
>
>
Seems reasonable to me, but we probably want to see what happens with the
above questions first.

-eric


> -Ahmed
>
> >
> > Thanks,
> >
> > Kristof
> >
> > > -----Original Message-----
> > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> > > On Behalf Of Ahmed Bougacha
> > > Sent: 26 February 2015 01:13
> > > To: LLVM Dev
> > > Subject: Re: [LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
> > >
> > > With the numbers!
> > > -Ahmed
> > >
> > >
> > > On Wed, Feb 25, 2015 at 4:57 PM, Ahmed Bougacha
> > > <ahmed.bougacha at gmail.com> wrote:
> > > > Hi all,
> > > >
> > > > I've started looking at the GlobalMerge pass, enabled by default on
> > > > ARM and AArch64.  I think we should reconsider that, at least for
> > > > AArch64.
> > > >
> > > > As is, the pass just merges all globals together, in groups of 4KB
> > > > on AArch64 (128B on ARM).
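
To make the "groups of 4KB" behaviour described above concrete, here is a minimal sketch in Python. It is a simplified model, not LLVM's actual GlobalMerge code: the function name `merge_globals`, the `(name, size)` input shape, and the decisions to ignore alignment and to skip oversized globals are all assumptions made for illustration.

```python
from typing import List, Tuple


def merge_globals(
    globals_: List[Tuple[str, int]], max_bytes: int
) -> List[List[Tuple[str, int]]]:
    """Greedily pack (name, size) globals, in order, into merged groups.

    Returns a list of groups; each group is a list of (name, offset) pairs,
    where offset is the byte offset of that global inside the merged blob.
    Hypothetical model: alignment and padding are ignored.
    """
    groups: List[List[Tuple[str, int]]] = []
    current: List[Tuple[str, int]] = []
    offset = 0
    for name, size in globals_:
        if size > max_bytes:
            # Assumption: a global larger than the limit is left unmerged.
            continue
        if offset + size > max_bytes:
            # The current group is full; close it and start a new one.
            groups.append(current)
            current, offset = [], 0
        current.append((name, offset))
        offset += size
    if current:
        groups.append(current)
    return groups
```

Under this model, `max_bytes` would be 4096 for AArch64 and 128 for ARM. Every global in a group is then reached through the group's base address plus its offset, regardless of whether the globals are ever used together.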
> > > >
> > > > At the time it was enabled, the general thinking was "it's almost
> > > > free, it doesn't affect performance much, we might as well use it".
> > > > Now, it's preventing some link-time optimizations (as acknowledged in
> > > > one of the FIXMEs).
> > > >
> > > >
> > > > -- Performance impact
> > > > Overall, it isn't that profitable on the test-suite, and actually
> > > > degrades performance on a lot of other - "non-benchmark" - projects I
> > > > tried (where the main reason to use a global is file- or function-
> > > > static variables, only accessed through a single getter function).
> > > >
> > > > Across several runs on the entire test-suite, when disabling the
> > > > pass, I measured:
> > > > - without LTO, a -0.19% geomean improvement;
> > > > - with LTO, a +0.11% geomean regression.
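
For readers unfamiliar with how a geomean figure like the ones above is derived, here is a minimal sketch in Python. The function name `geomean_change_pct` and the convention that the inputs are per-benchmark execution times (so a negative result means the candidate is faster) are assumptions for illustration; the attached numbers may have been computed differently.

```python
import math
from typing import Sequence


def geomean_change_pct(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Geometric mean of per-benchmark candidate/baseline ratios, as a percent change.

    With execution times as inputs, a negative value means the candidate
    configuration is faster on (geometric) average.
    """
    ratios = [c / b for b, c in zip(baseline, candidate)]
    # Average the logs, then exponentiate: this is the geometric mean.
    geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return (geomean - 1.0) * 100.0
```

The geometric mean is used rather than the arithmetic mean so that a 2x slowdown on one benchmark and a 2x speedup on another cancel out exactly.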
> > > >
> > > > As for just SPEC2006, there are two big regressions: 400.perlbench
> > > > (10.6% w/ LTO, 2.7% w/o) and 471.omnetpp (2.3% w/, 3.9% w/o).
> > > >
> > > > Numbers are attached.
> > > >
> > > >
> > > > -- A way forward
> > > > One obvious way to improve it is: look at uses of globals, and try to
> > > > form sets of globals commonly used together.  The tricky part is to
> > > > define heuristics for "commonly".  Also, the pass then becomes much
> > > > more expensive.  I'm currently looking into improving it, and will
> > > > report if I come up with a good solution.  But this shouldn't stop us
> > > > from disabling it, for now.
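
One possible reading of "form sets of globals commonly used together" can be sketched as follows, in Python. This is not the heuristic the author ended up proposing or anything in LLVM: the function name `group_by_co_use`, the input shape (function name mapped to the globals it references), and the `min_shared_functions` threshold standing in for "commonly" are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List


def group_by_co_use(
    uses_by_function: Dict[str, Iterable[str]], min_shared_functions: int = 2
) -> List[List[str]]:
    """Group globals that are referenced together in enough functions.

    Two globals are linked when at least `min_shared_functions` functions
    use both; groups are the connected components of those links.
    """
    pair_counts: Counter = Counter()
    all_globals = set()
    for used in uses_by_function.values():
        used_set = set(used)
        all_globals.update(used_set)
        for a, b in combinations(sorted(used_set), 2):
            pair_counts[(a, b)] += 1

    # Union-find over globals, joining pairs that pass the threshold.
    parent = {g: g for g in all_globals}

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for (a, b), count in pair_counts.items():
        if count >= min_shared_functions:
            parent[find(a)] = find(b)

    groups: Dict[str, set] = {}
    for g in all_globals:
        groups.setdefault(find(g), set()).add(g)
    return sorted(sorted(members) for members in groups.values())
```

A global that ends up alone in its group (for instance a function-static accessed only through a single getter) would simply be left unmerged. The pair counting is quadratic in the number of globals used per function, which illustrates why such a pass becomes more expensive.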
> > > >
> > > > Also, the pass seems like a good candidate for
> > > > -O3/CodeGenOpt::Aggressive.  However, the latter is implied by LTO,
> > > > which IMO shouldn't include these not-always-profitable
> > > > optimizations.  That's another problem though.
> > > >
> > > >
> > > >
> > > > Right now, I think we should disable the pass by default, until it's
> > > > deemed profitable enough.
> > > >
> > > > -Ahmed
> >
> >
> >