[llvm-commits] [PATCH] Stack Coloring optimization

Tue Sep 4 00:23:44 PDT 2012

On Tue, 2012-09-04 at 03:07 +0100, Hal Finkel wrote:
> On Mon, 3 Sep 2012 18:29:07 +0100
> James Molloy <James.Molloy at arm.com> wrote:
>
> > Hi Hal,
> >
> > It is along the same lines, and is very similar. It affects
> > PendingLoads in SelectionDAGBuilder.
> >
> > Where I've differed from you in algorithm (and I'm still trying to
> > prove to myself whether they should be functionally equivalent, yours
> > and mine...) is to try and keep as closely as possible to the
> > previous behaviour, i.e. bunching up loads but never bunching up
> > stores.
> >
> > Instead of calculating whether mem ops should be flushed in getRoot
> > as you do, I use the AliasSetTracker to maintain a chain root for
> > every known nonaliasing set of operations. Target memory intrinsics
> > and calls obviously serialize everything, and when AliasSets merge
> > their associated roots are TokenFactored.
> >
> > That way, we have several chains but the behaviour in each is very
> > similar to previously, so the ideal is that it doesn't affect
> > performance too much.
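
For illustration only, the per-alias-set chain idea described above can be modelled roughly as follows. This is a toy sketch, not the patch: `ChainTracker`, `token_factor` and the tuple encoding of DAG nodes are hypothetical names invented for the example, and a plain dictionary stands in for the AliasSetTracker.

```python
def token_factor(chains):
    """Join several chains into one; a single chain needs no join node."""
    unique = list(dict.fromkeys(chains))  # drop duplicates, keep order
    return unique[0] if len(unique) == 1 else ("TokenFactor", tuple(unique))


class ChainTracker:
    """Toy model: one chain root per set of possibly-aliasing locations."""

    def __init__(self):
        self.base = "Entry"  # chain that newly seen alias sets start from
        self.set_of = {}     # location -> alias-set id
        self.root = {}       # alias-set id -> chain the next store must follow
        self.loads = {}      # alias-set id -> loads issued since the last store

    def _set(self, loc):
        if loc not in self.set_of:
            self.set_of[loc] = loc
            self.root[loc] = self.base
            self.loads[loc] = []
        return self.set_of[loc]

    def load(self, loc):
        # Loads are bunched: each hangs off the set's root, not off other loads.
        s = self._set(loc)
        node = ("load", loc, self.root[s])
        self.loads[s].append(node)
        return node

    def store(self, loc):
        # A store waits for the root and the pending loads of its own set only.
        s = self._set(loc)
        node = ("store", loc, token_factor([self.root[s]] + self.loads[s]))
        self.root[s], self.loads[s] = node, []
        return node

    def merge(self, a, b):
        # When two alias sets merge, their roots are joined by a TokenFactor.
        sa, sb = self._set(a), self._set(b)
        if sa == sb:
            return
        self.root[sa] = token_factor([self.root[sa], self.root[sb]])
        self.loads[sa] += self.loads[sb]
        for loc, s in self.set_of.items():
            if s == sb:
                self.set_of[loc] = sa
        del self.root[sb], self.loads[sb]

    def barrier(self, name):
        # Calls and target memory intrinsics serialize every chain.
        deps = [self.base] + list(self.root.values())
        for loads in self.loads.values():
            deps += loads
        node = (name, token_factor(deps))
        self.base = node
        for s in self.root:
            self.root[s], self.loads[s] = node, []
        return node
```

In this model a load from `A` issued after a store to a non-aliasing `B` is still chained to the entry token rather than to the store, which is the freedom that allows loads to be reordered past stores.
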
>
> Sounds good.
>
> >
> > Indeed, this appears to be the case. Because mine is not as
> > wide-ranging an optimisation as yours, the speedups are small (5-8%
> > on non-tiny benchmarks), but similarly the regressions are trivial
> > (0-1% if my numbers add up).
>
> This was measured on x86 or ARM? I ended up running into problems with
> the ILP-scheduling heuristics used for x86.
>

This was measured on x86 using the LLVM test-suite. My motivation was
mathematical kernels on ARM, and the test suite doesn't test those so
well (although Tobi has recently added PolyBench...)

> >
> > In synthetic benchmarks which very closely resemble OpenCL kernels
> > (unrolled loops where we often have the idiom "load stuff; do stuff;
> > store stuff;" and reordering loads past stores is very important for
> > ILP), I have measured around 40% speedup.
>
> Great. These kinds of unrolled kernels were also my motivation for
> looking at this.
>
>  -Hal
>
> >
> > Cheers,
> >
> > James
> > ________________________________________
> > From: Hal Finkel [hfinkel at anl.gov]
> > Sent: 03 September 2012 17:38
> > To: James Molloy
> > Cc: Jakob Stoklund Olesen; llvm-commits at cs.uiuc.edu
> > Subject: Re: [llvm-commits] [PATCH] Stack Coloring optimization
> >
> > On Mon, 03 Sep 2012 14:47:59 +0100
> > James Molloy <james.molloy at arm.com> wrote:
> >
> > > Hi,
> > >
> > > I'm interested in this; is this code in trunk at the moment?
> > >
> > > I've been working on an optimisation to put non-aliasing loads and
> > > stores on different chains during selectiondag creation - is this
> > > scheduler code supposed to reorder independent loads and stores?
> >
> > James,
> >
> > Is this different from the patch I proposed last year?
> >
> >  -Hal
> >
> > >
> > > Cheers,
> > >
> > > James
> > >
> > > On Thu, 2012-08-30 at 19:57 +0100, Jakob Stoklund Olesen wrote:
> > > > On Aug 30, 2012, at 2:51 AM, Nadav Rotem <nrotem at apple.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I've been working on a new optimization for reducing the stack
> > > > > size.  Currently, when we declare allocas in LLVM IR, these
> > > > > allocas are directly translated to stack slots. And when we
> > > > > > inline small functions into larger functions, these allocas add
> > > > > up and take up lots of space.  In some cases we know that the
> > > > > use of the allocas is bounded by disjoint regions.  In this
> > > > > optimization we merge multiple disjoint slots into a single
> > > > > slot.  LLVM uses the lifetime markers for specifying the
> > > > > > regions in which the alloca is used.  This patch propagates the
> > > > > lifetime markers through SelectionDAG and makes them pseudo
> > > > > ops.  Later, a pre-register-allocator pass constructs live
> > > > > > intervals which represent the liveness of different stack
> > > > > slots. Next, the pass merges disjoint intervals.  Notice that
> > > > > > lifetime markers are not perfect single-entry-single-exit
> > > > > regions. They may be removed by optimizations, they may start
> > > > > with two markers, and end with one, or even not end at all!
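
As a rough illustration of the merging step described above (a simplified sketch, not the pass itself): assume each stack slot has a single half-open live range `(start, end)`, and ignore slot sizes and alignment. The function name `color_slots` and the input format are hypothetical. As noted above, real lifetime markers are not clean regions, so an actual implementation would need multi-segment intervals.

```python
def color_slots(intervals):
    """Greedily merge stack slots whose live ranges do not overlap.

    intervals: {slot_name: (start, end)}, half-open ranges (assumed format).
    Returns {slot_name: representative_slot}.
    """
    assignment = {}
    groups = []  # (representative, ranges already placed in that slot)
    # Visit slots in order of increasing start point.
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1]):
        for rep, ranges in groups:
            # The slot can be reused only if every range in it is disjoint.
            if all(end <= s or e <= start for s, e in ranges):
                ranges.append((start, end))
                assignment[name] = rep
                break
        else:
            groups.append((name, [(start, end)]))
            assignment[name] = name
    return assignment


if __name__ == "__main__":
    # "a" and "b" never overlap and can share a slot; "c" overlaps both.
    print(color_slots({"a": (0, 10), "b": (10, 20), "c": (5, 15)}))
```
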
> > > > >
> > > > > So, why is this done in codegen?  There are a number of reasons.
> > > > > First, joining allocas may hinder alias analysis. Second, in the
> > > > > future we would like to share the alloca space with spill slots.
> > > >
> > > > About alias analysis. Andy was just showing me the scheduler's AA
> > > > code. It is using the memory operands to find the underlying LLVM
> > > > IR object. Loads and stores to different allocas are partitioned
> > > > according to their underlying IR object.
> > > >
> > > > Merging stack slots before the MI scheduler could invalidate this
> > > > form of alias analysis since two IR allocas can share a stack
> > > > slot.
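
A tiny model of the hazard described above, under the assumption that the scheduler treats accesses to different underlying IR objects as independent. The function names and the `slot_of` mapping are hypothetical and only serve the example.

```python
def may_alias_by_object(a, b):
    """IR-level view: two accesses conflict only if they name the same alloca."""
    return a == b


def may_alias_by_slot(a, b, slot_of):
    """Machine-level view: accesses conflict if their allocas share a slot."""
    return slot_of[a] == slot_of[b]


if __name__ == "__main__":
    # After merging, allocas "x" and "y" both live in stack slot 0.
    slot_of = {"x": 0, "y": 0}
    print(may_alias_by_object("x", "y"))         # partitioned as independent
    print(may_alias_by_slot("x", "y", slot_of))  # but they touch the same memory
```

The two views disagree once slots are merged, so the object-based partitioning is only safe if it runs before the merge or is told about the shared slot.
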
> > > >
> > > > /jakob
> > > >
> > > > _______________________________________________
> > > > llvm-commits mailing list
> > > > llvm-commits at cs.uiuc.edu
> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > > >
> > >
> > >
> > >
> > >
> >
> >
> >
> > --
> > Hal Finkel
> > Postdoctoral Appointee
> > Leadership Computing Facility
> > Argonne National Laboratory
> >
> >
> > -- IMPORTANT NOTICE: The contents of this email and any attachments
> > are confidential and may also be privileged. If you are not the
> > intended recipient, please notify the sender immediately and do not
> > disclose the contents to any other person, use it for any purpose, or
> > store or copy the information in any medium.  Thank you.
> >
>
>
>
