[llvm-commits] [PATCH] Stack Coloring optimization

Mon Sep 10 11:38:33 PDT 2012

James,

> I'll cancel my patch and wait for the MI sched to
> hit trunk.

  There is already a default MI scheduler on trunk, and an adaptation of it
for the Hexagon back end was recently upstreamed.
In either case you probably want to look at the DAG construction portion of
it - there is some explicit code to break false memory control dependencies.
Also beware that DAG construction is used in many different places during
compilation, including pre- and post-RA scheduling and bundling for
platforms that support it. If you need more precise pointers, just let me
know.
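
To illustrate what "breaking false memory dependencies" during DAG
construction can mean, here is a minimal Python sketch. It is a toy model
and not the LLVM code: MemOp, may_alias and memory_edges are hypothetical
names, and aliasing is approximated by comparing an assumed "underlying
object" label. An ordering edge is kept only when two operations may touch
the same object and at least one of them is a store.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemOp:
    name: str
    is_store: bool
    obj: Optional[str]  # assumed underlying object; None means unknown


def may_alias(a, b):
    # An unknown object conservatively aliases everything.
    if a.obj is None or b.obj is None:
        return True
    return a.obj == b.obj


def memory_edges(ops):
    """Return (earlier, later) name pairs that must stay in program order."""
    edges = []
    for j, later in enumerate(ops):
        for earlier in ops[:j]:
            if not (earlier.is_store or later.is_store):
                continue  # two loads never need an ordering edge
            if may_alias(earlier, later):
                edges.append((earlier.name, later.name))
    return edges
```

With a store to A followed by a load from B, no edge is produced, so a
scheduler working on this graph would be free to reorder the two.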

Thanks.

Sergei

---
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
The Linux Foundation

> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu
> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of James Molloy
> Sent: Sunday, September 09, 2012 6:43 AM
> To: Evan Cheng
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH] Stack Coloring optimization
> 
> Hi Evan,
> 
> Thanks for this. This is why I was probing about the new MI scheduler
> and whether it was in trunk yet (I've been kind of poor at keeping up
> on llvmdev and llvm-commits recently).
> 
> This sounds great. I'll cancel my patch and wait for the MI sched to
> hit trunk.
> 
> Out of interest, I know we want to avoid adding extra complexity to
> SDISel, but isn't it conceptually cleaner to create chains that
> actually denote the required memory op ordering in the first place,
> rather than creating an overly strict chain and breaking it later? From
> a design point of view, is doing it in MISched necessarily better?
> 
> Cheers,
> 
> James
> ________________________________________
> From: Evan Cheng [evan.cheng at apple.com]
> Sent: 08 September 2012 23:20
> To: James Molloy
> Cc: Hal Finkel; llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH] Stack Coloring optimization
> 
> A quick note. It's probably not a good idea to spend effort making
> SDISel use AA information.
> 
> LLVM will likely default to Andy's new MI scheduler within a few
> months. The new scheduler is where we should use AA info to reorder
> memory ops. We'd like to avoid adding complexity to SDISel.
> 
> Evan
> 
> On Sep 3, 2012, at 10:29 AM, James Molloy <James.Molloy at arm.com> wrote:
> 
> > Hi Hal,
> >
> > It is along the same lines, and is very similar. It affects
> > PendingLoads in SelectionDAGBuilder.
> >
> > Where I've differed from you in algorithm (and I'm still trying to
> > prove to myself whether yours and mine are functionally equivalent) is
> > in trying to keep as close as possible to the previous behaviour, i.e.
> > bunching up loads but never bunching up stores.
> >
> > Instead of calculating whether mem ops should be flushed in getRoot
> > as you do, I use the AliasSetTracker to maintain a chain root for every
> > known nonaliasing set of operations. Target memory intrinsics and calls
> > obviously serialize everything, and when AliasSets merge their
> > associated roots are TokenFactored.
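
A rough Python sketch of the scheme described above, under loud
assumptions: ChainTracker and its methods are hypothetical names, alias sets
are modelled with a plain union-find over location labels rather than
LLVM's AliasSetTracker, and chain nodes are nested tuples. The bunching of
loads is not modelled; every operation is simply chained to its own set's
current root.

```python
class ChainTracker:
    """Toy model: one chain root per known non-aliasing set."""

    def __init__(self):
        self.parent = {}  # union-find parent per location label
        self.root = {}    # set representative -> current chain root

    def find(self, loc):
        if loc not in self.parent:
            self.parent[loc] = loc
            self.root[loc] = "entry"  # a new set starts at the entry token
        while self.parent[loc] != loc:
            self.parent[loc] = self.parent[self.parent[loc]]
            loc = self.parent[loc]
        return loc

    def add_op(self, name, loc):
        # Chain the op to its own set's root only, not to other sets.
        rep = self.find(loc)
        node = (name, self.root[rep])
        self.root[rep] = node
        return node

    def merge(self, a, b):
        # When two sets merge, join their roots with a TokenFactor node.
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
            self.root[ra] = ("TokenFactor", self.root[ra], self.root.pop(rb))
        return ra

    def barrier(self, name):
        # A call or target memory intrinsic serializes every set.
        reps = sorted({self.find(loc) for loc in list(self.parent)})
        node = (name, ("TokenFactor", *[self.root[r] for r in reps]))
        for r in reps:
            self.root[r] = node
        return node
```

In this model a load from B does not depend on an earlier store to A until
the two sets are merged or a barrier is seen.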
> >
> > That way, we have several chains but the behaviour in each is very
> > similar to previously, so ideally it doesn't affect performance too
> > much.
> >
> > Indeed, this appears to be the case. Because mine is not as
> > wide-ranging an optimisation as yours, the speedups are small (5-8% on
> > non-tiny benchmarks), but similarly the regressions are trivial (0-1% if
> > my numbers add up).
> >
> > In synthetic benchmarks which closely resemble OpenCL kernels
> > (unrolled loops where we often have the idiom "load stuff; do stuff;
> > store stuff;" and reordering loads past stores is very important for
> > ILP), I have measured around 40% speedup.
> >
> > Cheers,
> >
> > James
> > ________________________________________
> > From: Hal Finkel [hfinkel at anl.gov]
> > Sent: 03 September 2012 17:38
> > To: James Molloy
> > Cc: Jakob Stoklund Olesen; llvm-commits at cs.uiuc.edu
> > Subject: Re: [llvm-commits] [PATCH] Stack Coloring optimization
> >
> > On Mon, 03 Sep 2012 14:47:59 +0100
> > James Molloy <james.molloy at arm.com> wrote:
> >
> >> Hi,
> >>
> >> I'm interested in this; is this code in trunk at the moment?
> >>
> >> I've been working on an optimisation to put non-aliasing loads and
> >> stores on different chains during selectiondag creation - is this
> >> scheduler code supposed to reorder independent loads and stores?
> >
> > James,
> >
> > Is this different from the patch I proposed last year?
> >
> > -Hal
> >
> >>
> >> Cheers,
> >>
> >> James
> >>
> >> On Thu, 2012-08-30 at 19:57 +0100, Jakob Stoklund Olesen wrote:
> >>> On Aug 30, 2012, at 2:51 AM, Nadav Rotem <nrotem at apple.com> wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>> I've been working on a new optimization for reducing the stack
> >>>> size.  Currently, when we declare allocas in LLVM IR, these allocas
> >>>> are directly translated to stack slots. And when we inline small
> >>>> functions into a larger function, these allocas add up and take up
> >>>> lots of space.  In some cases we know that the use of the allocas
> >>>> is bounded by disjoint regions.  In this optimization we merge
> >>>> multiple disjoint slots into a single slot.  LLVM uses the lifetime
> >>>> markers for specifying the regions in which the alloca is used.
> >>>> This patch propagates the lifetime markers through SelectionDAG and
> >>>> makes them pseudo ops.  Later, a pre-register-allocator pass
> >>>> constructs live intervals which represent the liveness of different
> >>>> stack slots. Next, the pass merges disjoint intervals.  Notice that
> >>>> lifetime markers are not perfect single-entry-single-exit regions.
> >>>> They may be removed by optimizations, they may start with two
> >>>> markers and end with one, or even not end at all!
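
The slot-merging idea described above can be sketched in Python as a
greedy interval colouring. This is only an illustration, not the patch
itself: color_slots is a hypothetical name, and each slot is assumed to
have a single half-open live interval (start, end) plus a size, which is
simpler than the imperfect lifetime-marker regions mentioned above.

```python
def color_slots(slots):
    """slots maps name -> (start, end, size); intervals are half-open.

    Returns (assignment, total_size): the colour chosen for each slot and
    the stack space needed once slots with disjoint intervals share space.
    """
    colors = []      # each colour: {"intervals": [...], "size": int}
    assignment = {}
    by_start = sorted(slots.items(), key=lambda kv: kv[1][0])
    for name, (start, end, size) in by_start:
        for idx, color in enumerate(colors):
            # Reuse a colour only if this interval overlaps none of its members.
            if all(end <= s or e <= start for s, e in color["intervals"]):
                break
        else:
            idx = len(colors)
            colors.append({"intervals": [], "size": 0})
        colors[idx]["intervals"].append((start, end))
        # A shared slot must be as large as its largest member.
        colors[idx]["size"] = max(colors[idx]["size"], size)
        assignment[name] = idx
    return assignment, sum(c["size"] for c in colors)
```

For three slots of 16, 32 and 8 bytes where the first two have disjoint
intervals, the first two share one 32-byte slot, so the frame needs 40
bytes instead of 56.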
> >>>>
> >>>> So, why is this done in codegen?  There are a number of reasons.
> >>>> First, joining allocas may hinder alias analysis. Second, in the
> >>>> future we would like to share the alloca space with spill slots.
> >>>
> >>> About alias analysis. Andy was just showing me the scheduler's AA
> >>> code. It is using the memory operands to find the underlying LLVM IR
> >>> object. Loads and stores to different allocas are partitioned
> >>> according to their underlying IR object.
> >>>
> >>> Merging stack slots before the MI scheduler could invalidate this
> >>> form of alias analysis since two IR allocas can share a stack slot.
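
The concern above can be shown with a small Python sketch. It is a toy
model with hypothetical names (partition_by_object, unsound_pairs), and it
assumes each memory operation carries a label for its underlying IR object
and that slot_of records which stack slot each object ended up in.

```python
def partition_by_object(ops):
    """Group (name, object) memory ops by their underlying IR object."""
    groups = {}
    for name, obj in ops:
        groups.setdefault(obj, []).append(name)
    return groups


def unsound_pairs(ops, slot_of):
    """Pairs that an object-based partition treats as independent even
    though their objects now occupy the same stack slot."""
    bad = []
    for i, (n1, o1) in enumerate(ops):
        for n2, o2 in ops[i + 1:]:
            if o1 != o2 and slot_of[o1] == slot_of[o2]:
                bad.append((n1, n2))
    return bad
```

Before merging, every object has its own slot and no pair is reported.
Once two allocas share a slot, a store to one and a load from the other
land in different partitions while touching the same memory.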
> >>>
> >>> /jakob
> >>>
> >>> _______________________________________________
> >>> llvm-commits mailing list
> >>> llvm-commits at cs.uiuc.edu
> >>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>>
> >
> >
> >
> > --
> > Hal Finkel
> > Postdoctoral Appointee
> > Leadership Computing Facility
> > Argonne National Laboratory
> >
> >
> > -- IMPORTANT NOTICE: The contents of this email and any attachments
> > are confidential and may also be privileged. If you are not the
> > intended recipient, please notify the sender immediately and do not
> > disclose the contents to any other person, use it for any purpose, or
> > store or copy the information in any medium.  Thank you.
> 