[llvm-commits] [PATCH] Stack Coloring optimization

Sat Sep 8 15:20:13 PDT 2012

A quick note. It's probably not a good idea to spend effort making SDISel use AA information.

LLVM will likely default to Andy's new MI scheduler within a few months. The new scheduler is where we should use AA info to reorder memory ops. We'd like to avoid adding complexity to SDISel.

Evan

On Sep 3, 2012, at 10:29 AM, James Molloy <James.Molloy at arm.com> wrote:

> Hi Hal,
> 
> It is along the same lines, and is very similar. It affects PendingLoads in SelectionDAGBuilder.
> 
> Where my algorithm differs from yours (and I'm still trying to prove to myself whether the two are functionally equivalent) is that I try to stay as close as possible to the previous behaviour, i.e. bunching up loads but never bunching up stores.
> 
> Instead of calculating whether mem ops should be flushed in getRoot as you do, I use the AliasSetTracker to maintain a chain root for every known nonaliasing set of operations. Target memory intrinsics and calls obviously serialize everything, and when AliasSets merge their associated roots are TokenFactored.
> 
> That way, we have several chains but the behaviour in each is very similar to previously, so the ideal is that it doesn't affect performance too much.
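The per-set chain idea described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the patch itself: `ChainTracker`, its methods, and the tuple-based "tokens" are hypothetical names invented for illustration, a simple union-find stands in for AliasSetTracker, and calls and target memory intrinsics (which serialize everything) are not modelled.

```python
ENTRY = "entry"  # hypothetical stand-in for the DAG entry token


class ChainTracker:
    """Toy model: one chain root per set of possibly-aliasing locations."""

    def __init__(self):
        self.parent = {}   # union-find parent per memory location
        self.root = {}     # set representative -> current chain root
        self.pending = {}  # set representative -> loads since the last store

    def find(self, loc):
        # A location seen for the first time starts in its own set.
        if loc not in self.parent:
            self.parent[loc] = loc
            self.root[loc] = ENTRY
            self.pending[loc] = []
        while self.parent[loc] != loc:
            loc = self.parent[loc]
        return loc

    def merge(self, a, b):
        # When two sets turn out to alias, join their roots with a TokenFactor.
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
            self.root[ra] = ("TokenFactor", self.root[ra], self.root.pop(rb))
            self.pending[ra] += self.pending.pop(rb)
        return ra

    def load(self, loc, name):
        # Loads bunch up: each hangs off the set's root, not off other loads.
        rep = self.find(loc)
        node = ("load", name, self.root[rep])
        self.pending[rep].append(node)
        return node

    def store(self, loc, name):
        # Stores never bunch up: each becomes the new root of its set,
        # ordered after the old root and every pending load.
        rep = self.find(loc)
        loads = self.pending[rep]
        chain = ("TokenFactor", self.root[rep], *loads) if loads else self.root[rep]
        node = ("store", name, chain)
        self.root[rep] = node
        self.pending[rep] = []
        return node


tracker = ChainTracker()
store_a = tracker.store("A", "st_a")
load_b = tracker.load("B", "ld_b")  # chained to ENTRY, independent of st_a
```

In this model a load from `B` is free to move past a store to `A` until the two sets are merged, after which a later store is ordered behind both roots.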
> 
> Indeed, this appears to be the case. Because mine is not as wide-ranging an optimisation as yours, the speedups are small (5-8% on non-tiny benchmarks), but similarly the regressions are trivial (0-1% if my numbers add up).
> 
> In synthetic benchmarks which resemble very closely OpenCL kernels (unrolled loops where we often have the idiom "load stuff; do stuff; store stuff;" and reordering loads past stores is very important for ILP), I have measured around 40% speedup.
> 
> Cheers,
> 
> James
> ________________________________________
> From: Hal Finkel [hfinkel at anl.gov]
> Sent: 03 September 2012 17:38
> To: James Molloy
> Cc: Jakob Stoklund Olesen; llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH] Stack Coloring optimization
> 
> On Mon, 03 Sep 2012 14:47:59 +0100
> James Molloy <james.molloy at arm.com> wrote:
> 
>> Hi,
>> 
>> I'm interested in this; is this code in trunk at the moment?
>> 
>> I've been working on an optimisation to put non-aliasing loads and
>> stores on different chains during selectiondag creation - is this
>> scheduler code supposed to reorder independent loads and stores?
> 
> James,
> 
> Is this different from the patch I proposed last year?
> 
> -Hal
> 
>> 
>> Cheers,
>> 
>> James
>> 
>> On Thu, 2012-08-30 at 19:57 +0100, Jakob Stoklund Olesen wrote:
>>> On Aug 30, 2012, at 2:51 AM, Nadav Rotem <nrotem at apple.com> wrote:
>>> 
>>>> Hi All,
>>>> 
>>>> I've been working on a new optimization for reducing the stack
>>>> size.  Currently, when we declare allocas in LLVM IR, these
>>>> allocas are directly translated to stack slots. And when we
>>>> inline small functions into larger functions, these allocas add up
>>>> and take up lots of space.  In some cases we know that the use of
>>>> the allocas is bounded by disjoint regions.  In this optimization
>>>> we merge multiple disjoint slots into a single slot.  LLVM uses
>>>> the lifetime markers for specifying the regions in which the
>>>> alloca is used.  This patch propagates the lifetime markers
>>>> through SelectionDAG and makes them pseudo ops.  Later, a
>>>> pre-register-allocator pass constructs live intervals which
>>>> represent the liveness of different stack slots. Next, the pass
>>>> merges disjoint intervals.  Notice that lifetime markers are not
>>>> perfect single-entry-single-exit regions. They may be removed by
>>>> optimizations, they may start with two markers, and end with one,
>>>> or even not end at all!
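The merging step described above can be sketched as a greedy interval coloring. This is a simplified illustration, not the pass from the patch: it assumes each slot has exactly one half-open live interval `[start, end)`, ignores slot sizes and alignment, and does not handle the imperfect marker cases mentioned above. The function name `color_slots` and the slot names are hypothetical.

```python
def color_slots(intervals):
    """Assign a color to each stack slot; same color means shared storage.

    intervals: dict mapping slot name -> (start, end), half-open [start, end).
    Assumption: one contiguous live interval per slot.
    """
    colors = []      # per color: list of intervals already placed there
    assignment = {}  # slot name -> color index
    # Visit slots in order of interval start (ties broken by end).
    for slot, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1]):
        for index, used in enumerate(colors):
            # Reuse a color only if this interval is disjoint from all in it.
            if all(end <= s or e <= start for s, e in used):
                used.append((start, end))
                assignment[slot] = index
                break
        else:
            # No existing color is free for this interval: open a new one.
            colors.append([(start, end)])
            assignment[slot] = len(colors) - 1
    return assignment


slots = {"a": (0, 4), "b": (4, 8), "c": (2, 6)}
merged = color_slots(slots)
```

Here `a` and `b` have disjoint lifetimes and end up with the same color, while `c` overlaps both and keeps a slot of its own.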
>>>> 
>>>> So, why is this done in codegen?  There are a number of reasons.
>>>> First, joining allocas may hinder alias analysis. Second, in the
>>>> future we would like to share the alloca space with spill slots.
>>> 
>>> About alias analysis. Andy was just showing me the scheduler's AA
>>> code. It is using the memory operands to find the underlying LLVM
>>> IR object. Loads and stores to different allocas are partitioned
>>> according to their underlying IR object.
>>> 
>>> Merging stack slots before the MI scheduler could invalidate this
>>> form of alias analysis since two IR allocas can share a stack slot.
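The hazard can be shown with a small model. It is a sketch only: `MemOp`, both predicates, and the `slot_of` map are hypothetical names and do not correspond to the scheduler's actual code. It assumes alias queries compare the underlying IR object of each memory operand, as described above.

```python
from collections import namedtuple

# A memory operation tagged with the IR alloca it was derived from.
MemOp = namedtuple("MemOp", ["kind", "ir_object"])


def may_alias_by_ir_object(a, b):
    # Scheduler-style query: different underlying IR objects never alias.
    return a.ir_object == b.ir_object


def may_alias_by_slot(a, b, slot_of):
    # Ground truth once slots are merged: the same stack slot means overlap.
    return slot_of[a.ir_object] == slot_of[b.ir_object]


load_a = MemOp("load", "%a")
store_b = MemOp("store", "%b")
slot_of = {"%a": 0, "%b": 0}  # stack coloring placed both allocas in slot 0

reorder_looks_safe = not may_alias_by_ir_object(load_a, store_b)
actually_overlaps = may_alias_by_slot(load_a, store_b, slot_of)
```

Both values come out true for this input: the IR-object query says the load and store are independent even though they now touch the same memory, so reordering them could clobber the value being loaded.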
>>> 
>>> /jakob
>>> 
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> 
> 
> 
> 
> --
> Hal Finkel
> Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory
> 