[llvm-commits] [PATCH] Stack Coloring optimization

Sun Sep 9 07:34:12 PDT 2012

Hi James,

> Thanks for this. This is why I was probing about the new MI scheduler and whether it was in trunk yet (I've been kind of poor at keeping up on llvmdev and llvm-commits recently).
>
> This sounds great. I'll cancel my patch and wait for the MI sched to hit trunk.
>
> Out of interest, I know we want to avoid adding extra complexity to SDISel, but isn't it conceptually cleaner to create chains that actually denote the required memory op ordering in the first place than to create an overly strict chain and then break it later? From a design point of view, is doing it in MISched necessarily better?

Maybe both should be done.  It does seem natural to have SelectionDAGBuilder
create sensible chains in the first place, e.g. chains that don't serialize
memory operations that alias analysis at the IR level says can run in parallel.
However, codegen can mulch things considerably and expose new opportunities for
parallelism, so having something that runs much later makes sense to me too.

Ciao, Duncan.
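
The per-alias-set chain idea in James's message quoted below can be illustrated with a toy model. This is only a sketch under stated assumptions, not the patch: alias sets are given up front as plain labels (the real AliasSetTracker can merge sets as it goes, at which point the roots are TokenFactored), and the names Op and build_chains are hypothetical. Loads in a set are bunched after that set's last store, a store waits for the bunched loads, and a call serializes everything.

```python
from typing import Dict, List, Optional, Tuple

# (kind, alias_set): kind is "load", "store" or "call" (hypothetical toy encoding)
Op = Tuple[str, str]


def build_chains(ops: List[Op]) -> Dict[int, List[int]]:
    """For each op index, list the earlier op indices it must be ordered after."""
    last_store: Dict[str, int] = {}           # chain root per alias set
    pending_loads: Dict[str, List[int]] = {}  # loads issued since that root
    barrier: Optional[int] = None             # last op that serialized everything
    deps: Dict[int, List[int]] = {}

    for i, (kind, aset) in enumerate(ops):
        if kind == "call":
            # A call joins every live chain, much like a TokenFactor of all roots.
            joined = set(last_store.values())
            for loads in pending_loads.values():
                joined.update(loads)
            if barrier is not None:
                joined.add(barrier)
            deps[i] = sorted(joined)
            barrier = i
            last_store.clear()
            pending_loads.clear()
        elif kind == "load":
            # Loads only wait for the root of their own set; they are bunched.
            root = last_store.get(aset, barrier)
            deps[i] = [] if root is None else [root]
            pending_loads.setdefault(aset, []).append(i)
        elif kind == "store":
            # A store waits for the bunched loads and the previous root of its set.
            joined = set(pending_loads.pop(aset, []))
            root = last_store.get(aset, barrier)
            if root is not None:
                joined.add(root)
            deps[i] = sorted(joined)
            last_store[aset] = i
        else:
            raise ValueError(f"unknown op kind: {kind}")
    return deps


ops = [("store", "A"), ("load", "B"), ("store", "B"),
       ("load", "A"), ("call", ""), ("load", "A")]
deps = build_chains(ops)
```

In this model the load from set B at index 1 has no ordering edge to the store to set A at index 0, which is the reordering freedom a single strict chain would hide.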

>
> Cheers,
>
> James
> ________________________________________
> From: Evan Cheng [evan.cheng at apple.com]
> Sent: 08 September 2012 23:20
> To: James Molloy
> Cc: Hal Finkel; llvm-commits at cs.uiuc.edu
> Subject: Re: [llvm-commits] [PATCH] Stack Coloring optimization
>
> A quick note. It's probably not a good idea to spend effort making SDISel use AA information.
>
> LLVM will likely default to Andy's new MI scheduler within a few months. The new scheduler is where we should use AA info to reorder memory ops. We'd like to avoid adding complexity to SDISel.
>
> Evan
>
> On Sep 3, 2012, at 10:29 AM, James Molloy <James.Molloy at arm.com> wrote:
>
>> Hi Hal,
>>
>> It is along the same lines, and is very similar. It affects PendingLoads in SelectionDAGBuilder.
>>
>> Where I've differed from you in algorithm (and I'm still trying to prove to myself whether yours and mine are functionally equivalent) is in trying to keep as close as possible to the previous behaviour, i.e. bunching up loads but never bunching up stores.
>>
>> Instead of calculating whether mem ops should be flushed in getRoot as you do, I use the AliasSetTracker to maintain a chain root for every known nonaliasing set of operations. Target memory intrinsics and calls obviously serialize everything, and when AliasSets merge their associated roots are TokenFactored.
>>
>> That way, we have several chains but the behaviour in each is very similar to previously, so the ideal is that it doesn't affect performance too much.
>>
>> Indeed, this appears to be the case. Because mine is not as wide-ranging an optimisation as yours, the speedups are small (5-8% on non-tiny benchmarks), but similarly the regressions are trivial (0-1% if my numbers add up).
>>
>> In synthetic benchmarks which resemble very closely OpenCL kernels (unrolled loops where we often have the idiom "load stuff; do stuff; store stuff;" and reordering loads past stores is very important for ILP), I have measured around 40% speedup.
>>
>> Cheers,
>>
>> James
>> ________________________________________
>> From: Hal Finkel [hfinkel at anl.gov]
>> Sent: 03 September 2012 17:38
>> To: James Molloy
>> Cc: Jakob Stoklund Olesen; llvm-commits at cs.uiuc.edu
>> Subject: Re: [llvm-commits] [PATCH] Stack Coloring optimization
>>
>> On Mon, 03 Sep 2012 14:47:59 +0100
>> James Molloy <james.molloy at arm.com> wrote:
>>
>>> Hi,
>>>
>>> I'm interested in this; is this code in trunk at the moment?
>>>
>>> I've been working on an optimisation to put non-aliasing loads and
>>> stores on different chains during selectiondag creation - is this
>>> scheduler code supposed to reorder independent loads and stores?
>>
>> James,
>>
>> Is this different from the patch I proposed last year?
>>
>> -Hal
>>
>>>
>>> Cheers,
>>>
>>> James
>>>
>>> On Thu, 2012-08-30 at 19:57 +0100, Jakob Stoklund Olesen wrote:
>>>> On Aug 30, 2012, at 2:51 AM, Nadav Rotem <nrotem at apple.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I've been working on a new optimization for reducing the stack
>>>>> size.  Currently, when we declare allocas in LLVM IR, these
>>>>> allocas are directly translated to stack slots. And when we
>>>>> inline small functions into larger functions, these allocas add up
>>>>> and take up lots of space.  In some cases we know that the use of
>>>>> the allocas is bounded by disjoint regions.  In this optimization
>>>>> we merge multiple disjoint slots into a single slot.  LLVM uses
>>>>> the lifetime markers for specifying the regions in which the
>>>>> alloca is used.  This patch propagates the lifetime markers
>>>>> through SelectionDAG and makes them pseudo ops.  Later, a
>>>>> pre-register-allocator pass constructs live intervals which
>>>>> represent the liveness of different stack slots. Next, the pass
>>>>> merges disjoint intervals.  Notice that lifetime markers are not
>>>>> perfect single-entry-single-exit regions. They may be removed by
>>>>> optimizations, they may start with two markers, and end with one,
>>>>> or even not end at all!
>>>>>
>>>>> So, why is this done in codegen?  There are a number of reasons.
>>>>> First, joining allocas may hinder alias analysis. Second, in the
>>>>> future we would like to share the alloca space with spill slots.
>>>>
>>>> About alias analysis. Andy was just showing me the scheduler's AA
>>>> code. It is using the memory operands to find the underlying LLVM
>>>> IR object. Loads and stores to different allocas are partitioned
>>>> according to their underlying IR object.
>>>>
>>>> Merging stack slots before the MI scheduler could invalidate this
>>>> form of alias analysis since two IR allocas can share a stack slot.
>>>>
>>>> /jakob
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>
>>
>>
>> --
>> Hal Finkel
>> Postdoctoral Appointee
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>>
>> -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium.  Thank you.
>
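
The slot merging described in Nadav's message quoted above can be sketched as a greedy interval colouring. This is a minimal illustration under a simplifying assumption: each stack slot's liveness has already been reduced to one half-open interval [start, end) over instruction indices. The patch itself derives liveness from lifetime markers, which need not form single-entry-single-exit regions, so a real pass needs more than this. The names overlaps and merge_disjoint_slots are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) over instruction indices


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one index."""
    return a[0] < b[1] and b[0] < a[1]


def merge_disjoint_slots(live: Dict[str, Interval]) -> Dict[str, int]:
    """Assign each alloca to the first merged slot none of whose members it overlaps."""
    slots: List[List[Interval]] = []   # intervals already placed in each merged slot
    assignment: Dict[str, int] = {}
    # Visit allocas in order of interval start so earlier slots fill up first.
    for name, interval in sorted(live.items(), key=lambda kv: kv[1]):
        for index, members in enumerate(slots):
            if not any(overlaps(interval, m) for m in members):
                members.append(interval)
                assignment[name] = index
                break
        else:
            # No existing slot is free over this interval; open a new one.
            slots.append([interval])
            assignment[name] = len(slots) - 1
    return assignment


live = {"a": (0, 10), "b": (12, 20), "c": (5, 15)}
assignment = merge_disjoint_slots(live)
```

Here "a" and "b" are never live at the same time and end up sharing a slot, while "c" overlaps both and gets its own.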