[llvm-commits] [PATCH] Allow SelectionDAGBuilder to reorder loads past stores

Wed Apr 4 10:22:43 PDT 2012

On Mon, 02 Apr 2012 17:19:59 -0700
Andrew Trick <atrick at apple.com> wrote:

> 
> On Mar 28, 2012, at 10:10 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> > Andrew, et al.,
> > 
> > I've attached a rebased patch. This is still a "for discussion
> > purposes only" change (although I do use it internally), but I
> > would very much appreciate it being reviewed for conceptual errors.
> > All feedback, of course, is welcome.
> 
> Hi Hal,
> 
> I think your changes make sense, and if they work for you that's
> great. There's more to worry about when you want to push them
> upstream. Lets talk about memory alias support and list-ilp
> heuristics separately even though there is a performance relationship.

Thanks for taking the time to look at this. I appreciate it.

> 
> 1) memory alias support
> 
> Conceptually your approach seems like it could work. When I tried
> your initial patch I saw some failures and didn't investigate
> further.

I think that the second version fixed a bug in the first, the most
recent version was just a rebase. Nevertheless, last I checked, the
patch still gives test-suite failures on x86. It is harder to tell on
PPC (because there are test-suite failures on PPC generally), but I
think that it works without issue there.

 It may not be a serious issue, but either way I'm not
> anxious to add complexity to the SelectionDAG. It's hard for me to
> say whether some code that I'm unfamiliar with makes assumptions
> about the SD structure. Long term, we want to simplify SelectionDAG
> scheduling. It should only need to serialize the DAG in order to
> respect physical register dependencies.

Fair enough.

Aggressive load/store
> reordering will be performed (eventually) in the MachineScheduler if
> needed.

Do you have any feeling on when eventually is?

> 
> We need alias analysis support in the MachineScheduler's DAG builder.
> In fact, it's so important that I think we should do it right.
> Although your patch is clearly an improvement over nothing, we want
> something more complete in MachineScheduler. It seems that your
> conservative approach can get lucky and do the right thing in narrow
> cases, but a single stray store could serialize all memory at the
> wrong point.

Quite true.

 Minimally, I think we need to track the alias sets
> within the current DAG, where each alias set is a set of
> AA::Locations occurring in the scheduling region that may alias.
> "unknown" locations get a special set. If we partition memory this
> way, and limit the memory partition into a small finite list of set,
> then we can treat each alias set as a separate root and efficiently
> build the DAG without forcing arbitrary serialization. There may be
> more clever ways to do this. The approach I described is a conceptual
> baseline that we need to surpass.

That sounds good. So long as we have a reasonable alias-set
implementation, further cleverness may actually be unnecessary.

> 
> 2) x86_64 list_ilp heuristics
> 
> Your improvements make at least as much sense as the existing
> heuristics, but you're building on an unstable set of hacks to begin
> with. Those hacks were tuned to handle certain benchmarks, so we have
> to carefully measure and reevaluate any changes in this area. It's
> time consuming.

The changes to the ILP heuristics seemed to effectively eliminate the
most egregious performance regressions caused by the aliasing-support
additions. Otherwise, however, there was still a fair amount of
movement in both directions.

> 
> It probably doesn't make sense to add these if we don't have AA
> support, so let's revisit the problem that you're solving when the
> MachineScheduler reaches that point.

I agree. One thing that I would like to understand is whether the
performance regressions caused by the aliasing-support additions were
driven by the lack of real itineraries on x86, or the lack of a cache
model (or some combination thereof), or something else.

> 
> Thanks, and good work!

Thanks!

 -Hal

> -Andy
> 
> 
> > Thanks again,
> > Hal
> > 
> > On Mon, 16 Jan 2012 18:18:43 -0600
> > Hal Finkel <hfinkel at anl.gov> wrote:
> > 
> >> Andrew,
> >> 
> >> I've attached an updated patch. It includes a new ILP scheduling
> >> heuristic that I found useful (on some x86_64 machines), and some
> >> other improvements over the previous version.
> >> 
> >> Regarding the partial-aliasing problem you had pointed out, I don't
> >> think that can be an problem here. The reason is that the patch
> >> maintains the following invariant: all currently pending loads and
> >> stores are commutable (exchangeable). There is still a critical
> >> chain, but instead of consisting of all stores, the chain now
> >> consists of all loads or stores that are not commutable with some
> >> previous load or store. As a result, I do not think that it is
> >> possible to queue a load or store that would cause an "aliasing
> >> cycle" such as in your example.
> >> 
> >> Please let me know what you think.
> >> 
> >> Thanks again,
> >> Hal 
> >> 
> >> On Wed, 2012-01-11 at 22:16 -0600, Hal Finkel wrote:
> >>> On Wed, 2012-01-11 at 18:45 -0800, Andrew Trick wrote:
> >>>> On Dec 19, 2011, at 9:42 AM, Hal Finkel wrote:
> >>>> 
> >>>>> The current SelectionDAGBuilder does not allow loads to be
> >>>>> reordered past stores, and does not allow stores to be
> >>>>> reordered. This is a side effect of the way the critical chain
> >>>>> is constructed: there is a queue of pending loads that is
> >>>>> flushed (in parallel) to the root of the chain upon
> >>>>> encountering any store (and that store is also appended to the
> >>>>> root of the chain). Among other things, loop unrolling is far
> >>>>> less effective than it otherwise could be.
> >>>>> 
> >>>>> The attached patch allows SelectionDAGBuilder to use the
> >>>>> available alias analysis to reorder independent loads and
> >>>>> stores. It changes the queue of pending loads into a more
> >>>>> general queue of pending memory operations, and flushes, in
> >>>>> parallel, all potentially-conflicting loads and stores as
> >>>>> necessary.
> >>>> 
> >>>> Hal,
> >>>> 
> >>>> I haven't tried out your patch, so correct me if I'm wrong. I
> >>>> don't understand how this can work with a simple alias analysis
> >>>> interface like we have in LLVM.
> >>>> 
> >>>> Root
> >>>> ld A
> >>>> st B
> >>>> st C
> >>>> 
> >>>> mayAlias(A,B) = true
> >>>> mayAlias(A,C) = true
> >>>> mayAlias(B,C) = false
> >>> 
> >>> This is a good point, your situation is something like this:
> >>> 
> >>> -------------------
> >>> |        A        |
> >>> |    B   |   C    |
> >>> -------------------
> >>> 
> >>> and without strict aliasing rules, this could certainly be an
> >>> issue.
> >>> 
> >>>> 
> >>>> So we have the chain Root->A->B and Root->C which may result in
> >>>> the schedule:
> >>>> 
> >>>> st C
> >>>> ld A
> >>>> st B
> >>>> 
> >>>> It seems that we have to order all stores if we want them to
> >>>> "cover" preceding loads. What am I missing?
> >>>> 
> >>> 
> >>> In the patch, aliasing loads and stores still set the current root
> >>> to their combined token factor. As a result, the reordering will
> >>> never be as severe as in your example; the patch is safe in this
> >>> particular case because the DAG root will be set to A once B is
> >>> visited and so C will also be forced to come after A. I'll need to
> >>> give your point further thought, however, because maybe this kind
> >>> of thing can bite in some other way.
> >>> 
> >>> We may need to build an aliasing set per chain in order to check
> >>> for this kind of thing.
> >>> 
> >>> Thanks again,
> >>> Hal
> >>> 
> >>>> -Andy
> >>> 
> >> 
> > 
> > 
> > 
> > -- 
> > Hal Finkel
> > Postdoctoral Appointee
> > Leadership Computing Facility
> > Argonne National Laboratory
> > <llvm_lsro-20120328.diff>
> 

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory