[llvm-commits] Allow SelectionDAGBuilder to reorder loads past stores

Sergei Larin slarin at codeaurora.org
Tue Jan 10 14:30:44 PST 2012


Hal, 

  Sorry, I missed your reply in the flurry of holiday-accumulated mail... until Andy's note today. 

Anyhow, by way of an update: internally we have made great progress catching up with the tip of SVN, so I should be able to submit my proposed implementation of the VLIW scheduler in a matter of weeks.

  I still want to keep it in top-down form, though. That is somewhat essential to the algorithm I am employing to detect clusters of dependencies to combat register pressure. On the surface it is your basic list scheduler with critical-path-first priority. What makes it unique, though, is the use of a DFA state machine to fill "parallel" issue slots for architectures that benefit from it (new bundles, if you please)... plus top-down dependency-cluster detection, which allows instruction issue to be serialized intelligently when the register-pressure budget is exceeded. (As I said before, the load/store dependencies have already been broken during scheduling-DAG construction.) I then employ the stock post-RA scheduler to deal with the introduced spills/fills, etc.
  With no dynamic scheduling in hardware, it is critical for Hexagon to do it this way, but I have yet to see what impact this new scheduler might have on other architectures... unless the game rules change again before I get my chance in court :)
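For readers unfamiliar with the approach described above, the core loop can be pictured roughly as follows. This is only an illustrative sketch, not the actual Hexagon implementation: the real DFA that decides whether an instruction is legal in a given issue slot is abstracted here as a simple per-cycle slot budget, and the cluster-detection and register-pressure logic is omitted entirely.

```python
# Sketch of a top-down, critical-path-first list scheduler that packs
# ready instructions into parallel issue slots (VLIW-style bundles).
# Slot legality on real hardware would be decided by a DFA; here it is
# modeled as a fixed number of slots per cycle.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    succs: list = field(default_factory=list)  # data-dependent successors
    preds: int = 0       # count of not-yet-scheduled predecessors
    height: int = 0      # critical-path height (longest path to a DAG leaf)

def compute_heights(nodes):
    # Longest path to any leaf; this drives the critical-path-first priority.
    # Assumes `nodes` is in topological order.
    for n in reversed(nodes):
        n.height = 1 + max((s.height for s in n.succs), default=0)

def schedule(nodes, slots_per_cycle=4):
    compute_heights(nodes)
    ready = [n for n in nodes if n.preds == 0]
    cycles = []
    while ready:
        ready.sort(key=lambda n: -n.height)   # critical path first
        bundle = ready[:slots_per_cycle]      # stand-in for DFA slot filling
        ready = ready[slots_per_cycle:]
        for n in bundle:                      # release newly-ready successors
            for s in n.succs:
                s.preds -= 1
                if s.preds == 0:
                    ready.append(s)
        cycles.append([n.name for n in bundle])
    return cycles
```

On a diamond-shaped DAG (a feeding b and c, both feeding d) with two slots per cycle, this packs b and c into one bundle, yielding three cycles instead of four.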

  Please stay tuned. 

Sergei.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.


> -----Original Message-----
> From: Hal Finkel [mailto:hfinkel at anl.gov]
> Sent: Thursday, December 22, 2011 6:16 PM
> To: Sergei Larin
> Cc: 'Jakob Stoklund Olesen'; llvm-commits at cs.uiuc.edu; 'Andrew Trick'
> Subject: RE: [llvm-commits] [PATCH] Allow SelectionDAGBuilder to
> reorder loads past stores
> 
> On Wed, 2011-12-21 at 10:44 -0600, Sergei Larin wrote:
> > Hal,
> >
> >   I have actually done the same fix internally (a couple of months
> > ago), which also resulted in severe performance degradation. To solve
> > it for our back end (Hexagon) I ended up modifying the scheduler. In
> > fact, I introduced our own (calling it VLIW) scheduler to handle the
> > newly available parallelism and the resulting register pressure. The
> > result was a significant overall performance gain on a wide
> > (internal) test suite, with some kernels gaining 40-60%. I tried to
> > accomplish the same with the existing infrastructure, but failed. Now
> > you are seeing a similar issue with another architecture. I really
> > wonder what your next move shall be.
> 
> Sergei,
> 
> I have been able to extract a similar performance gain on a set of
> benchmarks I use internally by enabling load/store reordering
> (especially from those with partially-unrolled loops). I have the
> advantage of being able to, for the most part, use the existing
> infrastructure. The PPC 440-style chips that I work with, for example,
> are multi-pipeline but in-order and, once the artificial load/store
> dependencies are removed, the scoreboard hazard detection works pretty
> well. Combining the initial bottom-up scheduling with a post-RA
> top-down pass (after full anti-dependence breaking) generates
> highly-competitive schedules in many cases.
> 
> I can certainly understand, however, how the current schedulers would
> be suboptimal for your kind of architecture.
> 
> My current issue now is the ILP scheduling used for the x86
> architectures. Because they have no itineraries, the scheduling is
> purely heuristic, and the heuristics currently in place were never
> tuned without the strict critical-chain load/store ordering. When you
> hand this scheduler something else, sometimes it does a great job and
> sometimes it does not. I don't expect to have changes accepted into
> trunk if they mess up performance on x86, and so I've been working to
> retune the heuristics to deal with more-independent loads and stores.
> 
> >   I have not checked my changes in for a simple reason: we are not
> > caught up with the LLVM tip in our internal repository (we are
> > several months behind), and Evan has changed the game rules enough (I
> > mean the removal of the top-down schedulers) ...for my design to be
> > incompatible with the tip (my scheduler is top-down).
> 
> Indeed. PPC 970 scheduling was non-existent for a while until I updated
> the hazard detector. For the time being, I changed it from a pre- to a
> post-RA detector, so it is still top-down, but I've not really looked
> at how well that does.
> 
> >   I still plan to submit my work, but it needs to be changed first,
> > and that takes time.
> 
> I am curious to know how you are doing that, algorithmically speaking.
> In some cases, just "inverting" the selection logic is sufficient, but
> it is not clear to me that that is always the case.
> 
> >   Finally, what I am trying to say is: if you are interested in what
> > I have been doing, or you know a better solution for the problem
> > within the existing infrastructure, I would be very interested in
> > talking about it.
> 
> I am interested in what you've been doing (and I'm sure a number of
> other people are interested as well). I don't really have a better
> solution for you (unless you can do everything you need with a post-RA
> hazard detector, in which case, use that for now).
> 
> More generally, however, since Evan has said that they'll be updating
> the schedulers in the coming year anyway, we should work (as a
> community) to make a clear set of requirements so that, hopefully,
> whatever comes out of the design process will work with as many
> architectures as possible.
> 
>  -Hal
> 
> >
> > Thanks.
> >
> >
> > Sergei Larin
> >
> > --
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.
> >
> >
> > > -----Original Message-----
> > > From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
> > > bounces at cs.uiuc.edu] On Behalf Of Hal Finkel
> > > Sent: Wednesday, December 21, 2011 7:50 AM
> > > To: Jakob Stoklund Olesen
> > > Cc: llvm-commits at cs.uiuc.edu
> > > Subject: Re: [llvm-commits] [PATCH] Allow SelectionDAGBuilder to
> > > reorder loads past stores
> > >
> > > It turns out that a significant cause of the performance
> > > regressions caused by this patch is related to this issue: with the
> > > patch applied, the scheduler is now free to schedule many more
> > > things, especially stores, after calls (especially intrinsics that
> > > are expanded to lib calls). This tendency is bad because of the
> > > spilling necessary to cross the call boundary. I am working on a
> > > proposed solution, and I'll post an updated patch soon.
> > >
> > > Thanks again,
> > > Hal
> > >
> > > On Tue, 2011-12-20 at 12:52 -0600, Hal Finkel wrote:
> > > > On Tue, 2011-12-20 at 10:44 -0800, Jakob Stoklund Olesen wrote:
> > > > > On Dec 20, 2011, at 9:22 AM, Hal Finkel wrote:
> > > > >
> > > > > > when I later look at the register map, only XMM0 and XMM1
> > > > > > are ever assigned to vregs; everything else is spilled. This
> > > > > > is wrong. Do you have any ideas on what could be going wrong,
> > > > > > or other things I should examine? Could the register
> > > > > > allocator not be accounting correctly for callee-saved
> > > > > > registers when computing live-interval interference
> > > > > > information?
> > > > >
> > > > > There are no callee-saved xmm registers.
> > > >
> > > > Thanks! I was mixing up the Win64 calling convention with the
> > > > regular one. That explains things. So, I suppose the right thing
> > > > to do is to make sure all stores are flushed before any call
> > > > (which I think it already does), and before any intrinsic that
> > > > will be expanded (which it will not currently do).
> > > >
> > > >  -Hal
> > > >
> > > > >
> > > > > /jakob
> > > > >
> > > >
> > >
> > > --
> > > Hal Finkel
> > > Postdoctoral Appointee
> > > Leadership Computing Facility
> > > Argonne National Laboratory
> > >
> > > _______________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >
> 
> --
> Hal Finkel
> Postdoctoral Appointee
> Leadership Computing Facility
> Argonne National Laboratory
