[PATCH] D19950: Use frequency info to guide Loop Invariant Code Motion.

Tue May 10 13:58:45 PDT 2016

----- Original Message -----

> From: "Xinliang David Li" <davidxl at google.com>
> To: "Dehao Chen" <danielcdh at gmail.com>
> Cc: reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org, "David
> Majnemer" <david.majnemer at gmail.com>, "Hal Finkel"
> <hfinkel at anl.gov>, "Junbum Lim" <junbuml at codeaurora.org>,
> mcrosier at codeaurora.org, "llvm-commits"
> <llvm-commits at lists.llvm.org>, "amara emerson"
> <amara.emerson at arm.com>
> Sent: Tuesday, May 10, 2016 3:15:24 PM
> Subject: Re: [PATCH] D19950: Use frequency info to guide Loop
> Invariant Code Motion.

> On Tue, May 10, 2016 at 1:03 PM, Dehao Chen < danielcdh at gmail.com >
> wrote:

> > On Tue, May 10, 2016 at 11:48 AM, Xinliang David Li <
> > davidxl at google.com > wrote:

> > > On Tue, May 10, 2016 at 11:01 AM, Dehao Chen < danielcdh at gmail.com >
> > > wrote:

> > > > danielcdh added a comment.

> > > > In http://reviews.llvm.org/D19950#425287 , @hfinkel wrote:

> > > > > In http://reviews.llvm.org/D19950#425286 , @hfinkel wrote:
> > > > >
> > > > > > In http://reviews.llvm.org/D19950#425285 , @davidxl wrote:
> > > > > >
> > > > > > > Static prediction has been conservative in estimating loop
> > > > > > > trip count -- it produces something like 30ish iterations.
> > > > > > > If a very hot loop has a big if-then-else (or switch), it
> > > > > > > is very likely to mark many BBs as colder than the loop
> > > > > > > header. Turning this on for static prediction really
> > > > > > > depends on the false rate. It seems to me this can go
> > > > > > > wrong pretty easily for very hot loops (which are also the
> > > > > > > most important thing to optimize for).
> > > > > >
> > > > > > This is a good point. There's no universal conservative
> > > > > > choice
> > > > > > (assuming a small trip count is conservative in some cases,
> > > > > > and
> > > > > > assuming a large trip count is conservative in other
> > > > > > cases).
> > > > >
> > > > > Would it be better (and practical) if there were some way for
> > > > > the
> > > > > BFI client to specify which kind of 'conservative' is
> > > > > desired?
> > > > >
> > > > > Also, why are we doing this instead of sinking later (in CGP
> > > > > or similar)? LICM can expose optimization opportunities, and
> > > > > hoisted code is also a pattern the user might write manually.
> > > > > Sinking later seems more robust.

> > > > I looked at the CGP pass; it looks like it handles sinking
> > > > case by case (e.g. there are separate routines to handle sinking
> > > > of loads, GEPs, etc.). I'm afraid this would miss opportunities.
> > > > Additionally, the file-level comment of the CGP pass says "This
> > > > works around limitations in it's basic-block-at-a-time approach.
> > > > It should eventually be removed."

Yes, but it will be "removed" when the entire subsystem is replaced by GlobalISel, and we'll certainly need to make GlobalISel profiling-data aware, so I expect this is the right path forward regardless. I agree, however, that we want a general sinking here based on profiling data, not just the specific existing heuristics for loads, GEPs, etc. 
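
As a rough illustration of what a frequency-based sinking decision could look like, and of why the static trip-count estimate matters, here is a minimal sketch. The names (Freqs, shouldSink, armFreq) and the even split across switch arms are assumptions made for the example; this is not LLVM's BlockFrequencyInfo API or the code in the patch.

```cpp
#include <cstdint>

// Hypothetical block frequencies, in arbitrary units. This is an
// illustration only, not LLVM's BlockFrequencyInfo API.
struct Freqs {
  std::uint64_t Preheader; // block that runs once before the loop
  std::uint64_t UseBlock;  // in-loop block that uses the value
};

// Sinking from the preheader into the use block is profitable only
// when the use block executes less often than the preheader.
bool shouldSink(const Freqs &F) { return F.UseBlock < F.Preheader; }

// Assumed model: each switch arm gets an equal share of the loop
// body frequency (preheader frequency times trip count).
std::uint64_t armFreq(std::uint64_t Preheader, std::uint64_t TripCount,
                      std::uint64_t NumArms) {
  return Preheader * TripCount / NumArms;
}
```

Under this model, a static trip count of about 30 and a 64-arm switch make every arm look colder than the preheader, so the value would be sunk. With a real trip count of 10000 the same arm is far hotter than the preheader, and sinking would be the wrong call.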

> > > Perhaps you can do profile-driven sinking in CGP separately to
> > > handle the manually hoisted code situation mentioned by Hal.

> > Do you mean we still use frequency to decide whether to hoist code
> > in LICM, and additionally use frequency info to check whether we
> > want to sink instructions in CGP?

> yes -- that is the suggestion.

I'd prefer that we try to sink late first, and only if there are use cases that we can't handle this way, we consider throttling hoisting early. If we come across such use cases, I'd like to understand them better. Hoisting can expose other optimization opportunities, and you lose those opportunities if you don't hoist in the first place. 
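
To make the trade-off concrete, here is a minimal sketch with invented function names (they are not taken from the patch). The first function keeps a loop-invariant product inside a branch that is assumed to be cold; the second is the hoisted form that LICM would produce.

```cpp
#include <cstddef>
#include <vector>

// Invariant product A * B sits inside a branch assumed to be cold.
long sumInBranch(const std::vector<int> &V, long A, long B) {
  long S = 0;
  for (std::size_t I = 0; I < V.size(); ++I) {
    if (V[I] < 0)
      S += A * B; // recomputed each time the branch is taken
    else
      S += V[I];
  }
  return S;
}

// After hoisting: the product is computed once in the preheader,
// even when the branch is never taken.
long sumHoisted(const std::vector<int> &V, long A, long B) {
  const long T = A * B;
  long S = 0;
  for (std::size_t I = 0; I < V.size(); ++I)
    S += (V[I] < 0) ? T : V[I];
  return S;
}
```

Both forms return the same result. In the hoisted form the value T is visible to later passes as a single definition outside the loop, and a late sinking pass could still move it back into the branch if the profile shows the branch is cold.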

-Hal 

> David

> > Dehao

> > > David

> > > > I'm not quite clear why it helps to move code out of the loop
> > > > early and later sink it back inside. Could you give an example
> > > > or some more context?

> > > > Thanks,
> > > > Dehao

> > > > http://reviews.llvm.org/D19950

-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 