[PATCH] D19950: Use frequency info to guide Loop Invariant Code Motion.

Hal Finkel via llvm-commits llvm-commits at lists.llvm.org
Thu May 12 19:24:26 PDT 2016


----- Original Message -----

> From: "Dehao Chen" <danielcdh at gmail.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "Xinliang David Li" <davidxl at google.com>,
> reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org, "David
> Majnemer" <david.majnemer at gmail.com>, "Junbum Lim"
> <junbuml at codeaurora.org>, mcrosier at codeaurora.org, "llvm-commits"
> <llvm-commits at lists.llvm.org>, "amara emerson"
> <amara.emerson at arm.com>, "Philip Reames" <listmail at philipreames.com>
> Sent: Thursday, May 12, 2016 9:20:35 PM
> Subject: Re: [PATCH] D19950: Use frequency info to guide Loop
> Invariant Code Motion.

> On Thu, May 12, 2016 at 6:56 PM, Hal Finkel < hfinkel at anl.gov >
> wrote:

> > > From: "Xinliang David Li" < davidxl at google.com >
> > > To: "Dehao Chen" < danielcdh at gmail.com >
> > > Cc: "Hal Finkel" < hfinkel at anl.gov >,
> > > reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org , "David
> > > Majnemer" < david.majnemer at gmail.com >, "Junbum Lim" <
> > > junbuml at codeaurora.org >, mcrosier at codeaurora.org ,
> > > "llvm-commits" < llvm-commits at lists.llvm.org >, "amara emerson" <
> > > amara.emerson at arm.com >
> > > Sent: Wednesday, May 11, 2016 6:01:32 PM
> > > Subject: Re: [PATCH] D19950: Use frequency info to guide Loop
> > > Invariant Code Motion.

> > > This is probably just a concern in theory -- current store motion
> > > only does downward code motion (sink and merge).
> > 
> 
> > I'm not sure I understand the example. To hoist the cold_load, it
> > must not alias with any stores in the loop. To hoist the store past
> > the loop, it also must not alias with anything in the loop.
> 

> The store is only aliased with the load. As the load is hoisted, the
> store can also be hoisted.
Fair enough. To sink the load you also need to sink the store. Maybe our sinking needs to be smart enough to do that. Good point. 

-Hal 
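
For readers following the aliasing argument, here is a minimal sketch (not LLVM code) of the legality check under discussion: a load can only be sunk past the instructions between it and its new position if none of them is a store that may alias it. The Instr class, the can_sink_load helper, and the rule that equal symbolic addresses mean "may alias" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    kind: str       # "load", "store", or "other"
    addr: str = ""  # symbolic location; equal names are treated as may-alias


def can_sink_load(seq, load_idx, target_idx):
    """True if the load at load_idx may move down to just before target_idx."""
    load = seq[load_idx]
    if load.kind != "load":
        raise ValueError("instruction at load_idx is not a load")
    # Any intervening store to a may-alias location pins the load in place.
    return not any(
        i.kind == "store" and i.addr == load.addr
        for i in seq[load_idx + 1:target_idx]
    )


# "after other code motion": cold_load; store; for (...) { cold_code }
moved = [Instr("load", "p"), Instr("store", "p"), Instr("other")]
# "after hoisting": cold_load; for (...) { cold_code }; store
hoisted = [Instr("load", "p"), Instr("other"), Instr("store", "p")]

print(can_sink_load(moved, 0, 2))    # False
print(can_sink_load(hoisted, 0, 1))  # True
```

In the "after other code motion" ordering the store sits between cold_load and its use, so the sink is rejected unless the store is sunk as well.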

> > > While upward code motion for stores is also possible (e.g. to
> > > shrink the live ranges of the stored value and the address value),
> > > it is not likely to be done as an IR optimization pass.
> > 
> 
> > This reminds me of a conversation I was having with Philip some
> > weeks ago about how InstCombine has a somewhat-unfortunate heuristic
> > of sinking instructions with a single use in only one predecessor
> > into that predecessor. It does this even for instructions with
> > potential side effects, and so we lose the ability to hoist them
> > back again. Hoisting them back might be important later for
> > scheduling, etc. For what it wants to do, however, the heuristic is
> > also not strong enough because it a) does not handle multiple uses
> > in the predecessor and b) only looks in direct predecessors, not any
> > dominated block. Sort of the worst of both worlds ;)
> 

> > -Hal
> 

> > > David
> > 
> 

> > > On Wed, May 11, 2016 at 3:44 PM, Dehao Chen < danielcdh at gmail.com >
> > > wrote:
> > 
> 

> > > > hoist-early sink-later may also introduce hoisted instructions
> > > > that are not sinkable later.
> > > 
> > 
> 

> > > > e.g.

> > > > orig code:
> > > > for() {
> > > >   if (cond) {
> > > >     cold_load;
> > > >     cold_code;
> > > >   }
> > > > }
> > > > store;

> > > > after hoisting:
> > > > cold_load;
> > > > for() {
> > > >   if (cond) {
> > > >     cold_code;
> > > >   }
> > > > }
> > > > store;

> > > > after other code motion:
> > > > cold_load;
> > > > store;
> > > > for() {
> > > >   if (cond) {
> > > >     cold_code;
> > > >   }
> > > > }

> > > > Then later in CGP, when you want to sink cold_load to its uses,
> > > > the store may prevent the sinking due to aliasing.

> > > > On Wed, May 11, 2016 at 10:58 AM, Xinliang David Li <
> > > > davidxl at google.com > wrote:

> > > > > On Tue, May 10, 2016 at 3:14 PM, Hal Finkel < hfinkel at anl.gov >
> > > > > wrote:

> > > > > > > From: "Dehao Chen" < danielcdh at gmail.com >
> > > > > > > To: "Hal Finkel" < hfinkel at anl.gov >
> > > > > > > Cc: "Xinliang David Li" < davidxl at google.com >,
> > > > > > > reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org ,
> > > > > > > "David Majnemer" < david.majnemer at gmail.com >, "Junbum Lim" <
> > > > > > > junbuml at codeaurora.org >, mcrosier at codeaurora.org ,
> > > > > > > "llvm-commits" < llvm-commits at lists.llvm.org >, "amara emerson" <
> > > > > > > amara.emerson at arm.com >
> > > > > > > Sent: Tuesday, May 10, 2016 4:10:49 PM
> > > > > > > Subject: Re: [PATCH] D19950: Use frequency info to guide Loop
> > > > > > > Invariant Code Motion.
> 
> > > > > > > Thanks for the comment. I spent quite a while thinking, but still
> > > > > > > cannot think of an optimization that could be unblocked by
> > > > > > > speculatively hoisting a loop invariant from an unlikely-executed
> > > > > > > path. Can you give some hint (or an example) of what type of
> > > > > > > optimization can benefit from this case?

> > > > > > I'm specifically thinking about this case (although I suspect
> > > > > > there are others):

> > > > > > for (...) {
> > > > > >   if (...) {
> > > > > >     hoistable
> > > > > >     cold_stuff
> > > > > >   }
> > > > > > }

> > > > > > for (...) {
> > > > > >   if (...) {
> > > > > >     hoistable
> > > > > >     hot_stuff
> > > > > >   }
> > > > > > }

> > > > > > I expect that 'hoistable' will be hoisted by LICM out of both
> > > > > > loops, and then CSE'd by GVN.
> 
> > > > > I think this case can be/should be handled by a more general
> > > > > profile-driven speculative PRE. The above case may not be
> > > > > profitable even after GVN has CSE'd the two expressions. On the
> > > > > other hand,

> > > > > ... = a * b;
> > > > > for (...) {
> > > > >   if (cold) {
> > > > >     ... = a * b;
> > > > >   }
> > > > > }

> > > > > It will be good to hoist and CSE. Though in this case, we do not
> > > > > need LICM to enable this CSE. Another case:

> > > > > if (...) {
> > > > >   ... = a * b;
> > > > > }
> > > > > for (...) {
> > > > >   if (cold) {
> > > > >     ... = a * b;
> > > > >   }
> > > > > }

> > > > > Depending on the profile, it might be profitable to do:

> > > > > t = a * b;
> > > > > if (...) {
> > > > >   ... = t;
> > > > > }
> > > > > for (...) {
> > > > >   if (cold) {
> > > > >     ... = t;
> > > > >   }
> > > > > }

> > > > > Again, LICM won't be necessary to enable this.
> 
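
The "depending on the profile" condition in the last example can be sketched as a simple count comparison. This is an illustrative model only, not how any LLVM pass is implemented: the function names and the execution counts are hypothetical, and it assumes a * b costs the same wherever it is evaluated and has no side effects.

```python
def speculation_gain(insert_count, redundant_site_counts):
    """Dynamic evaluations of a * b saved by computing t once up front.

    insert_count: profile count of the block where t = a * b is inserted.
    redundant_site_counts: profile counts of the blocks whose a * b is
    replaced by a use of t.
    """
    return sum(redundant_site_counts) - insert_count


def should_speculate(insert_count, redundant_site_counts):
    # Speculate only when it removes more evaluations than it adds.
    return speculation_gain(insert_count, redundant_site_counts) > 0


# Hypothetical counts: the function is entered 1000 times.
# Case 1: the first `if` body runs 900 times, the cold block 2000 times in total.
print(should_speculate(1000, [900, 2000]))  # True
# Case 2: both guarded sites are rarely reached.
print(should_speculate(1000, [100, 100]))   # False
```

The second case is the one where unconditional evaluation of t adds more work than it removes, which is why the decision needs profile data rather than a fixed rule.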

> > > > > > One might also imagine cases where the two hoistable sections
> > > > > > are SLP vectorized.

> > > > > Will that make it harder to undo the damage later?

> > > > > > Failing to hoist the code might also prevent loop unswitching
> > > > > > (by failing to reduce the size of the loop body below the
> > > > > > threshold size).

> > > > > There are always existing cleanups that can only happen after
> > > > > loop unswitching happens. IMO, loop unswitching, like the
> > > > > inliner, should also look at the code state if the transformation
> > > > > happens.

> > > > > > Another potential issue is that the hoistable code might be
> > > > > > cold, and relatively cheap to hoist, but expensive to vectorize.
> > > > > > As a result, failing to hoist the code might block
> > > > > > otherwise-profitable vectorization. Which reminds me, we need to
> > > > > > fix the vectorizer's if-conversion heuristic to use profiling
> > > > > > information too ;)

> > > > > SLP vectorize? Any example like this? Can the vectorizer be
> > > > > enhanced so that it can be done in the absence of the hoisting?

> > > > > thanks,
> > > > > David

> > > > > > Thanks again,
> > > > > > Hal

> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Thanks,
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Dehao
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > On Tue, May 10, 2016 at 1:58 PM, Hal Finkel <
> > > > > > > hfinkel at anl.gov
> > > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > wrote:
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > From: "Xinliang David Li" < davidxl at google.com >
> > > > > > > To: "Dehao Chen" < danielcdh at gmail.com >
> > > > > > > Cc: reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org ,
> > > > > > > "David Majnemer" < david.majnemer at gmail.com >, "Hal Finkel" <
> > > > > > > hfinkel at anl.gov >, "Junbum Lim" < junbuml at codeaurora.org >,
> > > > > > > mcrosier at codeaurora.org , "llvm-commits" <
> > > > > > > llvm-commits at lists.llvm.org >, "amara emerson" <
> > > > > > > amara.emerson at arm.com >
> > > > > > > Sent: Tuesday, May 10, 2016 3:15:24 PM
> > > > > > > Subject: Re: [PATCH] D19950: Use frequency info to guide Loop
> > > > > > > Invariant Code Motion.
> 
> > > > > > > On Tue, May 10, 2016 at 1:03 PM, Dehao Chen < danielcdh at gmail.com >
> > > > > > > wrote:

> > > > > > > On Tue, May 10, 2016 at 11:48 AM, Xinliang David Li <
> > > > > > > davidxl at google.com > wrote:

> > > > > > > On Tue, May 10, 2016 at 11:01 AM, Dehao Chen < danielcdh at gmail.com >
> > > > > > > wrote:
> 
> > > > > > > danielcdh added a comment.

> > > > > > > In http://reviews.llvm.org/D19950#425287 , @hfinkel wrote:

> > > > > > > > In http://reviews.llvm.org/D19950#425286 , @hfinkel wrote:

> > > > > > > > > In http://reviews.llvm.org/D19950#425285 , @davidxl wrote:
> 
> > > > > > > > > > Static prediction has been conservative in estimating loop
> > > > > > > > > > trip count -- it produces something like 30ish iterations.
> > > > > > > > > > If a very hot loop has a big if-then-else (or switch), it is
> > > > > > > > > > very likely to mark many basic blocks as colder than the
> > > > > > > > > > loop header. Turning this on for static prediction really
> > > > > > > > > > depends on the false rate. It seems to me this can go wrong
> > > > > > > > > > pretty easily for very hot loops (which are also the most
> > > > > > > > > > important thing to optimize for).
> 
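
A small numeric sketch of the concern above, under loudly stated assumptions: the 124/128 back-edge probability is picked only so that the estimate lands near the "30ish iterations" mentioned, the 64-arm switch and the one-million-iteration real trip count are hypothetical, and comparing a block against the preheader is an assumption about how a frequency-guided hoisting check might work, not a description of the patch.

```python
from fractions import Fraction


def expected_trip_count(backedge_prob):
    """Expected iterations when the back edge is taken with this probability."""
    return 1 / (1 - backedge_prob)


def freq_relative_to_preheader(trip_count, prob_reaching_block):
    """Block frequency inside the loop, with the preheader normalized to 1."""
    return trip_count * prob_reaching_block


# Assumed static back-edge probability, picked to give roughly 30 iterations.
static_trips = expected_trip_count(Fraction(124, 128))
# Hypothetical loop body: a switch with 64 equally likely arms.
arm_prob = Fraction(1, 64)

static_arm = freq_relative_to_preheader(static_trips, arm_prob)
actual_arm = freq_relative_to_preheader(1_000_000, arm_prob)

print(static_trips)      # 32
print(static_arm < 1)    # True: the arm looks colder than the preheader
print(actual_arm < 1)    # False: with the real trip count it is much hotter
```

With the static estimate, every arm of the switch appears colder than code outside the loop, so a frequency check would refuse to hoist out of it even though the loop is actually very hot.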
> > > > > > > > > This is a good point. There's no universal conservative
> > > > > > > > > choice (assuming a small trip count is conservative in some
> > > > > > > > > cases, and assuming a large trip count is conservative in
> > > > > > > > > other cases).
> 
> > > > > > > > Would it be better (and practical) if there were some way for
> > > > > > > > the BFI client to specify which kind of 'conservative' is
> > > > > > > > desired?

> > > > > > > > Also, why are we doing this instead of sinking later (in CGP or
> > > > > > > > similar)? LICM can expose optimization opportunities, plus
> > > > > > > > represents a code pattern the user might input manually.
> > > > > > > > Sinking later seems more robust.
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > I looked at CGP pass, looks like it's handling the
> > > > > > > sinking
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > case-by-case (e.g. there is separate routine to handle
> > > > > > > sinking
> > > > > > > of
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > load, gep, etc. I'm afraid this would miss opportunities.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Additionally, the file-level comment of CGP pass says
> > > > > > > "This
> > > > > > > works
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > around limitations in it's basic-block-at-a-time
> > > > > > > approach.
> > > > > > > It
> > > > > > > should
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > eventually be removed."
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Yes, but it will be "removed" when the entire subsystem
> > > > > > > is
> > > > > > > replaced
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > by GlobalISel, and we'll certainly need to make
> > > > > > > GlobalISel
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > profiling-data aware, so I expect this is the right path
> > > > > > > forward
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > regardless. I agree, however, that we want a general
> > > > > > > sinking
> > > > > > > here
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > based on profiling data, not just the specific existing
> > > > > > > heuristics
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > for loads, GEPs, etc.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Perhaps you can do profile driven sinking CGP separately
> > > > > > > to
> > > > > > > handle
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > manually hoisted code situation mentioned by Hal.
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Do you mean we still use frequency to decide whether to hoist
> > > > > > > code in LICM, and additionally use frequency info to check
> > > > > > > whether we want to sink instructions in CGP?
> 
> > > > > > > Yes -- that is the suggestion. I'd prefer that we try to sink
> > > > > > > late first, and only if there are use cases that we can't
> > > > > > > handle this way, we consider throttling hoisting early. If we
> > > > > > > come across such use cases, I'd like to understand them better.
> > > > > > > Hoisting can expose other optimization opportunities, and you
> > > > > > > lose those opportunities if you don't hoist in the first place.

> > > > > > > -Hal
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > David
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > Dehao
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > David
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > > 
> > > > 
> > > 
> > 
> 
> > > > > > > I'm not quite clear why it helps to move code out of the loop
> > > > > > > early and later sink it back inside. Could you give an example
> > > > > > > or some more context?

> > > > > > > Thanks,
> > > > > > > Dehao

> > > > > > > http://reviews.llvm.org/D19950
> 
> > > > > > > --
> > > > > > > Hal Finkel
> > > > > > > Assistant Computational Scientist
> > > > > > > Leadership Computing Facility
> > > > > > > Argonne National Laboratory


-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 