[PATCH] D19950: Use frequency info to guide Loop Invariant Code Motion.

Wed May 25 11:42:11 PDT 2016

----- Original Message -----

> From: "Dehao Chen" <danielcdh at gmail.com>
> To: "Xinliang David Li" <davidxl at google.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>,
> reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org, "David
> Majnemer" <david.majnemer at gmail.com>, "Junbum Lim"
> <junbuml at codeaurora.org>, mcrosier at codeaurora.org, "llvm-commits"
> <llvm-commits at lists.llvm.org>, "amara emerson"
> <amara.emerson at arm.com>
> Sent: Tuesday, May 17, 2016 1:28:45 PM
> Subject: Re: [PATCH] D19950: Use frequency info to guide Loop
> Invariant Code Motion.

> I was trying to add a IR level sinking pass for this. But for my
> simple testcase:

> ; int g;
> ; int foo(int p, int x) {
> ; for (int i = 0; i != x; i++)
> ; if (__builtin_expect(i == p, 0)) {
> ; x += g; x *= g;
> ; }
> ; return x;
> ; }

> After LICM hoists the load of global variable g, "x+=g;x*=g" is also
> hoisted to form a select statement in header:

> . lr.ph : ; preds = %. lr.ph , %.lr.ph.preheader
> %.03 = phi i32 [ %8, %. lr.ph ], [ 0, %.lr.ph.preheader ]
> %.012 = phi i32 [ %.1, %. lr.ph ], [ %1, %.lr.ph.preheader ]
> %5 = icmp eq i32 %.03, %0
> %6 = add nsw i32 %4, %.012
> %7 = mul nsw i32 %6, %4
> %.1 = select i1 %5, i32 %7, i32 %.012, !prof !1
> %8 = add nuw nsw i32 %.03, 1
> %9 = icmp eq i32 %8, %.1
> br i1 %9, label %._crit_edge, label %. lr.ph

> Later in CGP, it's expanded to:

> . lr.ph : ; preds = %select.end, %.lr.ph.preheader
> %.03 = phi i32 [ %8, %select.end ], [ 0, %.lr.ph.preheader ]
> %.012 = phi i32 [ %.1, %select.end ], [ %1, %.lr.ph.preheader ]
> %5 = icmp eq i32 %0, %.03
> %6 = add nsw i32 %.012, %4
> %7 = mul nsw i32 %6, %4
> br i1 %5, label %select.end, label %select.false

> select.false: ; preds = %. lr.ph
> br label %select.end

> select.end: ; preds = %. lr.ph , %select.false
> %.1 = phi i32 [ %7, %. lr.ph ], [ %.012, %select.false ]
> %8 = add nuw nsw i32 %.03, 1
> %9 = icmp eq i32 %8, %.1
> br i1 %9, label %._crit_edge, label %. lr.ph

> So if we want to sink the load from preheader to the branch, we need
> to:
> 1. create a select.true basic block
> 2. sink %6 and %7 to that basic block
> 3. sink load instruction to that basic block

> This is doable, but seems doing a lot of redundant work of
> MachineSinking pass. And it could possibly affect optimizations
> between CGP and MachineSinking.

> Alternatively, we can also do the sinking once in MachineSinking, and
> make it handle more general cases like this one. But from the
> MachineSink.cpp, I saw the comment:

> // This pass is not intended to be a replacement or a complete
> alternative
> // for an LLVM-IR-level sinking pass. It is only designed to sink
> simple
> // constructs that are not exposed before lowering and instruction
> selection.

> Which confuses me: looks like we don't even have a IR-level sinking
> pass. Why would we want IR-level sinking pass instead of one single
> sinking pass at machine level?
I think that removing this comment and improving the MachineSinking pass is certainly reasonable. 

-Hal 

> Thanks,
> Dehao

> On Tue, May 17, 2016 at 11:03 AM, Xinliang David Li <
> davidxl at google.com > wrote:

> > On Thu, May 12, 2016 at 6:37 PM, Hal Finkel < hfinkel at anl.gov >
> > wrote:
> 

> > > > From: "Xinliang David Li" < davidxl at google.com >
> > > 
> > 
> 
> > > > To: "Hal Finkel" < hfinkel at anl.gov >
> > > 
> > 
> 
> > > > Cc: "Dehao Chen" < danielcdh at gmail.com >,
> > > > reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org ,
> > > > "David
> > > > Majnemer" < david.majnemer at gmail.com >, "Junbum Lim" <
> > > > junbuml at codeaurora.org >, mcrosier at codeaurora.org ,
> > > > "llvm-commits"
> > > > <
> > > > llvm-commits at lists.llvm.org >, "amara emerson" <
> > > > amara.emerson at arm.com >
> > > 
> > 
> 
> > > > Sent: Wednesday, May 11, 2016 12:58:24 PM
> > > 
> > 
> 
> > > > Subject: Re: [PATCH] D19950: Use frequency info to guide Loop
> > > > Invariant Code Motion.
> > > 
> > 
> 

> > > > On Tue, May 10, 2016 at 3:14 PM, Hal Finkel < hfinkel at anl.gov >
> > > > wrote:
> > > 
> > 
> 

> > > > > > From: "Dehao Chen" < danielcdh at gmail.com >
> > > > 
> > > 
> > 
> 
> > > > > > To: "Hal Finkel" < hfinkel at anl.gov >
> > > > 
> > > 
> > 
> 
> > > > > > Cc: "Xinliang David Li" < davidxl at google.com >,
> > > > > > reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org ,
> > > > > > "David
> > > > 
> > > 
> > 
> 
> > > > > > Majnemer" < david.majnemer at gmail.com >, "Junbum Lim" <
> > > > > > junbuml at codeaurora.org >, mcrosier at codeaurora.org ,
> > > > > > "llvm-commits"
> > > > 
> > > 
> > 
> 
> > > > > > < llvm-commits at lists.llvm.org >, "amara emerson" <
> > > > > > amara.emerson at arm.com >
> > > > 
> > > 
> > 
> 
> > > > > > Sent: Tuesday, May 10, 2016 4:10:49 PM
> > > > 
> > > 
> > 
> 
> > > > > > Subject: Re: [PATCH] D19950: Use frequency info to guide
> > > > > > Loop
> > > > > > Invariant Code Motion.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Thanks for the comment. I spent quite a while to think, but
> > > > > > still
> > > > 
> > > 
> > 
> 
> > > > > > cannot think of an optimization that could be unblocked by
> > > > 
> > > 
> > 
> 
> > > > > > speculatively hoisting an loop invariant from an unlikely
> > > > > > executed
> > > > 
> > > 
> > 
> 
> > > > > > path. Can you give some hint (or an example) on what type
> > > > > > of
> > > > 
> > > 
> > 
> 
> > > > > > optimization can benefit from this case?
> > > > 
> > > 
> > 
> 

> > > > > I'm specifically thinking about this case (although I suspect
> > > > > there
> > > > > are others):
> > > > 
> > > 
> > 
> 

> > > > > for (...) {
> > > > 
> > > 
> > 
> 
> > > > > if (...) {
> > > > 
> > > 
> > 
> 
> > > > > hoistable
> > > > 
> > > 
> > 
> 
> > > > > cold_stuff
> > > > 
> > > 
> > 
> 
> > > > > }
> > > > 
> > > 
> > 
> 
> > > > > }
> > > > 
> > > 
> > 
> 

> > > > > for (...) {
> > > > 
> > > 
> > 
> 
> > > > > if (...) {
> > > > 
> > > 
> > 
> 
> > > > > hoistable
> > > > 
> > > 
> > 
> 
> > > > > hot_stuff
> > > > 
> > > 
> > 
> 
> > > > > }
> > > > 
> > > 
> > 
> 
> > > > > }
> > > > 
> > > 
> > 
> 

> > > > > I expect that 'hoistable' will be hoisted by LICM out of both
> > > > > loops,
> > > > > and then CSE'd by GVN.
> > > > 
> > > 
> > 
> 
> > > > I think this case can be/should be handled by more general
> > > > profile
> > > > driver speculative PRE. The above case may not be profitable
> > > > even
> > > > after GVN CSEed two expressions.
> > > 
> > 
> 
> > > Why not? It is performance neutral and a code-size improvement.
> > 
> 
> > It may not be performance neutral for the above case -- it depends
> > on
> > how cold the if-block is.
> 

> > > > .
> > > 
> > 
> 
> > > Loop unswitching could certainly be smarter about how it
> > > estimates
> > > the cost of the loop body, but if there are nested conditions, I
> > > don't think that always helps. For example, if we have this:
> > 
> 

> > > for (...) {
> > 
> 
> > > if (c1) {
> > 
> 
> > > ...
> > 
> 
> > > if (c2) {
> > 
> 
> > > hoistable
> > 
> 
> > > cold_stuff
> > 
> 
> > > }
> > 
> 
> > > ...
> > 
> 
> > > }
> > 
> 

> > > ...
> > 
> 
> > > }
> > 
> 

> > > then the code-size cost of unswitching on condition c1 is rightly
> > > impacted by whether the hoistable code is still in the loop body.
> > 
> 

> > Ok.
> 

> > > > > Another potential issue is that the hoistable code might be
> > > > > cold,
> > > > > and
> > > > > relatively cheap to hoist, but expensive to vectorize. As a
> > > > > result,
> > > > > failing to hoist the code might block otherwise-profitable
> > > > > vectorization. Which reminds me, we need to fix the
> > > > > vectorizer's
> > > > > if-conversion heuristic to use profiling information too ;)
> > > > 
> > > 
> > 
> 

> > > > SLP vectorize? Any example like this? Can vectorizor be
> > > > enhanced
> > > > so
> > > > that it can be done in absence of the hoisting?
> > > 
> > 
> 
> > > I'm referring to the loop vectorizer. I'm not sure it is an issue
> > > of
> > > potential enhancement, the issue seems fundamental. The "cost" of
> > > if-conversion is proportional to the increased execution cost,
> > > and
> > > that's proportional to the number of instructions in the
> > > conditionally-executed block. Even more so if the hoistable
> > > operations would need to be scalarized under the targeted ISA.
> > > The
> > > more instructions need to be if-converted, and the colder the
> > > block,
> > > the lower the profitability of vectorizing (because, once
> > > vectorized, the instructions now execute on every iteration).
> > 
> 

> > This is a reasonable concern.
> 

> > > In any case, for the same reason we don't want to throttle LICM
> > > based
> > > on register pressure, I don't think we want to do it here either.
> > > There is value is having a canonical form where things execute as
> > > soon as possible, because it means that we can use dominance to
> > > determine the frontier of legal placement. Otherwise we need to
> > > teach a myriad of other passes how to hoist in combination with
> > > their core functionality. Maybe we'll come up with some
> > > fundamental
> > > reason why sinking late is insufficient, but I don't think I've
> > > seen
> > > one yet.
> > 
> 

> > Ok.
> 

> > thanks,
> 

> > David
> 

> > > Thanks again,
> > 
> 
> > > Hal
> > 
> 

> > > > thanks,
> > > 
> > 
> 

> > > > David
> > > 
> > 
> 

> > > > > Thanks again,
> > > > 
> > > 
> > 
> 
> > > > > Hal
> > > > 
> > > 
> > 
> 

> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Thanks,
> > > > 
> > > 
> > 
> 
> > > > > > Dehao
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > On Tue, May 10, 2016 at 1:58 PM, Hal Finkel <
> > > > > > hfinkel at anl.gov
> > > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > wrote:
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > From: "Xinliang David Li" < davidxl at google.com >
> > > > 
> > > 
> > 
> 
> > > > > > To: "Dehao Chen" < danielcdh at gmail.com >
> > > > 
> > > 
> > 
> 
> > > > > > Cc: reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org
> > > > > > ,
> > > > > > "David
> > > > 
> > > 
> > 
> 
> > > > > > Majnemer" < david.majnemer at gmail.com >, "Hal Finkel" <
> > > > 
> > > 
> > 
> 
> > > > > > hfinkel at anl.gov >, "Junbum Lim" < junbuml at codeaurora.org >,
> > > > 
> > > 
> > 
> 
> > > > > > mcrosier at codeaurora.org , "llvm-commits" <
> > > > 
> > > 
> > 
> 
> > > > > > llvm-commits at lists.llvm.org >, "amara emerson" <
> > > > 
> > > 
> > 
> 
> > > > > > amara.emerson at arm.com >
> > > > 
> > > 
> > 
> 
> > > > > > Sent: Tuesday, May 10, 2016 3:15:24 PM
> > > > 
> > > 
> > 
> 
> > > > > > Subject: Re: [PATCH] D19950: Use frequency info to guide
> > > > > > Loop
> > > > 
> > > 
> > 
> 
> > > > > > Invariant Code Motion.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > On Tue, May 10, 2016 at 1:03 PM, Dehao Chen <
> > > > > > danielcdh at gmail.com
> > > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > wrote:
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > On Tue, May 10, 2016 at 11:48 AM, Xinliang David Li <
> > > > 
> > > 
> > 
> 
> > > > > > davidxl at google.com > wrote:
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > On Tue, May 10, 2016 at 11:01 AM, Dehao Chen <
> > > > > > danielcdh at gmail.com
> > > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > wrote:
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > danielcdh added a comment.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > In http://reviews.llvm.org/D19950#425287 , @hfinkel wrote:
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > > In http://reviews.llvm.org/D19950#425286 , @hfinkel
> > > > > > > wrote:
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > 
> > > 
> > 
> 

> > > > > > > > In http://reviews.llvm.org/D19950#425285 , @davidxl
> > > > > > > > wrote:
> > > > 
> > > 
> > 
> 
> > > > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > > > > Static prediction has been conservative in estimating
> > > > > > > > > loop
> > > > > > > > > trip
> > > > 
> > > 
> > 
> 
> > > > > > > > > count -- it produces something like 30ish iterations.
> > > > > > > > > If
> > > > > > > > > the
> > > > > > > > > a
> > > > 
> > > 
> > 
> 
> > > > > > > > > very hot loop has a big if-then-else (or switch), it
> > > > > > > > > is
> > > > > > > > > very
> > > > 
> > > 
> > 
> 
> > > > > > > > > likely to mark many bbs' to be colder than the loop
> > > > > > > > > header.
> > > > 
> > > 
> > 
> 
> > > > > > > > > Turning on this for static prediction really depends
> > > > > > > > > on
> > > > > > > > > the
> > > > 
> > > 
> > 
> 
> > > > > > > > > false rate. It seems to be this can get wrong pretty
> > > > > > > > > easily
> > > > 
> > > 
> > 
> 
> > > > > > > > > for very hot loops (which is also the most important
> > > > > > > > > thing
> > > > > > > > > to
> > > > 
> > > 
> > 
> 
> > > > > > > > > optimize for).
> > > > 
> > > 
> > 
> 
> > > > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > > > This is a good point. There's no universal conservative
> > > > > > > > choice
> > > > 
> > > 
> > 
> 
> > > > > > > > (assuming a small trip count is conservative in some
> > > > > > > > cases,
> > > > > > > > and
> > > > 
> > > 
> > 
> 
> > > > > > > > assuming a large trip count is conservative in other
> > > > > > > > cases).
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > > Would it be better (and practical) if there were some way
> > > > > > > for
> > > > > > > the
> > > > 
> > > 
> > 
> 
> > > > > > > BFI client to specify which kind of 'conservative' is
> > > > > > > desired?
> > > > 
> > > 
> > 
> 
> > > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > > Also, why are we doing this instead of sinking later (in
> > > > > > > CGP
> > > > > > > or
> > > > 
> > > 
> > 
> 
> > > > > > > similar)? LICM can expose optimization opportunities,
> > > > > > > plus
> > > > 
> > > 
> > 
> 
> > > > > > > represents a code pattern the user might input manually.
> > > > > > > Sinking
> > > > 
> > > 
> > 
> 
> > > > > > > later seems more robust.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > I looked at CGP pass, looks like it's handling the sinking
> > > > 
> > > 
> > 
> 
> > > > > > case-by-case (e.g. there is separate routine to handle
> > > > > > sinking
> > > > > > of
> > > > 
> > > 
> > 
> 
> > > > > > load, gep, etc. I'm afraid this would miss opportunities.
> > > > 
> > > 
> > 
> 
> > > > > > Additionally, the file-level comment of CGP pass says "This
> > > > > > works
> > > > 
> > > 
> > 
> 
> > > > > > around limitations in it's basic-block-at-a-time approach.
> > > > > > It
> > > > > > should
> > > > 
> > > 
> > 
> 
> > > > > > eventually be removed."
> > > > 
> > > 
> > 
> 
> > > > > > Yes, but it will be "removed" when the entire subsystem is
> > > > > > replaced
> > > > 
> > > 
> > 
> 
> > > > > > by GlobalISel, and we'll certainly need to make GlobalISel
> > > > 
> > > 
> > 
> 
> > > > > > profiling-data aware, so I expect this is the right path
> > > > > > forward
> > > > 
> > > 
> > 
> 
> > > > > > regardless. I agree, however, that we want a general
> > > > > > sinking
> > > > > > here
> > > > 
> > > 
> > 
> 
> > > > > > based on profiling data, not just the specific existing
> > > > > > heuristics
> > > > 
> > > 
> > 
> 
> > > > > > for loads, GEPs, etc.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Perhaps you can do profile driven sinking CGP separately to
> > > > > > handle
> > > > 
> > > 
> > 
> 
> > > > > > manually hoisted code situation mentioned by Hal.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Do you mean we still use frequency to decide whether to
> > > > > > hoist
> > > > > > code
> > > > > > in
> > > > 
> > > 
> > 
> 
> > > > > > LICM, additionally use frequency info to check if we want
> > > > > > to
> > > > > > sink
> > > > 
> > > 
> > 
> 
> > > > > > instructions in CGP?
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > yes -- that is the suggestion. I'd prefer that we try to
> > > > > > sink
> > > > > > late
> > > > 
> > > 
> > 
> 
> > > > > > first, and only if there are use cases that we can't handle
> > > > > > this
> > > > 
> > > 
> > 
> 
> > > > > > way, we consider throttling hoisting early. If we come
> > > > > > across
> > > > > > such
> > > > 
> > > 
> > 
> 
> > > > > > use cases, I'd like to understand them better. Hoisting can
> > > > > > expose
> > > > 
> > > 
> > 
> 
> > > > > > other optimization opportunities, and you lose those
> > > > > > opportunities
> > > > 
> > > 
> > 
> 
> > > > > > if you don't hoist in the first place.
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > -Hal
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > David
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Dehao
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > David
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > I'm not quite clear why it helps to move code out of loop
> > > > > > early
> > > > > > and
> > > > 
> > > 
> > 
> 
> > > > > > later sink it inside. Could you give an example or some
> > > > > > more
> > > > 
> > > 
> > 
> 
> > > > > > context?
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Thanks,
> > > > 
> > > 
> > 
> 
> > > > > > Dehao
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > http://reviews.llvm.org/D19950
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > --
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > > Hal Finkel
> > > > 
> > > 
> > 
> 
> > > > > > Assistant Computational Scientist
> > > > 
> > > 
> > 
> 
> > > > > > Leadership Computing Facility
> > > > 
> > > 
> > 
> 
> > > > > > Argonne National Laboratory
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 
> > > > > >
> > > > 
> > > 
> > 
> 

> > > > > --
> > > > 
> > > 
> > 
> 
> > > > > Hal Finkel
> > > > 
> > > 
> > 
> 
> > > > > Assistant Computational Scientist
> > > > 
> > > 
> > 
> 
> > > > > Leadership Computing Facility
> > > > 
> > > 
> > 
> 
> > > > > Argonne National Laboratory
> > > > 
> > > 
> > 
> 

> > > --
> > 
> 

> > > Hal Finkel
> > 
> 
> > > Assistant Computational Scientist
> > 
> 
> > > Leadership Computing Facility
> > 
> 
> > > Argonne National Laboratory
> > 
> 

-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160525/d34d6b96/attachment.html>