[PATCH] D19950: Use frequency info to guide Loop Invariant Code Motion.

Wed May 11 10:58:24 PDT 2016

On Tue, May 10, 2016 at 3:14 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "Dehao Chen" <danielcdh at gmail.com>
> > To: "Hal Finkel" <hfinkel at anl.gov>
> > Cc: "Xinliang David Li" <davidxl at google.com>,
> reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org, "David
> > Majnemer" <david.majnemer at gmail.com>, "Junbum Lim" <
> junbuml at codeaurora.org>, mcrosier at codeaurora.org, "llvm-commits"
> > <llvm-commits at lists.llvm.org>, "amara emerson" <amara.emerson at arm.com>
> > Sent: Tuesday, May 10, 2016 4:10:49 PM
> > Subject: Re: [PATCH] D19950: Use frequency info to guide Loop Invariant
> Code Motion.
> >
> >
> > Thanks for the comment. I spent quite a while to think, but still
> > cannot think of an optimization that could be unblocked by
> > speculatively hoisting an loop invariant from an unlikely executed
> > path. Can you give some hint (or an example) on what type of
> > optimization can benefit from this case?
>
> I'm specifically thinking about this case (although I suspect there are
> others):
>
>  for (...) {
>    if (...) {
>      hoistable
>      cold_stuff
>    }
>  }
>
>  for (...) {
>    if (...) {
>      hoistable
>      hot_stuff
>    }
>  }
>
> I expect that 'hoistable' will be hoisted by LICM out of both loops, and
> then CSE'd by GVN.

I think this case can be/should be handled by more general profile driver
speculative PRE.  The above case may not be profitable even after GVN CSEed
two expressions. On the other hand,

... = a * b;

for (...) {
   if (cold) {
      .... = a * b;
   }
 }

It will be good to hoist and CSE. Though in this case, we do not need LICM
to enable this CSE.   Another case:

if (....) {
    ... = a*b;
 }

for (....) {
   if (cold) {
      ... = a * b;
    }
 }

Depending on the profile, it might be profitable to do:

t = a * b;
 if (...) {
    .. = t;
 }
for (...) {
   if (cold) {
      .. = t ;
    }
}

Again, LICM won't be necessary to enable this.

> One might also imagine cases where the two hoistable sections are SLP
> vectorized.
>

Will that make it harder to undo the damage later ?

>
> Failing to host the code might also prevent loop unswitching (by failing
> to reduce the size of the loop body below the threshold size).
>

There are always existing cleanups that can only happen after
loop-unswitching happens. IMO, loop unswiitching, like inliner should also
look at the code state if the transformation happens.

>
> Another potential issue is that the hoistable code might be cold, and
> relatively cheap to hoist, but expensive to vectorize. As a result, failing
> to hoist the code might block otherwise-profitable vectorization. Which
> reminds me, we need to fix the vectorizer's if-conversion heuristic to use
> profiling information too ;)
>

SLP vectorize? Any example like this? Can vectorizor be enhanced so that it
can be done in absence of the hoisting?

thanks,

David

>
> Thanks again,
> Hal
>
> >
> > Thanks,
> > Dehao
> >
> >
> > On Tue, May 10, 2016 at 1:58 PM, Hal Finkel < hfinkel at anl.gov >
> > wrote:
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > From: "Xinliang David Li" < davidxl at google.com >
> > To: "Dehao Chen" < danielcdh at gmail.com >
> > Cc: reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org , "David
> > Majnemer" < david.majnemer at gmail.com >, "Hal Finkel" <
> > hfinkel at anl.gov >, "Junbum Lim" < junbuml at codeaurora.org >,
> > mcrosier at codeaurora.org , "llvm-commits" <
> > llvm-commits at lists.llvm.org >, "amara emerson" <
> > amara.emerson at arm.com >
> > Sent: Tuesday, May 10, 2016 3:15:24 PM
> > Subject: Re: [PATCH] D19950: Use frequency info to guide Loop
> > Invariant Code Motion.
> >
> >
> >
> >
> >
> >
> > On Tue, May 10, 2016 at 1:03 PM, Dehao Chen < danielcdh at gmail.com >
> > wrote:
> >
> >
> >
> >
> >
> >
> >
> > On Tue, May 10, 2016 at 11:48 AM, Xinliang David Li <
> > davidxl at google.com > wrote:
> >
> >
> >
> >
> >
> >
> >
> > On Tue, May 10, 2016 at 11:01 AM, Dehao Chen < danielcdh at gmail.com >
> > wrote:
> >
> >
> > danielcdh added a comment.
> >
> > In http://reviews.llvm.org/D19950#425287 , @hfinkel wrote:
> >
> > > In http://reviews.llvm.org/D19950#425286 , @hfinkel wrote:
> > >
> > > > In http://reviews.llvm.org/D19950#425285 , @davidxl wrote:
> > > >
> > > > > Static prediction has been conservative in estimating loop trip
> > > > > count -- it produces something like 30ish iterations. If the a
> > > > > very hot loop has a big if-then-else (or switch), it is very
> > > > > likely to mark many bbs' to be colder than the loop header.
> > > > > Turning on this for static prediction really depends on the
> > > > > false rate. It seems to be this can get wrong pretty easily
> > > > > for very hot loops (which is also the most important thing to
> > > > > optimize for).
> > > >
> > > >
> > > > This is a good point. There's no universal conservative choice
> > > > (assuming a small trip count is conservative in some cases, and
> > > > assuming a large trip count is conservative in other cases).
> > >
> > >
> > > Would it be better (and practical) if there were some way for the
> > > BFI client to specify which kind of 'conservative' is desired?
> > >
> > > Also, why are we doing this instead of sinking later (in CGP or
> > > similar)? LICM can expose optimization opportunities, plus
> > > represents a code pattern the user might input manually. Sinking
> > > later seems more robust.
> >
> >
> > I looked at CGP pass, looks like it's handling the sinking
> > case-by-case (e.g. there is separate routine to handle sinking of
> > load, gep, etc. I'm afraid this would miss opportunities.
> > Additionally, the file-level comment of CGP pass says "This works
> > around limitations in it's basic-block-at-a-time approach. It should
> > eventually be removed."
> > Yes, but it will be "removed" when the entire subsystem is replaced
> > by GlobalISel, and we'll certainly need to make GlobalISel
> > profiling-data aware, so I expect this is the right path forward
> > regardless. I agree, however, that we want a general sinking here
> > based on profiling data, not just the specific existing heuristics
> > for loads, GEPs, etc.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Perhaps you can do profile driven sinking CGP separately to handle
> > manually hoisted code situation mentioned by Hal.
> >
> >
> > Do you mean we still use frequency to decide whether to hoist code in
> > LICM, additionally use frequency info to check if we want to sink
> > instructions in CGP?
> >
> >
> >
> >
> > yes -- that is the suggestion. I'd prefer that we try to sink late
> > first, and only if there are use cases that we can't handle this
> > way, we consider throttling hoisting early. If we come across such
> > use cases, I'd like to understand them better. Hoisting can expose
> > other optimization opportunities, and you lose those opportunities
> > if you don't hoist in the first place.
> >
> > -Hal
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > David
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Dehao
> >
> >
> >
> >
> >
> >
> >
> >
> > David
> >
> >
> >
> > I'm not quite clear why it helps to move code out of loop early and
> > later sink it inside. Could you give an example or some more
> > context?
> >
> > Thanks,
> > Dehao
> >
> >
> > http://reviews.llvm.org/D19950
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > --
> >
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
> >
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160511/66837a4e/attachment.html>