[PATCH] D19950: Use frequency info to guide Loop Invariant Code Motion.

Tue May 17 11:03:55 PDT 2016

On Thu, May 12, 2016 at 6:37 PM, Hal Finkel <hfinkel at anl.gov> wrote:

>
> ------------------------------
>
> *From: *"Xinliang David Li" <davidxl at google.com>
> *To: *"Hal Finkel" <hfinkel at anl.gov>
> *Cc: *"Dehao Chen" <danielcdh at gmail.com>,
> reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org, "David Majnemer"
> <david.majnemer at gmail.com>, "Junbum Lim" <junbuml at codeaurora.org>,
> mcrosier at codeaurora.org, "llvm-commits" <llvm-commits at lists.llvm.org>,
> "amara emerson" <amara.emerson at arm.com>
> *Sent: *Wednesday, May 11, 2016 12:58:24 PM
> *Subject: *Re: [PATCH] D19950: Use frequency info to guide Loop Invariant
> Code Motion.
>
>
>
> On Tue, May 10, 2016 at 3:14 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>> ------------------------------
>>
>> > From: "Dehao Chen" <danielcdh at gmail.com>
>> > To: "Hal Finkel" <hfinkel at anl.gov>
>> > Cc: "Xinliang David Li" <davidxl at google.com>,
>> reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org, "David
>> > Majnemer" <david.majnemer at gmail.com>, "Junbum Lim" <
>> junbuml at codeaurora.org>, mcrosier at codeaurora.org, "llvm-commits"
>> > <llvm-commits at lists.llvm.org>, "amara emerson" <amara.emerson at arm.com>
>> > Sent: Tuesday, May 10, 2016 4:10:49 PM
>> > Subject: Re: [PATCH] D19950: Use frequency info to guide Loop Invariant
>> Code Motion.
>> >
>> >
>> > Thanks for the comment. I spent quite a while to think, but still
>> > cannot think of an optimization that could be unblocked by
>> > speculatively hoisting an loop invariant from an unlikely executed
>> > path. Can you give some hint (or an example) on what type of
>> > optimization can benefit from this case?
>>
>> I'm specifically thinking about this case (although I suspect there are
>> others):
>>
>>  for (...) {
>>    if (...) {
>>      hoistable
>>      cold_stuff
>>    }
>>  }
>>
>>  for (...) {
>>    if (...) {
>>      hoistable
>>      hot_stuff
>>    }
>>  }
>>
>> I expect that 'hoistable' will be hoisted by LICM out of both loops, and
>> then CSE'd by GVN.
>
>
>
> I think this case can be/should be handled by more general profile driver
> speculative PRE.  The above case may not be profitable even after GVN CSEed
> two expressions.
>
> Why not? It is performance neutral and a code-size improvement.
>

It may not be performance neutral for the above case -- it depends on how
cold the if-block is.

> .
>
> Loop unswitching could certainly be smarter about how it estimates the
> cost of the loop body, but if there are nested conditions, I don't think
> that always helps. For example, if we have this:
>
> for (...) {
>   if (c1) {
>     ...
>     if (c2) {
>       hoistable
>       cold_stuff
>     }
>     ...
>   }
>
>   ...
> }
>
> then the code-size cost of unswitching on condition c1 is rightly impacted
> by whether the hoistable code is still in the loop body.
>
>
Ok.

>
>
>>
>> Another potential issue is that the hoistable code might be cold, and
>> relatively cheap to hoist, but expensive to vectorize. As a result, failing
>> to hoist the code might block otherwise-profitable vectorization. Which
>> reminds me, we need to fix the vectorizer's if-conversion heuristic to use
>> profiling information too ;)
>>
>
> SLP vectorize? Any example like this? Can vectorizor be enhanced so that
> it can be done in absence of the hoisting?
>
> I'm referring to the loop vectorizer. I'm not sure it is an issue of
> potential enhancement, the issue seems fundamental. The "cost" of
> if-conversion is proportional to the increased execution cost, and that's
> proportional to the number of instructions in the conditionally-executed
> block. Even more so if the hoistable operations would need to be scalarized
> under the targeted ISA. The more instructions need to be if-converted, and
> the colder the block, the lower the profitability of vectorizing (because,
> once vectorized, the instructions now execute on every iteration).
>
>
This is a reasonable concern.

> In any case, for the same reason we don't want to throttle LICM based on
> register pressure, I don't think we want to do it here either. There is
> value is having a canonical form where things execute as soon as possible,
> because it means that we can use dominance to determine the frontier of
> legal placement. Otherwise we need to teach a myriad of other passes how to
> hoist in combination with their core functionality. Maybe we'll come up
> with some fundamental reason why sinking late is insufficient, but I don't
> think I've seen one yet.
>

Ok.

thanks,

David

>
> Thanks again,
> Hal
>
>
> thanks,
>
> David
>
>
>>
>> Thanks again,
>> Hal
>>
>> >
>> > Thanks,
>> > Dehao
>> >
>> >
>> > On Tue, May 10, 2016 at 1:58 PM, Hal Finkel < hfinkel at anl.gov >
>> > wrote:
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > From: "Xinliang David Li" < davidxl at google.com >
>> > To: "Dehao Chen" < danielcdh at gmail.com >
>> > Cc: reviews+D19950+public+38ba22078c2035b8 at reviews.llvm.org , "David
>> > Majnemer" < david.majnemer at gmail.com >, "Hal Finkel" <
>> > hfinkel at anl.gov >, "Junbum Lim" < junbuml at codeaurora.org >,
>> > mcrosier at codeaurora.org , "llvm-commits" <
>> > llvm-commits at lists.llvm.org >, "amara emerson" <
>> > amara.emerson at arm.com >
>> > Sent: Tuesday, May 10, 2016 3:15:24 PM
>> > Subject: Re: [PATCH] D19950: Use frequency info to guide Loop
>> > Invariant Code Motion.
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Tue, May 10, 2016 at 1:03 PM, Dehao Chen < danielcdh at gmail.com >
>> > wrote:
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Tue, May 10, 2016 at 11:48 AM, Xinliang David Li <
>> > davidxl at google.com > wrote:
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Tue, May 10, 2016 at 11:01 AM, Dehao Chen < danielcdh at gmail.com >
>> > wrote:
>> >
>> >
>> > danielcdh added a comment.
>> >
>> > In http://reviews.llvm.org/D19950#425287 , @hfinkel wrote:
>> >
>> > > In http://reviews.llvm.org/D19950#425286 , @hfinkel wrote:
>> > >
>> > > > In http://reviews.llvm.org/D19950#425285 , @davidxl wrote:
>> > > >
>> > > > > Static prediction has been conservative in estimating loop trip
>> > > > > count -- it produces something like 30ish iterations. If the a
>> > > > > very hot loop has a big if-then-else (or switch), it is very
>> > > > > likely to mark many bbs' to be colder than the loop header.
>> > > > > Turning on this for static prediction really depends on the
>> > > > > false rate. It seems to be this can get wrong pretty easily
>> > > > > for very hot loops (which is also the most important thing to
>> > > > > optimize for).
>> > > >
>> > > >
>> > > > This is a good point. There's no universal conservative choice
>> > > > (assuming a small trip count is conservative in some cases, and
>> > > > assuming a large trip count is conservative in other cases).
>> > >
>> > >
>> > > Would it be better (and practical) if there were some way for the
>> > > BFI client to specify which kind of 'conservative' is desired?
>> > >
>> > > Also, why are we doing this instead of sinking later (in CGP or
>> > > similar)? LICM can expose optimization opportunities, plus
>> > > represents a code pattern the user might input manually. Sinking
>> > > later seems more robust.
>> >
>> >
>> > I looked at CGP pass, looks like it's handling the sinking
>> > case-by-case (e.g. there is separate routine to handle sinking of
>> > load, gep, etc. I'm afraid this would miss opportunities.
>> > Additionally, the file-level comment of CGP pass says "This works
>> > around limitations in it's basic-block-at-a-time approach. It should
>> > eventually be removed."
>> > Yes, but it will be "removed" when the entire subsystem is replaced
>> > by GlobalISel, and we'll certainly need to make GlobalISel
>> > profiling-data aware, so I expect this is the right path forward
>> > regardless. I agree, however, that we want a general sinking here
>> > based on profiling data, not just the specific existing heuristics
>> > for loads, GEPs, etc.
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > Perhaps you can do profile driven sinking CGP separately to handle
>> > manually hoisted code situation mentioned by Hal.
>> >
>> >
>> > Do you mean we still use frequency to decide whether to hoist code in
>> > LICM, additionally use frequency info to check if we want to sink
>> > instructions in CGP?
>> >
>> >
>> >
>> >
>> > yes -- that is the suggestion. I'd prefer that we try to sink late
>> > first, and only if there are use cases that we can't handle this
>> > way, we consider throttling hoisting early. If we come across such
>> > use cases, I'd like to understand them better. Hoisting can expose
>> > other optimization opportunities, and you lose those opportunities
>> > if you don't hoist in the first place.
>> >
>> > -Hal
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > David
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > Dehao
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > David
>> >
>> >
>> >
>> > I'm not quite clear why it helps to move code out of loop early and
>> > later sink it inside. Could you give an example or some more
>> > context?
>> >
>> > Thanks,
>> > Dehao
>> >
>> >
>> > http://reviews.llvm.org/D19950
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > --
>> >
>> > Hal Finkel
>> > Assistant Computational Scientist
>> > Leadership Computing Facility
>> > Argonne National Laboratory
>> >
>> >
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>
>
>
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160517/e731bf2c/attachment-0001.html>