RFC: Tweak heuristics in SimplifyCFG

Mon Feb 9 08:28:27 PST 2015

Hi Hal,

Please find attached two patches implementing what we discussed last week.
The code changes are trivial, there is some slight churn in the tests which
I have attempted to justify.

Cheers,

James

On Sat Feb 07 2015 at 10:07:56 AM James Molloy <james at jamesmolloy.co.uk>
wrote:

> Hi Geoff,
>
> The inverse transform is done in CodeGenPrepare, see
> TargetLoweringInfo::isPredictableSelectExpensive()
>
> Cheers,
>
> James
>
> On Fri Feb 06 2015 at 10:44:04 PM Hal Finkel <hfinkel at anl.gov> wrote:
>
>> ----- Original Message -----
>> > From: "Geoff Berry" <gberry at codeaurora.org>
>> > To: "James Molloy" <james at jamesmolloy.co.uk>, "Hal Finkel" <
>> hfinkel at anl.gov>
>> > Cc: "LLVM Commits" <llvm-commits at cs.uiuc.edu>
>> > Sent: Friday, February 6, 2015 4:38:27 PM
>> > Subject: RE: RFC: Tweak heuristics in SimplifyCFG
>> >
>> >
>> >
>> >
>> > Hi James, Hal,
>> >
>> >
>> >
>> > I’ve been looking into a related issue, and would appreciate your
>> > input on it. We currently convert branches to selects in
>> > SimplifyCFG, but we don’t do the opposite anywhere at the IR level
>> > from what I can tell.
>> > It seems like it might be beneficial to do
>> > this conversion in cases where the un-if converted code would be
>> > rejected by the cost heuristic that was being discussed below. Code
>> > sinking also comes into play, since ideally we would consider the
>> > branching case cost with code sunk into the then/else blocks where
>> > possible/profitable. Come to think of it, it seems like we would
>> > want to do some sinking before the SimplifyCFG case as well to make
>> > a better decision.
>> >
>> >
>> >
>> > Any thoughts on this?
>> >
>>
>> I think that I understand what you're saying, but could you provide a
>> couple quick examples? I'm somewhat skeptical because creating branches
>> from selects generically makes optimization and analysis harder. As a
>> result, we generally delay this until CodeGen when our modeling is more
>> precise.
>>
>>  -Hal
>>
>> >
>> >
>> > Thanks,
>> >
>> >
>> >
>> >
>> > --
>> >
>> > Geoff Berry
>> >
>> > Employee of Qualcomm Innovation Center, Inc.
>> >
>> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
>> > Linux Foundation Collaborative Project
>> >
>> >
>> >
>> > From: llvm-commits-bounces at cs.uiuc.edu
>> > [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of James Molloy
>> > Sent: Friday, February 06, 2015 12:01 PM
>> > To: Hal Finkel
>> > Cc: LLVM Commits
>> > Subject: Re: RFC: Tweak heuristics in SimplifyCFG
>> >
>> >
>> >
>> >
>> > Hi Hal,
>> >
>> >
>> > That's a really good point, I'm on board with that. I'll cook up a
>> > patch soon and send it for review.
>> >
>> >
>> >
>> >
>> >
>> > Cheers,
>> >
>> >
>> >
>> >
>> >
>> > James
>> >
>> >
>> >
>> >
>> > On Fri Feb 06 2015 at 4:53:44 PM Hal Finkel < hfinkel at anl.gov >
>> > wrote:
>> >
>> >
>> >
>> > ----- Original Message -----
>> > > From: "James Molloy" < james at jamesmolloy.co.uk >
>> > > To: "LLVM Commits" < llvm-commits at cs.uiuc.edu >
>> > > Sent: Friday, February 6, 2015 9:00:33 AM
>> > > Subject: RFC: Tweak heuristics in SimplifyCFG
>> > >
>> > > Hi all,
>> > >
>> > >
>> > > I've been looking at why we generate poor code for idiomatic stuff
>> > > like clamp() and abs().
>> > >
>> > >
>> > > Clamp normally looks like this:
>> > >
>> > >
>> > > T clamp(T a, T b, T c) { return (a < b) ? b : ((a > c) ? c : a); }
>> > >
>> > >
>> > > We currently produce the following IR for this:
>> > >
>> > >
>> > > define i32 @clamp2(i32 %a, i32 %b, i32 %c) #0 {
>> > > entry:
>> > > %cmp = icmp sgt i32 %a, %c
>> > > br i1 %cmp, label %cond.end4, label %cond.false
>> > >
>> > >
>> > > cond.false:
>> > > %cmp1 = icmp slt i32 %a, %b
>> > > %cond = select i1 %cmp1, i32 %b, i32 %a
>> > > br label %cond.end4
>> > >
>> > >
>> > > cond.end4:
>> > > %cond5 = phi i32 [ %cond, %cond.false ], [ %c, %entry ]
>> > > ret i32 %cond5
>> > > }
>> > >
>> > >
>> > > This is multi-block so makes later optimizations more awkward, such
>> > > as loop vectorization and loop rerolling. SimplifyCFG can convert
>> > > this into "icmp; select; icmp; select", but doesn't because it has
>> > > quite a conservative heuristic - it'll only ever hoist one (cheap)
>> > > instruction into the dominating block.
>> > >
>> > >
>> > > I think this is too conservative - given the potential gains later
>> > > on
>> > > in the optimizer from flattening basic blocks (and that
>> > > CodegenPrepare can remove selects again!) - we should be more
>> > > aggressive here.
>> > >
>> > >
>> > > My suggestions are:
>> > > - Up -phi-node-folding-threshold from 1 to 3.
>> > > - Add "fcmp", "fadd" and "fsub" to the list of cheap instructions
>> > > to
>> > > hoist. (fadd and fsub to make abs() work!)
>> > >
>> > > Would anyone object to this? I'll have benchmark results on AArch64
>> > > by the end of the weekend.
>> >
>> > This sounds good to be. Regarding the second point, I'd rather that
>> > SimplifyCFG did not have its own list of cheap instructions (I'm
>> > referring to ComputeSpeculationCost in
>> > lib/Transforms/Utils/SimplifyCFG.cpp), but rather used the existing
>> > TTI interface for this. SimplifyCFG already now uses TTI for other
>> > things, and I think this is a natural enhancement.
>> >
>> > I think that we should call TTI.getUserCost(&I) (which is the same
>> > interface used by the inliner's cost analysis, the loop unroller,
>> > etc.), and hoist an unlimited number of instructions marked as
>> > TargetTransformInfo::TCC_Free and some limited number of
>> > instructions marked as TCC_Basic. The idea is that the total cost of
>> > the instructions should equal
>> > (phi-node-folding-threshold)*(TCC_Basic).
>> >
>> > This also provides a natural way to turn off these optimizations for
>> > fadd, etc. on targets that don't have hardware-implemented floating
>> > point.
>> >
>> > -Hal
>> >
>> > >
>> > >
>> > > Cheers,
>> > >
>> > >
>> > > James
>> > >
>> > >
>> > > _______________________________________________
>> > > llvm-commits mailing list
>> > > llvm-commits at cs.uiuc.edu
>> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> > >
>> >
>> > --
>> > Hal Finkel
>> > Assistant Computational Scientist
>> > Leadership Computing Facility
>> > Argonne National Laboratory
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150209/97634783/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-SimplifyCFG-Swap-to-using-TargetTransformInfo-for-co.patch
Type: application/octet-stream
Size: 19542 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150209/97634783/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-SimplifyCFG-Be-more-aggressive.patch
Type: application/octet-stream
Size: 4733 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150209/97634783/attachment-0001.obj>