RFC: Tweak heuristics in SimplifyCFG
James Molloy
james at jamesmolloy.co.uk
Mon Feb 9 09:10:16 PST 2015
Hi Kristof,
http://reviews.llvm.org/D7506 and http://reviews.llvm.org/D7507.
Cheers,
James
On Mon Feb 09 2015 at 4:52:54 PM Kristof Beyls <kristof.beyls at arm.com>
wrote:
> Hi James,
>
>
>
> I would find it easier to review this if the patches were in phabricator.
> Can you upload them to phabricator?
>
>
>
> Thanks,
>
>
>
> Kristof
>
>
>
> *From:* llvm-commits-bounces at cs.uiuc.edu [mailto:
> llvm-commits-bounces at cs.uiuc.edu] *On Behalf Of *James Molloy
> *Sent:* 09 February 2015 16:28
> *To:* Hal Finkel; Geoff Berry
>
>
> *Cc:* LLVM Commits
> *Subject:* Re: RFC: Tweak heuristics in SimplifyCFG
>
>
>
> Hi Hal,
>
> Please find attached two patches implementing what we discussed last week.
> The code changes are trivial, there is some slight churn in the tests which
> I have attempted to justify.
>
>
>
> Cheers,
>
>
>
> James
>
>
>
> On Sat Feb 07 2015 at 10:07:56 AM James Molloy <james at jamesmolloy.co.uk>
> wrote:
>
> Hi Geoff,
>
> The inverse transform is done in CodeGenPrepare, see
> TargetLoweringInfo::isPredictableSelectExpensive()
>
>
>
> Cheers,
>
>
>
> James
>
>
>
> On Fri Feb 06 2015 at 10:44:04 PM Hal Finkel <hfinkel at anl.gov> wrote:
>
> ----- Original Message -----
> > From: "Geoff Berry" <gberry at codeaurora.org>
> > To: "James Molloy" <james at jamesmolloy.co.uk>, "Hal Finkel" <
> hfinkel at anl.gov>
> > Cc: "LLVM Commits" <llvm-commits at cs.uiuc.edu>
> > Sent: Friday, February 6, 2015 4:38:27 PM
> > Subject: RE: RFC: Tweak heuristics in SimplifyCFG
> >
> >
> >
> >
> > Hi James, Hal,
> >
> >
> >
> > I’ve been looking into a related issue, and would appreciate your
> > input on it. We currently convert branches to selects in
> > SimplifyCFG, but we don’t do the opposite anywhere at the IR level
> > from what I can tell.
> > It seems like it might be beneficial to do
> > this conversion in cases where the un-if converted code would be
> > rejected by the cost heuristic that was being discussed below. Code
> > sinking also comes into play, since ideally we would consider the
> > branching case cost with code sunk into the then/else blocks where
> > possible/profitable. Come to think of it, it seems like we would
> > want to do some sinking before the SimplifyCFG case as well to make
> > a better decision.
> >
> >
> >
> > Any thoughts on this?
> >
>
> I think that I understand what you're saying, but could you provide a
> couple quick examples? I'm somewhat skeptical because creating branches
> from selects generically makes optimization and analysis harder. As a
> result, we generally delay this until CodeGen when our modeling is more
> precise.
>
> -Hal
>
> >
> >
> > Thanks,
> >
> >
> >
> >
> > --
> >
> > Geoff Berry
> >
> > Employee of Qualcomm Innovation Center, Inc.
> >
> > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
> > Linux Foundation Collaborative Project
> >
> >
> >
> > From: llvm-commits-bounces at cs.uiuc.edu
> > [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of James Molloy
> > Sent: Friday, February 06, 2015 12:01 PM
> > To: Hal Finkel
> > Cc: LLVM Commits
> > Subject: Re: RFC: Tweak heuristics in SimplifyCFG
> >
> >
> >
> >
> > Hi Hal,
> >
> >
> > That's a really good point, I'm on board with that. I'll cook up a
> > patch soon and send it for review.
> >
> >
> >
> >
> >
> > Cheers,
> >
> >
> >
> >
> >
> > James
> >
> >
> >
> >
> > On Fri Feb 06 2015 at 4:53:44 PM Hal Finkel < hfinkel at anl.gov >
> > wrote:
> >
> >
> >
> > ----- Original Message -----
> > > From: "James Molloy" < james at jamesmolloy.co.uk >
> > > To: "LLVM Commits" < llvm-commits at cs.uiuc.edu >
> > > Sent: Friday, February 6, 2015 9:00:33 AM
> > > Subject: RFC: Tweak heuristics in SimplifyCFG
> > >
> > > Hi all,
> > >
> > >
> > > I've been looking at why we generate poor code for idiomatic stuff
> > > like clamp() and abs().
> > >
> > >
> > > Clamp normally looks like this:
> > >
> > >
> > > T clamp(T a, T b, T c) { return (a < b) ? b : ((a > c) ? c : a); }
> > >
> > >
> > > We currently produce the following IR for this:
> > >
> > >
> > > define i32 @clamp2(i32 %a, i32 %b, i32 %c) #0 {
> > > entry:
> > > %cmp = icmp sgt i32 %a, %c
> > > br i1 %cmp, label %cond.end4, label %cond.false
> > >
> > >
> > > cond.false:
> > > %cmp1 = icmp slt i32 %a, %b
> > > %cond = select i1 %cmp1, i32 %b, i32 %a
> > > br label %cond.end4
> > >
> > >
> > > cond.end4:
> > > %cond5 = phi i32 [ %cond, %cond.false ], [ %c, %entry ]
> > > ret i32 %cond5
> > > }
> > >
> > >
> > > This is multi-block so makes later optimizations more awkward, such
> > > as loop vectorization and loop rerolling. SimplifyCFG can convert
> > > this into "icmp; select; icmp; select", but doesn't because it has
> > > quite a conservative heuristic - it'll only ever hoist one (cheap)
> > > instruction into the dominating block.
> > >
> > >
> > > I think this is too conservative - given the potential gains later
> > > on
> > > in the optimizer from flattening basic blocks (and that
> > > CodegenPrepare can remove selects again!) - we should be more
> > > aggressive here.
> > >
> > >
> > > My suggestions are:
> > > - Up -phi-node-folding-threshold from 1 to 3.
> > > - Add "fcmp", "fadd" and "fsub" to the list of cheap instructions
> > > to
> > > hoist. (fadd and fsub to make abs() work!)
> > >
> > > Would anyone object to this? I'll have benchmark results on AArch64
> > > by the end of the weekend.
> >
> > This sounds good to be. Regarding the second point, I'd rather that
> > SimplifyCFG did not have its own list of cheap instructions (I'm
> > referring to ComputeSpeculationCost in
> > lib/Transforms/Utils/SimplifyCFG.cpp), but rather used the existing
> > TTI interface for this. SimplifyCFG already now uses TTI for other
> > things, and I think this is a natural enhancement.
> >
> > I think that we should call TTI.getUserCost(&I) (which is the same
> > interface used by the inliner's cost analysis, the loop unroller,
> > etc.), and hoist an unlimited number of instructions marked as
> > TargetTransformInfo::TCC_Free and some limited number of
> > instructions marked as TCC_Basic. The idea is that the total cost of
> > the instructions should equal
> > (phi-node-folding-threshold)*(TCC_Basic).
> >
> > This also provides a natural way to turn off these optimizations for
> > fadd, etc. on targets that don't have hardware-implemented floating
> > point.
> >
> > -Hal
> >
> > >
> > >
> > > Cheers,
> > >
> > >
> > > James
> > >
> > >
> > > _______________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > >
> >
> > --
> > Hal Finkel
> > Assistant Computational Scientist
> > Leadership Computing Facility
> > Argonne National Laboratory
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150209/7ec533d2/attachment.html>
More information about the llvm-commits
mailing list