RFC: Tweak heuristics in SimplifyCFG

James Molloy james at jamesmolloy.co.uk
Fri Feb 6 09:00:54 PST 2015


Hi Hal,

That's a really good point, I'm on board with that. I'll cook up a patch
soon and send it for review.

Cheers,

James

On Fri Feb 06 2015 at 4:53:44 PM Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "James Molloy" <james at jamesmolloy.co.uk>
> > To: "LLVM Commits" <llvm-commits at cs.uiuc.edu>
> > Sent: Friday, February 6, 2015 9:00:33 AM
> > Subject: RFC: Tweak heuristics in SimplifyCFG
> >
> > Hi all,
> >
> >
> > I've been looking at why we generate poor code for idiomatic stuff
> > like clamp() and abs().
> >
> >
> > Clamp normally looks like this:
> >
> >
> > T clamp(T a, T b, T c) { return (a < b) ? b : ((a > c) ? c : a); }
> >
> >
> > We currently produce the following IR for this:
> >
> >
> > define i32 @clamp2(i32 %a, i32 %b, i32 %c) #0 {
> > entry:
> >   %cmp = icmp sgt i32 %a, %c
> >   br i1 %cmp, label %cond.end4, label %cond.false
> >
> > cond.false:
> >   %cmp1 = icmp slt i32 %a, %b
> >   %cond = select i1 %cmp1, i32 %b, i32 %a
> >   br label %cond.end4
> >
> > cond.end4:
> >   %cond5 = phi i32 [ %cond, %cond.false ], [ %c, %entry ]
> >   ret i32 %cond5
> > }
> >
> >
> > This is multi-block, which makes later optimizations such as loop
> > vectorization and loop rerolling more awkward. SimplifyCFG can convert
> > this into "icmp; select; icmp; select", but doesn't because it has
> > quite a conservative heuristic: it will only ever hoist one (cheap)
> > instruction into the dominating block.
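
For reference, a minimal C++ sketch of the two shapes being discussed. The function names are made up for illustration, and the branchy version follows the order of tests in the IR above (a > c first). Whether a given compiler actually lowers the ternaries to select is not guaranteed; this only shows that the flattened form computes the same value.

```cpp
#include <cassert>

// Branchy form: mirrors the two-block IR (test a > c in the entry block,
// then choose between a and b in the fall-through block).
int clamp_branchy(int a, int b, int c) {
  if (a > c)
    return c;
  return (a < b) ? b : a;
}

// Flattened form: two compares feeding two selects, no control flow.
int clamp_flat(int a, int b, int c) {
  const int lo = (a < b) ? b : a;  // corresponds to icmp slt + select
  return (a > c) ? c : lo;         // corresponds to icmp sgt + select
}

// Exhaustively compare the two forms over a small cube of inputs.
bool forms_agree(int lo, int hi) {
  for (int a = lo; a <= hi; ++a)
    for (int b = lo; b <= hi; ++b)
      for (int c = lo; c <= hi; ++c)
        if (clamp_branchy(a, b, c) != clamp_flat(a, b, c))
          return false;
  return true;
}

// Convenience check that can be called from a test driver.
void self_check() { assert(forms_agree(-4, 4)); }
```

The flattened version always evaluates both compares, which is the trade-off the hoisting heuristic is weighing against having a single basic block.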
> >
> >
> > I think this is too conservative - given the potential gains later on
> > in the optimizer from flattening basic blocks (and that
> > CodegenPrepare can remove selects again!) - we should be more
> > aggressive here.
> >
> >
> > My suggestions are:
> > - Up -phi-node-folding-threshold from 1 to 3.
> > - Add "fcmp", "fadd" and "fsub" to the list of cheap instructions to
> > hoist. (fadd and fsub to make abs() work!)
> >
> > Would anyone object to this? I'll have benchmark results on AArch64
> > by the end of the weekend.
>
> This sounds good to me. Regarding the second point, I'd rather that
> SimplifyCFG did not have its own list of cheap instructions (I'm referring
> to ComputeSpeculationCost in lib/Transforms/Utils/SimplifyCFG.cpp), but
> rather used the existing TTI interface for this. SimplifyCFG already
> uses TTI for other things, and I think this is a natural enhancement.
>
> I think that we should call TTI.getUserCost(&I) (which is the same
> interface used by the inliner's cost analysis, the loop unroller, etc.),
> and hoist an unlimited number of instructions marked as
> TargetTransformInfo::TCC_Free and some limited number of instructions
> marked as TCC_Basic. The idea is that the total cost of the instructions
> should equal (phi-node-folding-threshold)*(TCC_Basic).
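
A rough standalone model of that budget rule, to make the arithmetic concrete. It does not call into LLVM: the enum and function names are hypothetical, the cost values are assumed to mirror TCC_Free, TCC_Basic and TCC_Expensive, and "equal" is read here as an upper bound on the summed cost.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the TTI cost constants (assumed values).
enum CostKind : unsigned { kFree = 0, kBasic = 1, kExpensive = 4 };

// Returns true if the instructions, given by their per-instruction costs,
// fit within a budget of threshold * kBasic. Free instructions add zero,
// so any number of them can be hoisted.
bool fitsSpeculationBudget(const std::vector<unsigned>& costs,
                           unsigned threshold) {
  const unsigned budget = threshold * kBasic;
  unsigned total = 0;
  for (unsigned c : costs) {
    total += c;
    if (total > budget)
      return false;  // stop as soon as the budget is exceeded
  }
  return true;
}
```

With a threshold of 3, any number of free instructions plus three basic ones fit, while a fourth basic instruction or a single expensive one does not.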
>
> This also provides a natural way to turn off these optimizations for fadd,
> etc. on targets that don't have hardware-implemented floating point.
>
>  -Hal
>
> >
> >
> > Cheers,
> >
> >
> > James
> >
> >
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>