RFC: Tweak heuristics in SimplifyCFG

Geoff Berry gberry at codeaurora.org
Fri Feb 6 14:38:27 PST 2015


Hi James, Hal,

 

I’ve been looking into a related issue, and would appreciate your input on it.  We currently convert branches to selects in SimplifyCFG, but we don’t do the opposite anywhere at the IR level from what I can tell.  It seems like it might be beneficial to do this conversion in cases where the un-if converted code would be rejected by the cost heuristic that was being discussed below.  Code sinking also comes into play, since ideally we would consider the branching case cost with code sunk into the then/else blocks where possible/profitable.  Come to think of it, it seems like we would want to do some sinking before the SimplifyCFG case as well to make a better decision.

 

Any thoughts on this?

 

Thanks,

 

--

Geoff Berry

Employee of Qualcomm Innovation Center, Inc.

Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project

 

From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of James Molloy
Sent: Friday, February 06, 2015 12:01 PM
To: Hal Finkel
Cc: LLVM Commits
Subject: Re: RFC: Tweak heuristics in SimplifyCFG

 

Hi Hal,

That's a really good point, I'm on board with that. I'll cook up a patch soon and send it for review.

 

Cheers,

 

James

 

On Fri Feb 06 2015 at 4:53:44 PM Hal Finkel <hfinkel at anl.gov <mailto:hfinkel at anl.gov> > wrote:

----- Original Message -----
> From: "James Molloy" <james at jamesmolloy.co.uk <mailto:james at jamesmolloy.co.uk> >
> To: "LLVM Commits" <llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu> >
> Sent: Friday, February 6, 2015 9:00:33 AM
> Subject: RFC: Tweak heuristics in SimplifyCFG
>
> Hi all,
>
>
> I've been looking at why we generate poor code for idiomatic stuff
> like clamp() and abs().
>
>
> Clamp normally looks like this:
>
>
> T clamp(T a, T b, T c) { return (a < b) ? b : ((a > c) ? c : a); }
>
>
> We currently produce the following IR for this:
>
>
> define i32 @clamp2(i32 %a, i32 %b, i32 %c) #0 {
> entry:
> %cmp = icmp sgt i32 %a, %c
> br i1 %cmp, label %cond.end4, label %cond.false
>
>
> cond.false:
> %cmp1 = icmp slt i32 %a, %b
> %cond = select i1 %cmp1, i32 %b, i32 %a
> br label %cond.end4
>
>
> cond.end4:
> %cond5 = phi i32 [ %cond, %cond.false ], [ %c, %entry ]
> ret i32 %cond5
> }
>
>
> This is multi-block so makes later optimizations more awkward, such
> as loop vectorization and loop rerolling. SimplifyCFG can convert
> this into "icmp; select; icmp; select", but doesn't because it has
> quite a conservative heuristic - it'll only ever hoist one (cheap)
> instruction into the dominating block.
>
>
> I think this is too conservative - given the potential gains later on
> in the optimizer from flattening basic blocks (and that
> CodegenPrepare can remove selects again!) - we should be more
> aggressive here.
>
>
> My suggestions are:
> - Up -phi-node-folding-threshold from 1 to 3.
> - Add "fcmp", "fadd" and "fsub" to the list of cheap instructions to
> hoist. (fadd and fsub to make abs() work!)
>
> Would anyone object to this? I'll have benchmark results on AArch64
> by the end of the weekend.

This sounds good to be. Regarding the second point, I'd rather that SimplifyCFG did not have its own list of cheap instructions (I'm referring to ComputeSpeculationCost in lib/Transforms/Utils/SimplifyCFG.cpp), but rather used the existing TTI interface for this. SimplifyCFG already now uses TTI for other things, and I think this is a natural enhancement.

I think that we should call TTI.getUserCost(&I) (which is the same interface used by the inliner's cost analysis, the loop unroller, etc.), and hoist an unlimited number of instructions marked as TargetTransformInfo::TCC_Free and some limited number of instructions marked as TCC_Basic. The idea is that the total cost of the instructions should equal (phi-node-folding-threshold)*(TCC_Basic).

This also provides a natural way to turn off these optimizations for fadd, etc. on targets that don't have hardware-implemented floating point.

 -Hal

>
>
> Cheers,
>
>
> James
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu> 
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150206/4742e758/attachment.html>


More information about the llvm-commits mailing list