RFC: Tweak heuristics in SimplifyCFG

Fri Feb 6 08:51:32 PST 2015

----- Original Message -----
> From: "James Molloy" <james at jamesmolloy.co.uk>
> To: "LLVM Commits" <llvm-commits at cs.uiuc.edu>
> Sent: Friday, February 6, 2015 9:00:33 AM
> Subject: RFC: Tweak heuristics in SimplifyCFG
> 
> Hi all,
> 
> 
> I've been looking at why we generate poor code for idiomatic stuff
> like clamp() and abs().
> 
> 
> Clamp normally looks like this:
> 
> 
> T clamp(T a, T b, T c) { return (a < b) ? b : ((a > c) ? c : a); }
> 
> 
> We currently produce the following IR for this:
> 
> 
> define i32 @clamp2(i32 %a, i32 %b, i32 %c) #0 {
> entry:
> %cmp = icmp sgt i32 %a, %c
> br i1 %cmp, label %cond.end4, label %cond.false
> 
> 
> cond.false:
> %cmp1 = icmp slt i32 %a, %b
> %cond = select i1 %cmp1, i32 %b, i32 %a
> br label %cond.end4
> 
> 
> cond.end4:
> %cond5 = phi i32 [ %cond, %cond.false ], [ %c, %entry ]
> ret i32 %cond5
> }
> 
> 
> This is multi-block so makes later optimizations more awkward, such
> as loop vectorization and loop rerolling. SimplifyCFG can convert
> this into "icmp; select; icmp; select", but doesn't because it has
> quite a conservative heuristic - it'll only ever hoist one (cheap)
> instruction into the dominating block.
> 
> 
> I think this is too conservative - given the potential gains later on
> in the optimizer from flattening basic blocks (and that
> CodegenPrepare can remove selects again!) - we should be more
> aggressive here.
> 
> 
> My suggestions are:
> - Up -phi-node-folding-threshold from 1 to 3.
> - Add "fcmp", "fadd" and "fsub" to the list of cheap instructions to
> hoist. (fadd and fsub to make abs() work!)
> 
> Would anyone object to this? I'll have benchmark results on AArch64
> by the end of the weekend.

This sounds good to me. Regarding the second point, I'd rather SimplifyCFG not keep its own list of cheap instructions (I'm referring to ComputeSpeculationCost in lib/Transforms/Utils/SimplifyCFG.cpp), but instead use the existing TTI interface for this. SimplifyCFG already uses TTI for other things, and I think this is a natural enhancement.

I think that we should call TTI.getUserCost(&I) (the same interface used by the inliner's cost analysis, the loop unroller, etc.), and hoist an unlimited number of instructions marked as TargetTransformInfo::TCC_Free and a limited number of instructions marked as TCC_Basic. The idea is that the total cost of the hoisted instructions should not exceed (phi-node-folding-threshold)*(TCC_Basic).

This also provides a natural way to turn off these optimizations for fadd, etc. on targets that don't have hardware-implemented floating point.
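
To make the budget idea concrete, here is a rough, standalone sketch. It does not use the real LLVM headers: the CostKind enum only mirrors the values of TargetTransformInfo::TargetCostConstants, and fitsSpeculationBudget is a hypothetical helper name, not an existing function in SimplifyCFG. The per-instruction costs are assumed to have been obtained already from something like TTI.getUserCost(&I).

```cpp
#include <vector>

// Hypothetical stand-ins mirroring TargetTransformInfo::TargetCostConstants.
// A real patch would use the TTI constants directly.
enum CostKind : unsigned { TCC_Free = 0, TCC_Basic = 1, TCC_Expensive = 4 };

// Hypothetical helper: returns true if the summed cost of the candidate
// instructions stays within Threshold * TCC_Basic, where Threshold plays
// the role of -phi-node-folding-threshold.
bool fitsSpeculationBudget(const std::vector<unsigned> &Costs,
                           unsigned Threshold) {
  const unsigned Budget = Threshold * TCC_Basic;
  unsigned Total = 0;
  for (unsigned Cost : Costs) {
    Total += Cost;        // TCC_Free adds nothing, so free instructions never block hoisting.
    if (Total > Budget)   // Stop early once the budget is exceeded.
      return false;
  }
  return true;
}
```

Under this model, the icmp and select in cond.false above would cost 2*TCC_Basic if the target reports each as TCC_Basic, which fits a threshold of 3 but not the current threshold of 1. A target without hardware floating point could report fadd as TCC_Expensive, and a single such instruction would already exceed a budget of 3.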

 -Hal

> 
> 
> Cheers,
> 
> 
> James

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory