<div dir="ltr">Hi Hal,<br><br><div>That's a really good point, I'm on board with that. I'll cook up a patch soon and send it for review.</div><div><br></div><div>Cheers,</div><div><br></div><div>James</div></div><br><div class="gmail_quote">On Fri Feb 06 2015 at 4:53:44 PM Hal Finkel <<a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">----- Original Message -----<br>

> From: "James Molloy" <<a href="mailto:james@jamesmolloy.co.uk" target="_blank">james@jamesmolloy.co.uk</a>><br>

> To: "LLVM Commits" <<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a>><br>

> Sent: Friday, February 6, 2015 9:00:33 AM<br>

> Subject: RFC: Tweak heuristics in SimplifyCFG<br>

><br>

> Hi all,<br>

><br>

><br>

> I've been looking at why we generate poor code for idiomatic stuff<br>

> like clamp() and abs().<br>

><br>

><br>

> Clamp normally looks like this:<br>

><br>

><br>

> T clamp(T a, T b, T c) { return (a < b) ? b : ((a > c) ? c : a); }<br>

><br>

><br>

> We currently produce the following IR for this:<br>

><br>

><br>

> define i32 @clamp2(i32 %a, i32 %b, i32 %c) #0 {<br>

> entry:<br>

> %cmp = icmp sgt i32 %a, %c<br>

> br i1 %cmp, label %cond.end4, label %cond.false<br>

><br>

><br>

> cond.false:<br>

> %cmp1 = icmp slt i32 %a, %b<br>

> %cond = select i1 %cmp1, i32 %b, i32 %a<br>

> br label %cond.end4<br>

><br>

><br>

> cond.end4:<br>

> %cond5 = phi i32 [ %cond, %cond.false ], [ %c, %entry ]<br>

> ret i32 %cond5<br>

> }<br>

><br>

><br>

> This is multi-block so makes later optimizations more awkward, such<br>

> as loop vectorization and loop rerolling. SimplifyCFG can convert<br>

> this into "icmp; select; icmp; select", but doesn't because it has<br>

> quite a conservative heuristic - it'll only ever hoist one (cheap)<br>

> instruction into the dominating block.<br>

><br>

><br>

> I think this is too conservative - given the potential gains later on<br>

> in the optimizer from flattening basic blocks (and that<br>

> CodegenPrepare can remove selects again!) - we should be more<br>

> aggressive here.<br>

><br>

><br>

> My suggestions are:<br>

> - Up -phi-node-folding-threshold from 1 to 3.<br>

> - Add "fcmp", "fadd" and "fsub" to the list of cheap instructions to<br>

> hoist. (fadd and fsub to make abs() work!)<br>

><br>

> Would anyone object to this? I'll have benchmark results on AArch64<br>

> by the end of the weekend.<br>

<br>
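For reference, a minimal sketch of what the flattened "icmp; select; icmp; select" form computes, written as plain C++ rather than IR. It mirrors the quoted clamp2 IR (outer test on a > c, inner select on a < b); the function names clampBranchy and clampSelect are made up for illustration.<br>

```cpp
// Branchy form, mirroring the quoted IR: the inner select only runs
// when the outer comparison is false.
int clampBranchy(int a, int b, int c) {
  if (a > c)
    return c;
  return (a < b) ? b : a;
}

// Flattened form: both comparisons and both selects are evaluated
// unconditionally in a single block, which is what hoisting the inner
// icmp/select into the dominating block would produce.
int clampSelect(int a, int b, int c) {
  int inner = (a < b) ? b : a;   // icmp slt + select
  return (a > c) ? c : inner;    // icmp sgt + select
}
```

Both functions return the same value for every input; the second simply trades a branch for one extra compare and select.<br>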

This sounds good to me. Regarding the second point, I'd rather that SimplifyCFG did not have its own list of cheap instructions (I'm referring to ComputeSpeculationCost in lib/Transforms/Utils/SimplifyCFG.cpp), but rather used the existing TTI interface for this. SimplifyCFG already uses TTI for other things, and I think this is a natural enhancement.<br>

<br>

I think that we should call TTI.getUserCost(&I) (which is the same interface used by the inliner's cost analysis, the loop unroller, etc.), and hoist an unlimited number of instructions marked as TargetTransformInfo::TCC_Free and some limited number of instructions marked as TCC_Basic. The idea is that the total cost of the hoisted instructions should not exceed (phi-node-folding-threshold)*(TCC_Basic).<br>
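A rough sketch of that budget check, as standalone C++ rather than real LLVM code. The enum only borrows the TCC_* names as stand-ins (the numeric values are assumptions here), and fitsSpeculationBudget is a hypothetical helper, not an existing SimplifyCFG function.<br>

```cpp
#include <vector>

// Stand-in cost constants. The names follow TargetTransformInfo's
// TCC_Free / TCC_Basic / TCC_Expensive, but the values are assumptions
// made for this sketch.
enum Cost : unsigned { TCC_Free = 0, TCC_Basic = 1, TCC_Expensive = 4 };

// Hypothetical helper: given the per-instruction costs of the block we
// would like to hoist, return true if their sum stays within
// threshold * TCC_Basic. Free instructions add nothing to the total,
// so any number of them fits.
bool fitsSpeculationBudget(const std::vector<unsigned> &costs,
                           unsigned threshold) {
  const unsigned budget = threshold * TCC_Basic;
  unsigned total = 0;
  for (unsigned cost : costs) {
    total += cost;
    if (total > budget)
      return false;  // over budget, stop early
  }
  return true;
}
```

With a threshold of 3, three TCC_Basic instructions plus any number of TCC_Free ones would be hoisted, while a single TCC_Expensive instruction would not.<br>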

<br>

This also provides a natural way to turn off these optimizations for fadd, etc. on targets that don't have hardware-implemented floating point.<br>
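To illustrate how a per-target cost could switch the floating-point cases off, here is a small standalone sketch. Everything in it is hypothetical: Opcode, TargetInfo, hasHardwareFloat and userCost are invented for the example and are not the TTI API, and the cost values are assumed.<br>

```cpp
// Hypothetical opcode set covering the instructions discussed in the thread.
enum Opcode { ICmp, FCmp, FAdd, FSub };

// Stand-in cost constants (values assumed for this sketch).
enum Cost : unsigned { TCC_Free = 0, TCC_Basic = 1, TCC_Expensive = 4 };

// Hypothetical target description with a single flag.
struct TargetInfo {
  bool hasHardwareFloat;
};

// Hypothetical cost query: floating-point operations are cheap only when
// the target implements them in hardware; otherwise they are reported as
// expensive, so a cost-budgeted transform would leave them alone.
unsigned userCost(Opcode op, const TargetInfo &target) {
  const bool isFloatOp = (op == FCmp || op == FAdd || op == FSub);
  if (isFloatOp && !target.hasHardwareFloat)
    return TCC_Expensive;
  return TCC_Basic;
}
```

The transform itself then needs no target-specific opcode list; it only compares the reported costs against its budget.<br>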

<br>

 -Hal<br>

<br>

><br>

><br>

> Cheers,<br>

><br>

><br>

> James<br>

><br>

><br>

> _______________________________________________<br>

> llvm-commits mailing list<br>

> <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a><br>

> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

><br>

<br>

--<br>

Hal Finkel<br>

Assistant Computational Scientist<br>

Leadership Computing Facility<br>

Argonne National Laboratory<br>

</blockquote></div>