RFC: Tweak heuristics in SimplifyCFG

Fri Feb 6 07:00:33 PST 2015

Hi all,

I've been looking at why we generate poor code for idiomatic stuff like
clamp() and abs().

Clamp normally looks like this:

    T clamp(T a, T b, T c) { return (a < b) ? b : ((a > c) ? c : a); }

We currently produce the following IR for this:

    define i32 @clamp2(i32 %a, i32 %b, i32 %c) #0 {
    entry:
      %cmp = icmp sgt i32 %a, %c
      br i1 %cmp, label %cond.end4, label %cond.false

    cond.false:
      %cmp1 = icmp slt i32 %a, %b
      %cond = select i1 %cmp1, i32 %b, i32 %a
      br label %cond.end4

    cond.end4:
      %cond5 = phi i32 [ %cond, %cond.false ], [ %c, %entry ]
      ret i32 %cond5
    }

This is multi-block, which makes later optimizations such as loop
vectorization and loop rerolling more awkward. SimplifyCFG can convert this
into "icmp; select; icmp; select", but doesn't, because its heuristic is
quite conservative: it will only ever hoist one (cheap) instruction into the
dominating block.
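
To make the flattened shape concrete, here is a rough source-level sketch of
the two forms for int. The names clamp_branchy, clamp_flat and
check_forms_agree are made up for illustration, and whether a given compiler
really emits selects for the ternaries is an assumption, not something this
guarantees.

```cpp
#include <cassert>

// Branchy form, mirroring the source-level clamp from the text.
int clamp_branchy(int a, int b, int c) {
    if (a < b) return b;
    if (a > c) return c;
    return a;
}

// Flattened form: both comparisons and both selections are computed
// unconditionally in a single block, i.e. "icmp; select; icmp; select".
// (Hypothetical helper; the mapping to IR in the comments is illustrative.)
int clamp_flat(int a, int b, int c) {
    bool above = a > c;            // icmp sgt
    int upper = above ? c : a;     // select
    bool below = a < b;            // icmp slt
    return below ? b : upper;      // select
}

// Exhaustively compare the two forms over a small grid of inputs.
void check_forms_agree() {
    for (int a = -3; a <= 3; ++a)
        for (int b = -3; b <= 3; ++b)
            for (int c = -3; c <= 3; ++c)
                assert(clamp_branchy(a, b, c) == clamp_flat(a, b, c));
}
```

The flat version does slightly more work when a < b, but it has no control
flow, which is the property the later loop passes care about.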

I think this is too conservative. Given the potential gains later on in
the optimizer from flattening basic blocks (and that CodeGenPrepare can
turn selects back into branches!), we should be more aggressive here.

My suggestions are:
  - Raise -phi-node-folding-threshold from 1 to 3.
  - Add "fcmp", "fadd" and "fsub" to the list of cheap instructions to
    hoist. (fadd and fsub to make abs() work!)
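
As a rough illustration of why the floating-point ops matter for abs(), here
is a sketch of a float abs written with an explicit subtraction. The name
abs_flat is hypothetical, and the IR named in the comments is what one would
expect for this pattern, not verified compiler output.

```cpp
// Sketch of abs() for float. To flatten this into a single block, both the
// compare and the subtraction in the "true" arm have to be considered cheap
// enough to hoist and execute unconditionally.
float abs_flat(float x) {
    bool negative = x < 0.0f;      // fcmp olt (expected)
    float negated = 0.0f - x;      // fsub (expected)
    return negative ? negated : x; // select (expected)
}
```

With a threshold of 1 and without fsub in the cheap list, the subtraction
would stay in its own conditional block.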

Would anyone object to this? I'll have benchmark results on AArch64 by the
end of the weekend.

Cheers,

James