<div dir="ltr"><div>Hi all,</div><div><br></div><div>I've been looking at why we generate poor code for idiomatic functions like clamp() and abs().</div><div><br></div><div>clamp() normally looks like this:</div><div><br></div><div>    T clamp(T a, T b, T c) { return (a < b) ? b : ((a > c) ? c : a); }</div><div><br></div><div>We currently produce the following IR for this:</div><div><br></div><div>    define i32 @clamp2(i32 %a, i32 %b, i32 %c) #0 {</div><div>    entry:</div><div>      %cmp = icmp sgt i32 %a, %c</div><div>      br i1 %cmp, label %cond.end4, label %cond.false</div><div><br></div><div>    cond.false:</div><div>      %cmp1 = icmp slt i32 %a, %b</div><div>      %cond = select i1 %cmp1, i32 %b, i32 %a</div><div>      br label %cond.end4</div><div><br></div><div>    cond.end4:</div><div>      %cond5 = phi i32 [ %cond, %cond.false ], [ %c, %entry ]</div><div>      ret i32 %cond5</div><div>    }</div><div><br></div><div>This is multi-block, which makes later optimizations such as loop vectorization and loop rerolling more awkward. SimplifyCFG can convert this into "icmp; select; icmp; select", but it doesn't, because its heuristic is quite conservative: it will only ever hoist one (cheap) instruction into the dominating block.</div><div><br></div><div>I think this is too conservative. Given the potential gains later on in the optimizer from flattening basic blocks (and given that CodeGenPrepare can remove the selects again!), we should be more aggressive here.</div><div><br></div><div>My suggestions are:</div><div>  - Raise -phi-node-folding-threshold from 1 to 3.</div><div>  - Add "fcmp", "fadd" and "fsub" to the list of cheap instructions to hoist (fadd and fsub are needed to make abs() work).</div><div><br></div><div>Would anyone object to this? I'll have benchmark results on AArch64 by the end of the weekend.</div><div><br></div><div>Cheers,</div><div><br></div><div>James</div><div><br></div></div>