[PATCH] D17288: [CodeGenPrepare] Do select to branch transform when cmp's operand is expensive.

Tue Mar 29 20:34:25 PDT 2016

> On Mar 11, 2016, at 8:17 AM, Sanjay Patel via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> 
> spatel added a comment.
> 
> In http://reviews.llvm.org/D17288#372925, @flyingforyou wrote:
> 
>> There is concern about this heuristic will not be applied for future architectures. So I added FdivLatency in `SchedMachineModel` for refering.
>> I think fdiv is special for most of architecture. It wouldn't be pipelined and it has very high latency. So I think this information can be used for another optimizations.
>> 
>> Also, this is very different heuristic from load-cmp-heuristic. Most of load takes 4cycles when the target is in cache. But division takes over 18~30 cycles. This heuristic could be helpful for most of architecture.
>> 
>> And there is a optimization pass `EarlyIfConversion`. This will recover the change when `TBB`, `FBB`'s cost is not good for using branch.
>> 
>> So, How about this approach?
> 
> 
> Hi Junmo -
> Thanks for following up. You've answered my concerns about limiting the transform based on actual subtarget features, so I think this is a reasonable way to solve the problem. I'm not sure if we want the default behavior to be opt-out though, but that's simple to change. 
> I'd certainly like to hear from others if they think this is a good approach.

It seems the major motivation here is the benchmark gain, and no, it won’t be necessarily simple to change. This optimization could give a sizable gain on multiple architectures, and once in, people will not easily accept a loss when the default is attempted to be changed. If we decide to go for it we should be conscious about it.
I think the machine combiner (in this case it would split the conditional move instructions) would be a more natural place for this optimization. Eventually I see that pass capable of both latency and throughput estimates for possibly a small increase in compile-time. Perhaps a compromise is reasonable by enabling a version of the current patch (assuming no regressions) and starting the work towards a more general solution. I see related questions coming up in other places. For example for some DAGcombines the machine combiner could come up with faster code sequences.

-Gerolf

> 
> 
> http://reviews.llvm.org/D17288
> 
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits