[llvm-commits] [llvm] r173342 - in /llvm/trunk: lib/Transforms/Utils/SimplifyCFG.cpp test/Transforms/SimplifyCFG/SpeculativeExec.ll

Fri Jan 25 13:39:01 PST 2013

+1.

The best way to move forward is to work on improving EarlyIfConversion.

On Jan 25, 2013, at 1:32 PM, Andrew Trick <atrick at apple.com> wrote:

> 
> On Jan 25, 2013, at 1:30 PM, Andrew Trick <atrick at apple.com> wrote:
> 
>> 
>> On Jan 25, 2013, at 10:16 AM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>> 
>>> 
>>> On Jan 24, 2013, at 4:39 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
>>> 
>>>> Author: chandlerc
>>>> Date: Thu Jan 24 06:39:29 2013
>>>> New Revision: 173342
>>>> 
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=173342&view=rev
>>>> Log:
>>>> Plug TTI into the speculation logic, giving it a real cost interface
>>>> that can be specialized by targets.
>>>> 
>>>> The goal here is not to be more aggressive, but to just be more accurate
>>>> with very obvious cases. There are instructions which are known to be
>>>> truly free and which were not being modeled as such in this code -- see
>>>> the regression test which is distilled from an inner loop of zlib.
>>> 
>>> Hi Chandler,
>>> 
>>> It is important to realize how profoundly clueless the IR passes are about execution speed. The obvious cases you mention here could easily cause regressions, just like the existing heuristics can.
>>> 
>>> Example:
>>> 
>>> define i32 @f(i32 %a, i32 %b) nounwind uwtable readnone ssp {
>>> entry:
>>>   %mul = mul nsw i32 %b, %a
>>>   %cmp = icmp slt i32 %mul, 7
>>>   br i1 %cmp, label %if.then, label %return
>>>
>>> if.then:
>>>   br label %return
>>>
>>> return:
>>>   %c = phi i32 [ %a, %if.then ], [ %b, %entry ]
>>>   ret i32 %c
>>> }
>>> 
>>> With a well predicted branch, the latency from %a/%b to %c is 0-1 cycles, depending on the latency of copies in your micro-architecture. The branch predictor doesn't wait around for %mul and %cmp to be computed - it just predicts.
>>> 
>>> SimplifyCFG of course if-converts this 'cost = 0' basic block to get:
>>> 
>>> define i32 @f(i32 %a, i32 %b) nounwind uwtable readnone ssp {
>>> entry:
>>>   %mul = mul nsw i32 %b, %a
>>>   %cmp = icmp slt i32 %mul, 7
>>>   %c = select i1 %cmp, i32 %a, i32 %b
>>>   ret i32 %c
>>> }
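
For readers less used to IR, the two versions of @f quoted above correspond roughly to the C++ below. This is only a sketch: it is not the source the IR was compiled from, the names f_branch, f_select and check_equivalent are made up for illustration, and a compiler is free to emit either form for either function.

```cpp
#include <cassert>

// Branchy form, mirroring the phi-based IR: control flow picks the result.
int f_branch(int a, int b) {
  int c = b;        // value arriving from %entry
  if (b * a < 7)    // %mul, %cmp, br
    c = a;          // value arriving from %if.then
  return c;
}

// If-converted form, mirroring the select-based IR: the result is now
// data-dependent on the multiply and the compare.
int f_select(int a, int b) {
  return (b * a < 7) ? a : b;
}

// Sanity check over a small range (small enough that b * a cannot overflow).
void check_equivalent() {
  for (int a = -20; a <= 20; ++a)
    for (int b = -20; b <= 20; ++b)
      assert(f_branch(a, b) == f_select(a, b));
}
```

Both functions return the same value for every input; the difference discussed in this thread is purely about which operations the result has to wait for.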
>>> 
>>> A multiply(3) + compare(1) + cmov(2) sequence is 6 cycles slower. This isn't an obvious transformation at all.
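
The cycle counts in the quoted paragraph can be tallied as a back-of-the-envelope critical-path calculation. The sketch below is only that arithmetic: the latencies (multiply 3, compare 1, cmov 2, copy 0 or 1) are the figures quoted in this message, not measurements, and the struct and function names are invented for illustration.

```cpp
// Latencies in cycles as quoted in the message; illustrative, not measured.
struct Latencies {
  int mul = 3;
  int cmp = 1;
  int cmov = 2;
  int copy = 0;  // 0 or 1, depending on the micro-architecture
};

// Well-predicted branch: %c depends only on a copy of %a or %b.
constexpr int branchLatency(const Latencies &l) { return l.copy; }

// After if-conversion: %c waits on the chain mul -> cmp -> cmov.
constexpr int selectLatency(const Latencies &l) {
  return l.mul + l.cmp + l.cmov;
}

// Extra latency from %a/%b to %c introduced by the if-conversion.
constexpr int slowdown(const Latencies &l) {
  return selectLatency(l) - branchLatency(l);
}

static_assert(slowdown(Latencies{}) == 6, "free copy: 6 cycles slower");
```

With a one-cycle copy, `slowdown(Latencies{3, 1, 2, 1})` comes to 5 rather than 6; either way the dependent chain dominates.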
>>> 
>>> The LLVM IR passes are basically optimizing for a PDP-11, where instructions are executed serially and some take longer than others. Modern micro-architectures simply don't work that way.
>>> 
>>> In this model, I don't think it makes sense to ask the targets for instruction costs. You are basically asking the target if it is one of the new snazzy 1975 models, or an old model with a separate ALU card.
>>> 
>>> What kind of music do you like, Country or Western? There is no good answer.
>>> 
>>> 
>>> Now, I am not arguing that SimplifyCFG shouldn't if-convert at all, or even that it shouldn't be more aggressive about it - it clearly enables further optimizations.
>>> 
>>> But don't ask the target about instruction costs - the question doesn't make sense in this context.
>> 
>> I think that aggressive IR if-conversion should be selected by the target and would make more sense as a different FlattenCFG pass. I think that adding ISEL optimizations into SimplifyCFG does a poor job of communicating the overall design to other contributors. I would like to more clearly separate target-specific lowering and ISEL-related optimizations from target-independent IR optimization.
> 
> Reiterating to be absolutely clear: the EarlyIfConversion pass in MachineCode is the right place for if-conversion. The FlattenCFG pass I mention above is only an ISEL hack.
> 
> -Andy