[llvm-commits] [llvm] r173342 - in /llvm/trunk: lib/Transforms/Utils/SimplifyCFG.cpp test/Transforms/SimplifyCFG/SpeculativeExec.ll

Fri Jan 25 10:16:17 PST 2013

On Jan 24, 2013, at 4:39 AM, Chandler Carruth <chandlerc at gmail.com> wrote:

> Author: chandlerc
> Date: Thu Jan 24 06:39:29 2013
> New Revision: 173342
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=173342&view=rev
> Log:
> Plug TTI into the speculation logic, giving it a real cost interface
> that can be specialized by targets.
> 
> The goal here is not to be more aggressive, but to just be more accurate
> with very obvious cases. There are instructions which are known to be
> truly free and which were not being modeled as such in this code -- see
> the regression test which is distilled from an inner loop of zlib.

Hi Chandler,

It is important to realize how profoundly clueless the IR passes are about execution speed. The obvious cases you mention here could easily cause regressions, just as the existing heuristics can.

Example:

define i32 @f(i32 %a, i32 %b) nounwind uwtable readnone ssp {
entry:
  %mul = mul nsw i32 %b, %a
  %cmp = icmp slt i32 %mul, 7
  br i1 %cmp, label %if.then, label %return

if.then:
  br label %return

return:
  %c = phi i32 [ %a, %if.then ], [ %b, %entry ]
  ret i32 %c
}

With a well predicted branch, the latency from %a/%b to %c is 0-1 cycles, depending on the latency of copies in your micro-architecture. The branch predictor doesn't wait around for %mul and %cmp to be computed - it just predicts.

SimplifyCFG of course if-converts this 'cost = 0' basic block to get:

define i32 @f(i32 %a, i32 %b) nounwind uwtable readnone ssp {
entry:
  %mul = mul nsw i32 %b, %a
  %cmp = icmp slt i32 %mul, 7
  %c = select i1 %cmp, i32 %a, i32 %b
  ret i32 %c
}

A multiply(3) + compare(1) + cmov(2) sequence puts 6 cycles of latency between %a/%b and %c, against 0-1 cycles for the well predicted branch. This isn't an obvious transformation at all.
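
The latency arithmetic above can be written out as a rough sketch. The cycle counts are the illustrative ones quoted in this mail, not measurements of any particular CPU, and the names LATENCY and critical_path are hypothetical helpers invented for this example. The model just sums latencies along a serial dependency chain and ignores misprediction cost entirely:

```python
# Illustrative latencies in cycles, taken from the figures quoted above.
# Assumption: real values differ between micro-architectures.
LATENCY = {"mul": 3, "icmp": 1, "select": 2, "copy": 1}


def critical_path(ops):
    """Sum the latencies along a serial dependency chain of operations."""
    return sum(LATENCY[op] for op in ops)


# If-converted form: %c waits for select, which waits for icmp, which waits for mul.
select_latency = critical_path(["mul", "icmp", "select"])

# Branchy form with a correctly predicted branch: %c is at most a copy of %a or %b.
branch_latency_worst = critical_path(["copy"])  # copy costs a cycle
branch_latency_best = 0                         # copy is free

print(select_latency)                         # 6
print(select_latency - branch_latency_worst)  # 5
print(select_latency - branch_latency_best)   # 6
```

Under these assumptions the select form is 5-6 cycles behind the predicted branch on the path from %a/%b to %c. A per-instruction cost of 'free' says nothing about that path.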

The LLVM IR passes are basically optimizing for a PDP-11, where instructions are executed serially and some take longer than others. Modern micro-architectures simply don't work that way.

In this model, I don't think it makes sense to ask the targets for instruction costs. You are basically asking the target if it is one of the new snazzy 1975 models, or an old model with a separate ALU card.

What kind of music do you like, Country or Western? There is no good answer.

Now, I am not arguing that SimplifyCFG shouldn't if-convert at all, or even that it shouldn't be more aggressive about it - it clearly enables further optimizations.

But don't ask the target about instruction costs - the question doesn't make sense in this context.

/jakob