[llvm-commits] [llvm] r159694 - in /llvm/trunk: include/llvm/CodeGen/Passes.h include/llvm/InitializePasses.h include/llvm/Target/TargetInstrInfo.h lib/CodeGen/CMakeLists.txt lib/CodeGen/CodeGen.cpp lib/CodeGen/EarlyIfConversion.cpp lib/CodeGen/Passes.cpp

Thu Apr 4 18:17:40 PDT 2013

On Apr 4, 2013, at 5:20 PM, Hal Finkel <hfinkel at anl.gov> wrote:

>> +  /// canInsertSelect - Return true if it is possible to insert a select
>> +  /// instruction that chooses between TrueReg and FalseReg based on the
>> +  /// condition code in Cond.
>> +  ///
>> +  /// When successful, also return the latency in cycles from TrueReg,
>> +  /// FalseReg, and Cond to the destination register. The Cond latency should
>> +  /// compensate for a conditional branch being removed. For example, if a
>> +  /// conditional branch has a 3 cycle latency from the condition code read,
>> +  /// and a cmov instruction has a 2 cycle latency from the condition code
>> +  /// read, CondCycles should be returned as -1.
> 
> The current X86 code returns 2 for all of these latency numbers. Perhaps the comment should be updated to make the example more 'realistic'?

Yes, that language is a bit confusing. I'll fix it.

> Also, I'm wondering how to set these to experiment on the PPC A2. Essentially, this is because the conditional branch takes 1 cycle if correctly predicted, but ~6 if incorrectly predicted, and the corresponding move also takes one cycle. An isel has a two cycle latency (but single-cycle throughput). As a result, it might make sense to compute both sides of simple conditions instead of risking a 6-cycle misprediction penalty (just a hypothesis).

Definitely.

On machines with branch predictors, branches don't really have latencies. 'Executing' a branch means verifying that the branch predictor got it right, and initiating an expensive correction if it didn't. This means that even the single cycle it takes to execute a correctly predicted branch does not enter into the critical path.

If you have a single-cycle cmov instruction, all three numbers should be set to one. The cost of a misprediction goes in the scheduling model.
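As a rough illustration of that rule, here is a minimal standalone sketch, not the actual TargetInstrInfo hook. SelectLatencies and canInsertSelectSketch are hypothetical names, and TrueCycles/FalseCycles are assumed by analogy with the CondCycles name in the quoted comment. A target with a uniform select latency reports that latency for all three operands and leaves the misprediction penalty out entirely.

```cpp
#include <cassert>

// Hypothetical stand-in for the three latency results described in the
// canInsertSelect comment (names other than CondCycles are assumptions).
struct SelectLatencies {
  int CondCycles;   // cycles from the condition code to the result
  int TrueCycles;   // cycles from TrueReg to the result
  int FalseCycles;  // cycles from FalseReg to the result
};

// Sketch only: a target whose select instruction has one uniform latency
// reports it for every operand. The branch misprediction penalty is
// deliberately not folded into any of these numbers.
inline bool canInsertSelectSketch(int SelectLatency, SelectLatencies &Out) {
  assert(SelectLatency > 0 && "a select cannot complete in zero cycles");
  Out.CondCycles = SelectLatency;
  Out.TrueCycles = SelectLatency;
  Out.FalseCycles = SelectLatency;
  return true;
}
```

With SelectLatency = 1 this gives the single-cycle case above; with 2 it matches what the current X86 code is said to return.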

The x86 cmov instruction really takes two cycles. I assume it is because it has to read three inputs while most x86 instructions only have 2 inputs. Add with carry is also two cycles.

If we knew that cmov read TrueReg and FalseReg on the first cycle, and the flags on the second cycle, we could set CondCycles = 1 because the cmov result is available 1 cycle after the flags.
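To spell out the arithmetic, here is a small sketch under that same hypothetical timing. The helper name operandLatency and the cycle numbering (cycle 0 is the first cycle of the cmov, result ready at cycle 2) are assumptions for illustration, not measured x86 behavior. The latency from an operand to the result is the cycle the result becomes available minus the cycle the operand is read.

```cpp
#include <cassert>

// Latency from one operand to the result: the cycle in which the result
// becomes available minus the cycle in which the operand is read.
// Cycles are counted from the start of the instruction (0 = first cycle).
inline int operandLatency(int ResultCycle, int ReadCycle) {
  assert(ReadCycle <= ResultCycle && "operand must be read before the result");
  return ResultCycle - ReadCycle;
}

// Hypothetical two-cycle cmov: registers read on the first cycle (0),
// flags read on the second cycle (1), result available at cycle 2.
inline int cmovTrueCycles() { return operandLatency(2, 0); }
inline int cmovFalseCycles() { return operandLatency(2, 0); }
inline int cmovCondCycles() { return operandLatency(2, 1); }
```

Under these assumptions TrueReg and FalseReg keep the full two-cycle latency, while CondCycles drops to 1.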

/jakob