[llvm-commits] [llvm] r171879 - in /llvm/trunk: lib/Target/X86/CMakeLists.txt lib/Target/X86/X86.h lib/Target/X86/X86.td lib/Target/X86/X86PadShortFunction.cpp lib/Target/X86/X86Subtarget.cpp lib/Target/X86/X86Subtarget.h lib/Target/X86/X86TargetMa...

Evan Cheng evan.cheng at apple.com
Tue Jan 8 22:59:07 PST 2013



Sent from my iPad

On Jan 8, 2013, at 2:07 PM, "Zhang, Andy" <andy.zhang at intel.com> wrote:

> On January 08, 2013 3:28 PM, Evan Cheng wrote:
> 
> Hi Evan,
> 
>> 1. It's checking for -Os, but not -Oz.
> 
> Is it necessary to explicitly check for -Oz? Doesn't -Oz also set the optforsize attribute?

It's a different attribute that you need to check. 

> 
>> 2. This code looks like compile time intensive. It's walking the CFG but it
>> doesn't track what BBs have been visited already. It definitely looks like
>> it can visit a single BB multiple times and does wasted computation.
> 
> The code will only walk the CFG until the cycle count threshold is reached (currently 4 cycles). I didn't think it was necessary to track which BBs were visited given how few instructions are examined.

I don't think that's a good excuse for poor algorithm design. The code is reusable for other CPUs, right? What if some other CPUs needs the same padding but with a much higher threshold?

> 
>> 3. Other issues:
>> 
>> +    if (Cycles < Threshold) {
>> +      if (!cyclesUntilReturn(MBB, BBCycles, &ReturnLoc))
>> +        continue;
>> 
>> Why is it calling cyclesUntilReturn again? Isn't it called by findReturns()
>> already?
> 
>> Why is it necessary to save Location? The return instruction is always at
>> the end of the BB.
> 
> You're right, the ret is always at the end of the BB so we don't need to save Location. The second call to cyclesUntilReturn() was just to get a MBBI to insert the NOOPs. I've attached a patch to remove this part. Does it look ok now?

You need to check for cases where there are trailing DEBUG_LOC instructions following the terminator. 

> 
>> 4. The whole approach just seems weird to me. You should only pad extremely
>> short functions  (4 instructions?) but the code has to scan the entire CFG.
>> I think you are essentially just looking for functions where there is an
>> early branch from the entry BB to the exit BB.
> 
> It's not just functions that have an early exit, but any function that is short. The function might have no control flow but still be less than 4 cycles long.

In that case the entry BB is the exit BB. It still seems to me that the pass can simply look at the entry and exit blocks and avoid the scanning completely. 

Evan

> It won't scan the entire CFG unless the function is short to begin with.
> 
> 
> Regards,
> Andy
> 
> <pad.diff>



More information about the llvm-commits mailing list