[llvm-commits] [llvm] r171879 - in /llvm/trunk: lib/Target/X86/CMakeLists.txt lib/Target/X86/X86.h lib/Target/X86/X86.td lib/Target/X86/X86PadShortFunction.cpp lib/Target/X86/X86Subtarget.cpp lib/Target/X86/X86Subtarget.h lib/Target/X86/X86TargetMa...

Fri Jan 11 11:46:11 PST 2013

On Jan 11, 2013, at 9:07 AM, "Zhang, Andy" <andy.zhang at intel.com> wrote:

> Hi Evan,
> 
> I've attached an updated patch to check for -Oz, not re-visit BBs, and skip over DBG_VALUE instrs.

Ok. 

> 
> Would you agree that walking the CFG can't be avoided if we want to pad functions with more than 4 instructions? Is this patch ok to commit?

I still don't think this is the right algorithm. If the threshold is small, you are much better off starting from the return instruction and walking back up. That said, since this is Atom-specific, I don't care too much. If this ever ends up being run for a larger set of CPUs, then we should revisit it.
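
To make the "walk back up from the return" idea concrete, here is a rough standalone sketch. It is not the code in the patch and does not use LLVM's real classes: Instr, Block and minInstrsBeforeReturn are hypothetical stand-ins, block 0 is assumed to be the entry block, and each non-debug instruction is assumed to cost one cycle.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-ins for MachineInstr / MachineBasicBlock (not LLVM's API).
struct Instr {
  bool IsDebug;   // debug-only instruction such as DBG_VALUE; costs nothing
  bool IsReturn;  // the return itself is not counted
};

struct Block {
  std::vector<Instr> Instrs;
  std::vector<int> Preds;  // indices of predecessor blocks
};

// Fewest instructions (capped at Threshold) executed on any path from the
// function entry (block 0) up to the return in block RetBB.
// Assumption: one non-debug instruction == one cycle.
unsigned minInstrsBeforeReturn(const std::vector<Block> &Fn, int RetBB,
                               unsigned Threshold) {
  // Best[B] is the smallest running count with which block B was reached,
  // so a block is only revisited when a strictly shorter path is found.
  std::vector<unsigned> Best(Fn.size(), Threshold);
  unsigned Result = Threshold;
  std::vector<std::pair<int, unsigned> > Work;
  Work.push_back(std::make_pair(RetBB, 0u));

  while (!Work.empty()) {
    int BB = Work.back().first;
    unsigned Count = Work.back().second;
    Work.pop_back();

    // Walk this block bottom-up, stopping once the threshold is reached.
    for (std::vector<Instr>::const_reverse_iterator It = Fn[BB].Instrs.rbegin();
         It != Fn[BB].Instrs.rend() && Count < Threshold; ++It)
      if (!It->IsDebug && !It->IsReturn)
        ++Count;

    if (Count >= Result)
      continue;  // this path cannot beat the best one found so far
    if (BB == 0) {
      Result = Count;  // reached the top of the entry block
      continue;
    }
    for (int P : Fn[BB].Preds)
      if (Count < Best[P]) {
        Best[P] = Count;
        Work.push_back(std::make_pair(P, Count));
      }
  }
  return Result;
}
```

With a threshold of 4, the walk gives up on any path as soon as it has seen four instructions, so it never looks at more than a handful of blocks. A caller could then pad the return with Threshold minus the returned count no-ops, if that is what the pass wants to do.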

Evan

> 
> Regards,
> Andy
> 
> On January 09, 2013 12:05 PM, Evan Cheng wrote:
> 
>> On Jan 9, 2013, at 8:58 AM, "Zhang, Andy" <andy.zhang at intel.com> wrote:
>> 
>>> On January 09, 2013 1:59 AM, Evan Cheng wrote:
>>> 
>>>>> Is it necessary to explicitly check for -Oz? Doesn't -Oz also set
>>>>> the optforsize attribute?
>>>> 
>>>> It's a different attribute that you need to check.
>>> 
>>> Ok, I'll check for that attribute too.
>>> 
>>>>> The code will only walk the CFG until the cycle count threshold is
>>>>> reached (currently 4 cycles). I didn't think it was necessary to
>>>>> track which BBs were visited given how few instructions are examined.
>>>> 
>>>> I don't think that's a good excuse for poor algorithm design. The
>>>> code is reusable for other CPUs, right? What if some other CPU needs
>>>> the same padding but with a much higher threshold?
>>> 
>>> I can cache the blocks that were visited previously to avoid visiting
>>> them again.
>>> 
>>>> You need to check for cases where there are trailing DEBUG_LOC
>>>> instructions following the terminator.
>>> 
>>> I will do that.
>>> 
>>>> In that case the entry BB is the exit BB. It still seems to me that
>>>> the pass can simply look at the entry and exit blocks and avoid the
>>>> scanning completely.
>>> 
>>> I'm not sure I understand your approach. Are you saying that we can just
>>> count the number of instructions in the entry block, and in each of the
>>> exit blocks? The branch may not be directly from the entry block to an exit
>>> block.
>> 
>> Yes. Given the threshold is 4, what are the possibilities? 1. Single BB
>> function. 2. Entry BB with an early exit to the exit BB. Can control flow
>> go through an intermediate BB before the exit BB? It takes two instructions
>> for a conditional branch in the entry BB. The only possibility then is for
>> the intermediate BB to contain nothing more than a branch to the exit BB.
>> That doesn't look like a realistic scenario to me.
>> 
>> Evan
>> 
>>> 
>>> 
>>> Regards,
>>> Andy
> 
> <updated-pad.diff>