[LLVMdev] Register scavenger and SP/FP adjustments

Krzysztof Parzyszek kparzysz at codeaurora.org
Wed Oct 2 09:47:52 PDT 2013


Just to follow up: this has been worked around locally.  A general fix
is not likely to be pursued in the near future... :(

-K


On 9/26/2013 3:41 PM, Krzysztof Parzyszek wrote:
> Thanks, I'll look into that.  Still, the case where the function does
> not call anything remains---in such a situation there are no
> ADJCALLSTACK pseudos, so regardless of what that function you pointed at
> does, there won't be any target-independent information about the SP
> adjustment by the time the frame index elimination runs.
>
> Would it make sense to have ADJCALLSTACK pseudos every time there are
> objects to be allocated on the stack (regardless of whether the function
> is a leaf or not)?  What would be the implications of that?
>
> An alternative approach would be to never use virtual registers in frame
> setup, but I'm not sure how popular that would be.  So far I have only
> seen that in the Thumb backend.
>
> -Krzysztof
>
>
> On 9/26/2013 3:30 PM, Evan Cheng wrote:
>> The code has changed a lot over the years. It looks like the assumption
>> was broken at some point. calculateCallsInformation() may have already
>> eliminated the pseudo setup instructions.
>>
>> // If call frames are not being included as part of the stack frame, and
>> // the target doesn't indicate otherwise, remove the call frame pseudos
>> // here. The sub/add sp instruction pairs are still inserted, but we don't
>> // need to track the SP adjustment for frame index elimination.
>> if (TFI->canSimplifyCallFramePseudos(Fn))
>> =>    TFI->eliminateCallFramePseudoInstr(Fn, *I->getParent(), I);
>>
>> Perhaps there is a bug in canSimplifyCallFramePseudos?
>>
>> Evan
>>
>> On Sep 26, 2013, at 12:00 PM, Krzysztof Parzyszek
>> <kparzysz at codeaurora.org> wrote:
>>
>>> Consider this example:
>>>
>>> --- ex.ll ---
>>> declare void @bar()
>>>
>>> ; Function Attrs: nounwind optsize
>>> define void @main() {
>>> entry:
>>>  %hin = alloca [256 x i32], align 4
>>>  %xin = alloca [256 x i32], align 4
>>>  call void @bar()
>>>  ret void
>>> }
>>> -------------
>>>
>>>
>>> Freshly built llc:
>>>
>>> llc -O2 -march=x86 < ex.ll -print-before-all
>>>
>>> # *** IR Dump Before Prologue/Epilogue Insertion & Frame Finalization ***:
>>> # Machine code for function main: Post SSA
>>> Frame Objects:
>>>  fi#0: size=1024, align=4, at location [SP+4]
>>>  fi#1: size=1024, align=4, at location [SP+4]
>>>
>>> BB#0: derived from LLVM BB %entry
>>>        ADJCALLSTACKDOWN32 0, %ESP<imp-def>, %EFLAGS<imp-def,dead>, %ESP<imp-use>
>>>        CALLpcrel32 <ga:@bar>, <regmask>, %ESP<imp-use>, %ESP<imp-def>
>>>        ADJCALLSTACKUP32 0, 0, %ESP<imp-def>, %EFLAGS<imp-def,dead>, %ESP<imp-use>
>>>        RET
>>>
>>> # End machine code for function main.
>>>
>>> before replace frame indices
>>> # Machine code for function main: Post SSA
>>> Frame Objects:
>>>  fi#0: size=1024, align=4, at location [SP-1024]
>>>  fi#1: size=1024, align=4, at location [SP-2048]
>>>
>>> BB#0: derived from LLVM BB %entry
>>>        %ESP<def,tied1> = SUB32ri %ESP<tied0>, 2060, %EFLAGS<imp-def,dead>; flags: FrameSetup
>>>        PROLOG_LABEL <MCSym=.Ltmp0>
>>>        CALLpcrel32 <ga:@bar>, <regmask>, %ESP<imp-use>, %ESP<imp-def>
>>>        %ESP<def,tied1> = ADD32ri %ESP<tied0>, 2060, %EFLAGS<imp-def,dead>
>>>        RET
>>>
>>> # End machine code for function main.
>>>
>>>
>>>
>>> Let's see what happens if we remove the call to "bar".
>>>
>>> There aren't any pseudo instructions that set up the frame to begin
>>> with, even though the SP is actually modified.  (This is to show that
>>> the register scavenger, RS, has no way of finding out that SP was
>>> actually adjusted in such cases.)
>>>
>>>
>>> # *** IR Dump Before Prologue/Epilogue Insertion & Frame Finalization ***:
>>> # Machine code for function main: Post SSA
>>> Frame Objects:
>>>  fi#0: size=1024, align=4, at location [SP+4]
>>>  fi#1: size=1024, align=4, at location [SP+4]
>>>
>>> BB#0: derived from LLVM BB %entry
>>>        RET
>>>
>>> # End machine code for function main.
>>>
>>> before replace frame indices
>>> # Machine code for function main: Post SSA
>>> Frame Objects:
>>>  fi#0: size=1024, align=4, at location [SP-1024]
>>>  fi#1: size=1024, align=4, at location [SP-2048]
>>>
>>> BB#0: derived from LLVM BB %entry
>>>        %ESP<def,tied1> = SUB32ri %ESP<tied0>, 2048, %EFLAGS<imp-def,dead>; flags: FrameSetup
>>>        PROLOG_LABEL <MCSym=.Ltmp0>
>>>        %ESP<def,tied1> = ADD32ri %ESP<tied0>, 2048, %EFLAGS<imp-def,dead>
>>>        RET
>>>
>>> # End machine code for function main.
>>>
>>>
>>>
>>> And here's where the problem becomes more apparent.
>>>
>>> Compile for Thumb and see that there is a virtual register used in the
>>> frame setup:
>>>
>>> # *** IR Dump Before Prologue/Epilogue Insertion & Frame Finalization ***:
>>> # Machine code for function main: Post SSA
>>> Frame Objects:
>>>  fi#0: size=1024, align=4, at location [SP]
>>>  fi#1: size=1024, align=4, at location [SP]
>>>
>>> BB#0: derived from LLVM BB %entry
>>>        tBX_RET pred:14, pred:%noreg
>>>
>>> # End machine code for function main.
>>>
>>> before replace frame indices
>>> # Machine code for function main: Post SSA
>>> Frame Objects:
>>>  fi#0: size=1024, align=4, at location [SP-1032]
>>>  fi#1: size=1024, align=4, at location [SP-2056]
>>>  fi#2: size=4, align=4, at location [SP-4]
>>>  fi#3: size=4, align=4, at location [SP-8]
>>> Constant Pool:
>>>  cp#0: -2048, align=4
>>>  cp#1: 2048, align=4
>>>
>>> BB#0: derived from LLVM BB %entry
>>>    Live Ins: %R4 %LR
>>>        tPUSH pred:14, pred:%noreg, %R4<kill>, %LR<kill>, %SP<imp-def>, %SP<imp-use>; flags: FrameSetup
>>>        %vreg0<def> = tLDRpci <cp#0>, pred:14, pred:%noreg; flags: FrameSetup tGPR:%vreg0
>>>        %SP<def,tied1> = tADDhirr %SP<tied0>, %vreg0<kill>, pred:14, pred:%noreg; tGPR:%vreg0
>>>        %vreg1<def> = tLDRpci <cp#1>, pred:14, pred:%noreg; tGPR:%vreg1
>>>        %SP<def,tied1> = tADDhirr %SP<tied0>, %vreg1<kill>, pred:14, pred:%noreg; tGPR:%vreg1
>>>        tPOP_RET pred:14, pred:%noreg, %R4<def>, %PC<def>, %SP<imp-def>, %SP<imp-use>
>>>
>>> # End machine code for function main.
>>>
>>>
>>> On Thumb you can save/restore a register without having to use a spill
>>> slot, so the scavenger won't run into problems, but if a target had to
>>> spill, we would end up with a register save before the SP update and a
>>> restore after the SP update, and the RS would use the same offset in
>>> both instructions.
>>> I don't have a working testcase (i.e. one that demonstrates the
>>> failure) that I can post, but if I trick the RS into believing that it
>>> has to spill, the problem does happen.
>>>
>>> Here's a sample result of this.  Don't mind the FixedStack-1: I
>>> explicitly used a base offset of 0 in the code to illustrate the lack
>>> of adjustment in RS:
>>>
>>>        tSTRspi %R1<kill>, %SP, 0, pred:14, pred:%noreg; mem:ST4[FixedStack-1]    <- spill to *(SP+0)
>>>        %R1<def> = tLDRpci <cp#1>, pred:14, pred:%noreg
>>>        %SP<def,tied1> = tADDhirr %SP<tied0>, %R1<kill>, pred:14, pred:%noreg     <- SP = something different
>>>        %R3<def> = tLDRspi %SP, 0, pred:14, pred:%noreg; mem:LD4[FixedStack-1]
>>>        %R1<def> = tLDRspi %SP, 0, pred:14, pred:%noreg; mem:LD4[FixedStack-1]    <- restore from *(NewSP+0)   !!
>>>
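
To make the failure mode in the sample above concrete, here is a minimal C++ sketch. It is not LLVM code: ToyStack and spillAdjustRestore are made-up names, the starting SP value is arbitrary, and unwritten memory is assumed to read as 0. It spills a value at *(SP+0), moves SP by Delta, and reloads at a chosen offset from the new SP.

```cpp
#include <cstdint>
#include <map>

// Hypothetical toy model of a stack: sparse memory plus a stack pointer.
struct ToyStack {
  std::map<int64_t, int32_t> Mem;
  int64_t SP;

  // Store V at address SP + Off.
  void store(int64_t Off, int32_t V) { Mem[SP + Off] = V; }

  // Load from address SP + Off; unwritten memory reads as 0 (an assumption).
  int32_t load(int64_t Off) const {
    auto It = Mem.find(SP + Off);
    return It == Mem.end() ? 0 : It->second;
  }
};

// Spill Value at [SP+0], adjust SP by Delta, then reload at [SP+RestoreOff].
int32_t spillAdjustRestore(int32_t Value, int64_t Delta, int64_t RestoreOff) {
  ToyStack S{{}, 0x10000};   // arbitrary initial SP
  S.store(0, Value);         // spill to *(SP+0)
  S.SP += Delta;             // SP = something different
  return S.load(RestoreOff); // restore from *(NewSP+RestoreOff)
}
```

In this model, spillAdjustRestore(42, 2048, 0) misses the spilled value because the reload reuses offset 0 against the new SP, while spillAdjustRestore(42, 2048, -2048) finds it. The second call is the compensation the scavenger would have to apply if it knew about the SP change.
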
>>>
>>> -Krzysztof
>>>
>>>
>>>
>>> On 9/26/2013 1:24 PM, Evan Cheng wrote:
>>>> CallFrameSetupOpcode is a pseudo opcode like X86::ADJCALLSTACKDOWN64.
>>>> That means the code is expected to run before the pseudo
>>>> instructions are eliminated. I don't know why that's not the case for
>>>> you.
>>>> A quick look at the PEI code indicates the pseudos should not have been
>>>> removed by the time replaceFrameIndices is run.
>>>>
>>>> Evan
>>>>
>>>>
>>>> On Sep 25, 2013, at 8:57 AM, Krzysztof Parzyszek
>>>> <kparzysz at codeaurora.org> wrote:
>>>>
>>>>> Hi All,
>>>>> I'm dealing with a problem where the spill/restore instructions
>>>>> inserted during scavenging span an adjustment of the SP/FP register.
>>>>> The result is that despite the base register (SP/FP) being changed
>>>>> between the spill and the restore, both store and load use the same
>>>>> immediate offset.
>>>>>
>>>>> I see code in the PEI (replaceFrameIndices) that is supposed to track
>>>>> the SP/FP adjustment:
>>>>>
>>>>> ----------------------------------------
>>>>> void PEI::replaceFrameIndices(MachineBasicBlock *BB,
>>>>>                               MachineFunction &Fn, int &SPAdj) {
>>>>>   const TargetMachine &TM = Fn.getTarget();
>>>>>   assert(TM.getRegisterInfo() &&
>>>>>          "TM::getRegisterInfo() must be implemented!");
>>>>>   const TargetInstrInfo &TII = *Fn.getTarget().getInstrInfo();
>>>>>   const TargetRegisterInfo &TRI = *TM.getRegisterInfo();
>>>>>   const TargetFrameLowering *TFI = TM.getFrameLowering();
>>>>>   bool StackGrowsDown =
>>>>>     TFI->getStackGrowthDirection() ==
>>>>>                 TargetFrameLowering::StackGrowsDown;
>>>>>   int FrameSetupOpcode   = TII.getCallFrameSetupOpcode();
>>>>>   int FrameDestroyOpcode = TII.getCallFrameDestroyOpcode();
>>>>>
>>>>>   if (RS && !FrameIndexVirtualScavenging) RS->enterBasicBlock(BB);
>>>>>
>>>>>   for (MachineBasicBlock::iterator I = BB->begin(); I != BB->end(); ) {
>>>>>
>>>>>     if (I->getOpcode() == FrameSetupOpcode ||
>>>>>         I->getOpcode() == FrameDestroyOpcode) {
>>>>>       // Remember how much SP has been adjusted to create the call
>>>>>       // frame.
>>>>>       int Size = I->getOperand(0).getImm();
>>>>>
>>>>>       if ((!StackGrowsDown && I->getOpcode() == FrameSetupOpcode) ||
>>>>>           (StackGrowsDown && I->getOpcode() == FrameDestroyOpcode))
>>>>>         Size = -Size;
>>>>>
>>>>>       SPAdj += Size;
>>>>>
>>>>> [...]
>>>>> ----------------------------------------
>>>>>
>>>>>
>>>>> The problem is that it expects frame-setup and frame-destroy opcodes,
>>>>> but by the time it runs (after emitPrologue/emitEpilogue) the frame
>>>>> setup and teardown have been expanded into instruction sequences that
>>>>> can differ for each target and need not carry the immediate value in
>>>>> operand 0.
>>>>>
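
The tracking logic quoted above can be modelled outside LLVM to show what it does and does not see. This is only a sketch under stated assumptions: Opcode, Instr and maxTrackedSPAdj are hypothetical stand-ins rather than LLVM types, and the amounts are made up. As in the excerpt, only the two call-frame pseudo opcodes contribute to SPAdj.

```cpp
#include <vector>

// Hypothetical, simplified stand-ins for machine opcodes (not LLVM's).
enum Opcode { CallFrameSetup, CallFrameDestroy, TargetSubSP, TargetAddSP, Call, Ret };

struct Instr {
  Opcode Op;
  int Imm; // operand 0: adjustment amount in bytes, where applicable
};

// Mirrors the quoted loop: only the call-frame pseudos change SPAdj.
// Returns the largest SPAdj value observed while walking the block.
int maxTrackedSPAdj(const std::vector<Instr> &Block, bool StackGrowsDown) {
  int SPAdj = 0;
  int Max = 0;
  for (const Instr &I : Block) {
    if (I.Op == CallFrameSetup || I.Op == CallFrameDestroy) {
      int Size = I.Imm;
      if ((!StackGrowsDown && I.Op == CallFrameSetup) ||
          (StackGrowsDown && I.Op == CallFrameDestroy))
        Size = -Size;
      SPAdj += Size;
    }
    // Target-specific SP updates (TargetSubSP/TargetAddSP) are not recognized.
    if (SPAdj > Max)
      Max = SPAdj;
  }
  return Max;
}
```

For a block that still contains the pseudos, e.g. setup 16, call, destroy 16, the function reports 16 on a stack that grows down. For a block where the adjustment has already been expanded into TargetSubSP 2048 / TargetAddSP 2048, it reports 0, which is the blind spot described above.
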
>>>>> As far as I can see, this code won't work, although I'm not sure what
>>>>> the original idea behind it was.  Should this code run before the
>>>>> target-specific generation of prolog/epilog?  Even then, there need
>>>>> not be any ADJCALLSTACKUP/DOWN instructions (if it's a leaf function).
>>>>> If it runs where it should, should it instead use some target-specific
>>>>> hook that identifies the actual stack adjustment amount?
>>>>>
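
One way to picture the target-specific hook asked about above is the following sketch. Everything in it is hypothetical and is not an existing LLVM interface in this thread: MI, SPAdjustHook, spAdjBefore and toyTargetHook are invented for illustration. The idea is that the generic walk asks the target how much each instruction grows the stack, instead of matching two fixed pseudo opcodes.

```cpp
#include <functional>
#include <vector>

// Hypothetical simplified instruction: a kind tag and one immediate.
struct MI {
  enum Kind { SubSP, AddSP, Other } K;
  int Imm;
};

// Hypothetical hook: signed number of bytes by which the instruction
// grows the stack (positive = more stack allocated).
using SPAdjustHook = std::function<int(const MI &)>;

// Generic walk: records the accumulated adjustment in effect before each
// instruction, delegating all target knowledge to the hook.
std::vector<int> spAdjBefore(const std::vector<MI> &Block,
                             const SPAdjustHook &Hook) {
  std::vector<int> Result;
  int SPAdj = 0;
  for (const MI &I : Block) {
    Result.push_back(SPAdj);
    SPAdj += Hook(I);
  }
  return Result;
}

// Example hook for a made-up target whose stack grows down.
int toyTargetHook(const MI &I) {
  switch (I.K) {
  case MI::SubSP: return I.Imm;   // SP -= Imm allocates Imm bytes
  case MI::AddSP: return -I.Imm;  // SP += Imm releases Imm bytes
  default:        return 0;       // everything else leaves SP alone
  }
}
```

With such a hook, a spill inserted before the SubSP and a restore inserted after it would see adjustments of 0 and Imm respectively, so their offsets could be corrected independently of how the target expands its frame setup.
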
>>>>> -Krzysztof
>>>>>
>>>>>
>>>>> --
>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>>>>> hosted by The Linux Foundation
>>>>
>>>
>>>
>>
>
>




