[PATCH] fix for PR16393: miscompile with struct byval

Thu Jul 11 10:27:32 PDT 2013

On Jul 11, 2013, at 5:54 AM, Stepan Dyatkovskiy wrote:

> Hello Manman.
> 
> I've just read your patches.
> Great job, man! Though I have some remarks.
> 
> About stack verification.
> 
> Would you explain what is stored in SPAdjAfter? Some comments may be?
> May be SPAdjNew? I understood what it means only after 30 minutes of reading code up and down. If I understood properly it means:
> SPAdj value after ordinary FrameSetup/Destroy instruction.

SPAdj is used to keep the value at entry of a Basic Block. SPAdjAfter is used to track the value after processing each instruction, after we are
done with the basic block, it stores the value at exit of a basic block.
Of course, I can add more comments.

> May be rename SPAdj/IsSetup to PrevSPAdj/PrevIsSetup. I'm not sure about how it should be called. But I still think names are important here.
> 
> > SPAdj = Iter->second.second.second;
> I think it is not pretty readable…
Yes, will use a struct :]
> 
> >       // If we see a FrameSetup, now we are expecting a FrameDestroy with the
> >      // same value; otherwise we are expecting a FrameSetup.
> ...
> >      if ((IsSetupAfter && (I->getOpcode() != FrameDestroyOpcode ||
> >                            SPAdjAfter != Size)) ||
> >          (!IsSetupAfter && I->getOpcode() != FrameSetupOpcode))
> >        report("FrameSetup <n> is always followed by a FrameDestroy <n>", I);
> 
> Hm.. may be split it somehow. If I got right there is next logic:
> if (IsSetupAfter)
>  valid: (I->getOpcode() == FrameDestroyOpcode && SPAdjAfter == Size)
> else
>  valid: I->getOpcode() == FrameSetupOpcode
Yes, I will separate the cases and have more details in the report.

> 
> And what about:
> 
> [ DOWN 512] ; OK
> [CALL]
> [UP 512; DOWN 8; DOWN 2048] ; report?

Yes, report. Nested stack frames are not allowed.

> [CALL]
> [UP 2048; UP 8] ?
> 
> After all
> 
> SPAdjAfter = Size - SPAdjAfter;
> 
> may do the bad things in case above. May be here we change the sign of Size depending on opcode and then just add it?
To make things clear, I am going to change the definition of SPAdj.

> 
> Size = (Opcode == Setup) ? -Size : Size;
> SPAdjAfter += Size
> 
> About PEI.
> 
> >  assert((SPAdjCount || SPAdjInBB == 0) &&
> >         "Unbalanced call frame setup / destroy pairs?");
> After you implemented stack verification it looks rudimentary, and it still fires for test-case I've attached to this post. Main reason we decided to make things more complex: is unbalanced BBs, so SPAdjInBB maybe != 0.
This is from existing code, it does some preliminary checking.
It gives error if SPAdjCount is zero and SPAdjInBB is not zero.

Here SPAdjInBB is the amount of adjustment inside a basic block and SPAdjCount is the number of UP and DOWN inside a basic block.
SPAdjCount is zero if we have paired UPs and DOWNs.
So if we have a basic block of UP x and DOWN y where x != y, we will have assertion failure.
This does not sound right. I will check against your testing case.

Manman
> 
> -Stepan.
> 
> Manman Ren wrote:
>> 
>> 3 patches attached:
>> 1> modify machine verifier to verify the 3 variants
>> 2> reduced testing case that crashed with struct byval
>> 3> modify PEI to calculate SPAdj using DFS traversal
>>       refactor replaceFrameIndices(MachineFunction &Fn) to
>>           replaceFrameIndices(MachineBasicBlock *BB, MachineFunction &Fn,
>>                                                   int &SPAdj)
>>       which handles a single basic block
>> 
>> Thanks,
>> Manman
>> 
>> 
>> 
>> 
>> 
>> On Jul 9, 2013, at 6:07 AM, Stepan Dyatkovskiy wrote:
>> 
>>> Hello guys,
>>> 
>>> As another solution, can we add one more back-end hook that tells PEI everything about CALL instruction: BBs with ADJSTACKDOWN/UP and BB with CALL instruction itself.
>>> What do you think?
>>> 
>>> -Stepan.
>>> 
>>> Jakob Stoklund Olesen wrote:
>>>> 
>>>> On Jul 8, 2013, at 2:06 PM, Jim Grosbach <grosbach at apple.com> wrote:
>>>> 
>>>>> 
>>>>> On Jul 8, 2013, at 1:47 PM, Jakob Stoklund Olesen <jolesen at apple.com> wrote:
>>>>> 
>>>>>> By complexity, do you mean compile time or amount of PEI code required?
>>>>> 
>>>>> The latter. I think compile time will be fine either way. My simplistic view of PEI is that it just iterates over the instructions in a function, in layout order, replaces frame indices along the way and keeps track of some interesting facts while doing so like which CSRs are clobbered. Anything that moves the algorithm further afield from that makes me nervous.
>>>>> 
>>>>>> I am not really concerned about compile time. We’re already (at least) linear in the number of CFG edges, and this won’t add much to that term.
>>>>> 
>>>>> I thought PEI was constant on the CFG and linear on the number of basic blocks unless shrink-wrapping was going on. I confess I haven’t looked at the guts of the algorithm in a long time, though, so I could easily be wrong.
>>>> 
>>>> You’re not wrong. I meant the algorithmic complexity of the whole backend is (super-) linear in the CFG edges. PEI is indeed constant.
>>>> 
>>>> The difference between O(#blocks) and O(#edges) only shows up when compiling functions with computed-goto, as you mentioned.
>>>> 
>>>>> The scavenger is the nasty bit, when it fires, but has the hard-coded cap on how far to scan to prevent things from going too badly. Anyways, I don’t see this as a significant concern for the problem at hand.
>>>>> 
>>>>>> As for code complexity, this will certainly make PEI more complex, but I think it is comparable to all the places that would need to learn how to maintain the SET_SP_ADJ invariants. I’d prefer to have the mess all in one place.
>>>>> 
>>>>> Entirely possible. My first thought is that there wouldn’t need to be many modifications elsewhere and the pseudo would just flow on through w/o getting mucked with. That may be overly optimistic, though.
>>>> 
>>>> There wouldn’t be too many cases, but it makes me nervous ;-)
>>>> 
>>>> I’d compare the SET_SP_ADJ instruction to the physical register live-in lists we keep on basic blocks. When you split a basic block, you need to compute which physical registers are live at the split point, so the live-in lists can be populated correctly. Similarly for SET_SP_ADJ, you would need to check if your split point is bracketed by STACKDOWN/STACKUP instructions.
>>>> 
>>>> We currently get this wrong when expanding X86 CMOV pseudo-instructions, this was the cause of the register allocator crash that I recently fixed by shrinking the physreg live ranges in landing pads.
>>>> 
>>>> So this comes down to “who gets the monkey?” We can add complexity to PEI, or we can add complexity to the MI intermediate representation. I prefer to simplify MI to have less rules for mutators to follow.
>>>> 
>>>> [I would love to ban physical register live ranges while MI is in SSA form, it would avoid the block splitting problem.]
>>>> 
>>>> /jakob
>>>> 
>>> 
>> 
> 
> <pr16393-2013-06-03-ByVal-2Kbytes.ll>