[PATCH] D78717: [SystemZ] Implement -fstack-clash-protection

Fri May 8 09:37:54 PDT 2020

uweigand added a comment.

In D78717#2025239 <https://reviews.llvm.org/D78717#2025239>, @jonpa wrote:

> Added probing of the tail of dyn-alloca, with an extra check for zero tail, roughly like GCC. It seems it can be assumed that the alloca size will always be a multiple of 8 bytes, or? I think that is necessary, or there might be final probe partially below SP (if tail is e.g. 4 bytes). Also, if the stack-probe size is 0, an infinite loop would result (assert?), but I suppose that would always be noticeable.

The size argument to DYNAMIC_STACKALLOC is indeed always a multiple of 8 (as documented in ISDOpcodes.h).

As to the stack probe size, I think must indeed ensure that this size is also a multiple of 8 bytes (the stack alignment requirement) and is nonzero.   If the given value doesn't fulfil those requirements, I guess we should round it down to the stack alignment requirement, and if the result is zero, use the stack alginment requirement instead.

> (re)compute live-ins for the new blocks created in inlineStackProbe() (it seems this is not needed for the probed alloca case). The generated code from inlineStackProbe() does roughly the same as GCC. We could use a brct for the loop instead, or we could try to keep it at 12 or more instructions using multiple exits (like "forced" unrolling), but I suppose maybe the unrolling is the common case, so the loop isn't that important?

I think just having a plain loop is OK for now.    For any future optimizations, we'd have to do performance evaluation first.

> Tests: Not sure if we need all the X86 test cases - for instance stack-clash-medium-natural-probes.ll seems to generate nearly the same code as stack-clash-medium.ll.

It seems this was added in preparation for exploiting "natural probes" -- if the code can be proven to already touch all (or at least some) stack pages at least once (in this test case via the stores), then we might optimize out some of the extra probes.   But apparently this wasn't implemented in the X86 back-end in the end, and neither is it in your patch, so it doesn't make sense to have a test for it.

> stack-clash-unknown-call.ll: Changed the called function from memset to something not known, since the point of the test case seemed to be to have a call in the function, which however did not work at first since an XC loop resulted.

I believe this is also related to natural probes: on *Intel*, every call instruction would be a natural probe as it pushes the return address to the stack.  But on Z, the call instruction doesn't touch the stack (the called function might, but it also might not, so we cannot really rely on it).

> Moved the utility functions to SystemZInstrInfo.cpp as agreed before.

OK.

>> Huh. I'd probably just use uint64_t everywhere.
> 
> I think what confused me a bit was that MCCFIInstruction::createDefCfaOffset() takes an 'int' as argument, but I suppose it's not the end of the world if the CFA offset is possibly broken if the stack frame size ever became greater than -INT32_MIN...?

Hmm.   In theory, those DWARF offset values are encoded as ULEB128, so there's no reason to constrain them to an "int".   If this ever becomes an issue, we should update MCCFIInstruction to use int64_t instead.  For now, that's probably not a real concern; I'm sure there's other code that doesn't handle stack sizes > 2GB correctly.  (E.g. do we even update the stack pointer correctly in this case?)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78717/new/

https://reviews.llvm.org/D78717