[PATCH] D68720: Support -fstack-clash-protection for x86

Mon Oct 14 15:28:22 PDT 2019

rnk added a comment.

For maintenance reasons, I'd really prefer it if we could find a way to reuse the existing code that calls an external stack probe function. What do you think about taking a look at X86RetpolineThunks.cpp and doing something similar to that? Basically, when the user sets `-fstack-clash-protection`, LLVM will emit a small comdat+weak function into every object file that has the same ABI as the existing stack probe mechanism. For other prior art, you can also look at how `__clang_call_terminate` works.

================
Comment at: llvm/lib/Target/X86/X86FrameLowering.cpp:400
+        !(STI.isOSWindows() && !STI.isTargetMachO());
+    if (InlineStackClashProtector && !InEpilogue) {
+      const uint64_t PageSize = TLI.getStackProbeSize(MF);
----------------
serge-sans-paille wrote:
> efriedma wrote:
> > Why is this code in a different location from the stack probing code that generates a call?
> Because `BuildStackAdjustment` has more callsites than just emitPrologue and we want to capture all stack manipulation.
If you care about those cases, we should have tests for all of them. These are all the cases:
1. TailCallReturnAddrDelta: When doing guaranteed TCO for a callee with >4K of argument memory
2. eliminateCallFramePseudoInstr: When doing stack adjustments to set up a call with more than 4K of arguments

Both of these cases involve setting up arguments to calls, and I think we can guarantee that the compiler will emit writes to every argument stack slot. So, to set up one of these cases, we'd have this code:
```
struct EightK { int a[2048]; };
void passEightK(EightK);
void foo() {
  EightK x;
  memset(&x, 0, sizeof(x));
  passEightK(x); // some targets will pass indirect, 32-bit probably uses byval.
}
```

In this case, LLVM happens to use rep;movsl to copy the argument memory. It could use memcpy, but either way, it will probe all those bytes.

I think that means it's safe to move to emitSPUpdate. I would greatly prefer that `BuildStackAdjustment` remain a simple primitive that generates a single instruction.

================
Comment at: llvm/lib/Target/X86/X86FrameLowering.cpp:353

+static const DenseMap<unsigned, unsigned> OpcodeToSPOperandIndex = {
+    {X86::MOV8mi, 3}, {X86::MOV16mi, 3}, {X86::MOV32mi, 3}, {X86::MOV64mi32, 3},
----------------
Please no dynamically initialized globals. The LLVM-y thing would be to use a switch.

================
Comment at: llvm/test/CodeGen/X86/stack-clash-dynamic-alloca.ll:9-11
+; CHECK: subq   $4096, %rsp
+; CHECK: movq   $0, (%rsp)
+  %a = alloca i32, i32 %n, align 16
----------------
This seems like a good use case for update_llc_test_checks.py. I'd want to see the body of the loop, the cmp, the jl, etc.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68720/new/

https://reviews.llvm.org/D68720