[PATCH] D147975: [StackProtector] don't check stack protector before calling nounwind functions

Wed Apr 12 19:03:53 PDT 2023

xiangzhangllvm added a comment.

In D147975#4262071 <https://reviews.llvm.org/D147975#4262071>, @nickdesaulniers wrote:

> In D147975#4259794 <https://reviews.llvm.org/D147975#4259794>, @LuoYuanke wrote:
>
>> For this case
>>
>>   void caller () {
>>     int arr [42] = {};
>>     int (*callee)() = get_callback();
>>     vuln();
>>     callee(arr);
>>   }
>>
>> If callee is not noreturn function, caller has a chance to check the canary when it returns.
>
> The issue is that if `callee` is not `noreturn`, `vuln` may have overwritten the value of `callee` (if it was on the stack) and the stack canary is not checked BEFORE transferring control flow to an arbitrary location.  But if `callee` was `noreturn` we would? Echoing the sentiment of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58245#c6, that lack of symmetry seems odd.
>
> Consider another example:
>
>   void caller () {
>     int data [42] = {};
>     int (*callback)() = get_callback();
>     vuln();
>     callee(callback, data);
>   }
>
> So the stack gets corrupted within `caller`'s frame, possibly modifying `callback` if it was on the stack.  `caller` gets a stack canary, but the check is after the return of `callee` to `caller`.  In the meantime, `callee` assumes the stack data is not corrupted and invokes `callback`, which may be arbitrary attacker code now due to `vuln`, and might not actually ever return where any such stack canary would be checked.  Oops!
>
> In these two examples, nothing is `noreturn`, and yet you can still hijack control flow, so why treat `noreturn` functions specially?  This is my interpretation of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58245#c6 and I agree with the sentiment.

LLVM stack protector is not good at cover this 2 cases, the function pointers are in local variable area, But the position of LLVM stack guard are between local variable area and return address.
If the attacker only need to "cover" the local variable area on stack, the stack protector didn't work.

> Once a caller has had its stack frame corrupted, literally any stack references passed to a callee are dangerous to use to make control flow decisions.  I also don't think that the callee can detect such corruption of the parent's/caller's frame.
>
> Another issue is that a stack canary is weak indicator that the stack has been tampered with.  You can maliciously modify stack memory without overwriting the stack slot containing the canary, or if the canary was leaked it can be rewritten in the correct slot.  So adding more and more and more stack canary checks is nearly meaningless work (IMO).

Very correct ！So let's focus on what this patch mainly want to do ? It can't cover all stack attack case, it mainly to **reduce ** the **most common** stack attack--attacker cover the stack from local variable area to fixed stack slot (return address or parameter address)--LLVM stack protect can work well because stack guard happens to be in this area.

>> As a tradeoff of security, code size and performance, I don't object to land the patch. I am curious about the noreturn function that is caught in your case. Could you show some examples of those noreturn function?
>
> The kernel code is basically:
>
>   void start_kernel () {
>     ...
>     initialize_the_stack_canary();
>     do_rest();
>   }
>   void do_rest () {
>     ...
>     while (1);
>   }
>
> LLVM:
>
> 1. infers that `do_rest` is implicitly `noreturn`.
> 2. decides that `start_kernel` must now have a stack canary
>
> At runtime:
>
> 1. `start_kernel` writes an uninitialized canary value into a stack slot.
> 2. `initialize_the_stack_canary` changes the value of canary (globally, not in the stack slot)
> 3. canary reloaded and compared against the value in the stack slot, which was uninitialized garbage
> 4. `__stack_chk_fail` -> `panic` -> machine fails to boot at startup. ---

It is belong to user code change the stack guard value, it should not use stack protect.

> In D147975#4260221 <https://reviews.llvm.org/D147975#4260221>, @efriedma wrote:
>
>> The kernel should probably disable stack canaries for the functions in question?
>
> Yes, I plan to do that //in addition to this change//.  But it's important to demonstrate to the kernel community that we're working on this problem from both angles.  Because kernel developers will ask "what is your toolchain smoking?" (rightfully) when they see this.
>
> For the kernel, I will have to:
>
> 1. add support for `__attribute__((no_stack_protector))`. This doesn't exist for all supported toolchain versions (https://docs.kernel.org/process/changes.html) particularly GCC only gained support in gcc-11 which is relatively recent.
> 2. backport this to all supported kernel versions (https://kernel.org/)
>
> It is infeasible (IMO) to move `start_kernel` to its own TU and build that function with `-fno-stack-protector`; the function attributes give us the ideal granularity, but aren't available everywhere we need to support (temporal versions of kernel sources, and temporal versions of supported toolchains).

In my eye, Linux kernel should pay more attention on performance. Maybe we shouldn't build Linux kernel with stack-protector at all, especially if it has a lot of initialize_the_stack_canary() like functions.

I still +1 for this patch.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147975/new/

https://reviews.llvm.org/D147975