[llvm-dev] [RFC] Using Intel MPX to harden SafeStack
LeMay, Michael via llvm-dev
llvm-dev at lists.llvm.org
Tue Feb 7 16:05:06 PST 2017
I previously posted about using 32-bit X86 segmentation to harden SafeStack: http://lists.llvm.org/pipermail/llvm-dev/2016-May/100346.html That involves lowering the limits of the DS and ES segments that are used for ordinary data accesses while leaving the limit for SS, the stack segment, set to its maximum value. The safe stacks were clustered above the limits of DS and ES. Thus, by directing individual memory operands to either DS/ES or SS, stray pointer writes that could otherwise corrupt the safe stack would be blocked by the segmentation checks. My proposed compiler modifications inspect memory operands to determine whether the compiler (or more specifically, the SafeStack pass) intends that they be allowed to access the safe stack. It then inserts segment override prefixes and related instructions as necessary.
I submitted patches today to implement an analogous idea in 64-bit mode using Intel MPX. MPX can be used both to enforce fine-grained per-object bounds and coarse-grained bounds. My patches use it for the latter purpose, so they make no use of the table-related instructions in MPX. The runtime library  simply initializes one bounds register, BND0, to have an upper bound that is set below all safe stacks and above all ordinary data. A pre-isel patch instruments stores that are not authorized to access the safe stack by preceding each such instruction with a BNDCU instruction. That checks whether the following store accesses memory that is entirely below the upper bound in BND0 . Loads are not instrumented, since the purpose of the checks is only to help prevent corruption of the safe stacks. Authorized safe stack accesses are not instrumented, since the SafeStack pass is responsible for verifying that such accesses do not corrupt the safe stack. The default handler is used when a bound check fails, which results in the program being terminated on the systems where I have performed tests.
To reduce the performance and size overhead from instrumenting the code, both the pre-isel patch and a pre-emit patch elide various checks [2, 3]. The pre-isel patch uses techniques derived from the BoundsChecking pass to statically verify that some stores are safe so that the checks for those stores can be elided. The pre-emit patch compares the bound checks in each basic block and combines those that are redundant. The contents of BND0 are static, so a successful check of a higher address implies that any check of a lower address will also succeed. Thus, if a check of a higher address precedes a check of a lower address in a basic block, the latter check can be erased. On the other hand, if a check of a lower address precedes a check of a higher address in a basic block, then the latter check can still be erased, but it is also necessary to use the higher address in the remaining check. However, my pass is only able to statically compare certain addresses, which limits the checks that can be combined. For example, if two addresses use the same base and index registers and scale along with a simple displacement, then my pass may be able to compare them. However, if either the base or the index register is redefined by an instruction between the two checks, then my pass is currently unable to compare the two addresses. Incidentally, the pre-emit pass uses the getAddressFromInstr routine, which needs to be patched to properly handle certain global variable addresses . The pre-emit pass also erases checks for addresses that do not specify a base or index register as well as those that specify a RIP-relative offset with no index register. I think that the source code would need to be quite malformed to corrupt safe stacks using such address types. Additional optimizations may be possible in the future, such as lifting checks out of loops or otherwise performing inter-basic block analysis to identify additional redundant checks.
The pre-emit pass also erases bound checks for accesses relative to a non-default segment, such as thread-local accesses relative to FS. Linear addresses for thread-local accesses are computed with a non-zero segment base address, so it would be necessary to check thread-local effective addresses against a bounds register with an upper bound that is adjusted down to account for that rather than the bounds register that is used for checking other accesses. However, negative offsets are sometimes used for thread-local accesses, which are treated as very large unsigned effective addresses. Checking them would require them to first be added to the base of the thread-local storage segment.
Developers can use the -mseparate-stack-seg flag to enable instrumentation of functions that have the SafeStack attribute [4, 6]. That flag also causes the runtime library to be linked .
Due to BND0 being treated as per-thread state, the runtime library picks an initial BND0 upper bound when the program starts that is arbitrarily set to be 256MiB below the base of the initial (safe) stack. If and when that 256MiB space becomes overfilled by safe stacks, the program will crash due to a failing CHECK_GE statement in the runtime library. Without this check, an adversary may be able to modify a variable in the runtime library recording the address of the most-recently allocated safe stack to cause safe stacks to be allocated in vulnerable locations. An alternative approach to avoid that limitation could be to store that variable above the bound checked by the instrumented code. This could help to prevent adversaries from forcing safe stacks to be allocated at vulnerable locations while still allowing the program to keep running even when its safe stacks protrude below the bound. Of course, the protruding portions of the safe stacks would be vulnerable. Another alternative could be to treat the MPX bounds registers as per-program state rather than per-thread state. BND0 could then be adjusted downwards as necessary when new safe stacks are allocated.
The runtime library currently ignores all attribute settings passed to pthread_create. It allocates a safe stack itself at a high address. Furthermore, the runtime library currently does not support expansion of the safe stacks, nor does it free the safe stacks that it allocates.
If the BND0 upper bound happens to fall below some ordinary data, then attempted accesses to that data by instrumented stores will violate the associated bound checks.
The runtime library checks for MPX support when it initializes, and it falls back to the default (ASLR-based) safe stack protections if MPX is unavailable. However, (inactive) bound check instructions in the program may still impose size and performance overheads.
There could conceivably be a situation in which an instrumented program passes a function pointer for an instrumented callback to an uninstrumented library. If that library allocates an object on the (safe) stack and passes its pointer to the callback, then a bound check violation could result. This is due to an assumption in the pass that instruments the code. It assumes that all pointer arguments except those with the byval or readnone attributes point to the unsafe stack. I think this is a valid assumption when all stack frames correspond to instrumented functions. However, it is not a valid assumption in the scenario described above. Such bound check violations can be avoided by not instrumenting such callbacks as well as any functions to which they pass pointers to any allocations on the safe stack.
 [safestack] Add runtime support for MPX-based hardening: https://reviews.llvm.org/D29657
 [X86] Add X86SafeStackBoundsChecking pass: https://reviews.llvm.org/D29649
 [X86] Add X86SafeStackBoundsCheckingCombiner pass: https://reviews.llvm.org/D29652
 [X86] Add -mseparate-stack-seg: https://reviews.llvm.org/D17092
 [X86] Link safestacksepseg runtime: https://reviews.llvm.org/D29655
 [X86] Add separate-stack-seg feature: https://reviews.llvm.org/D29646
 [x86] Fix getAddressFromInstr: https://reviews.llvm.org/D27169
More information about the llvm-dev