[llvm-dev] [RFC] Using Intel MPX to harden SafeStack

Kostya Serebryany via llvm-dev llvm-dev at lists.llvm.org
Tue Feb 7 20:02:13 PST 2017

On Tue, Feb 7, 2017 at 4:05 PM, LeMay, Michael via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi,
> I previously posted about using 32-bit X86 segmentation to harden
> SafeStack: http://lists.llvm.org/pipermail/llvm-dev/2016-May/100346.html
> That involves lowering the limits of the DS and ES segments that are used
> for ordinary data accesses while leaving the limit for SS, the stack
> segment, set to its maximum value.  The safe stacks were clustered above
> the limits of DS and ES.  Thus, by directing individual memory operands to
> either DS/ES or SS, stray pointer writes that could otherwise corrupt the
> safe stack would be blocked by the segmentation checks.  My proposed
> compiler modifications inspect memory operands to determine whether the
> compiler (or more specifically, the SafeStack pass) intends that they be
> allowed to access the safe stack.  It then inserts segment override
> prefixes and related instructions as necessary.
> I submitted patches today to implement an analogous idea in 64-bit mode
> using Intel MPX.  MPX can be used both to enforce fine-grained per-object
> bounds and coarse-grained bounds.  My patches use it for the latter
> purpose, so they make no use of the table-related instructions in MPX.

That's a relief :)

> The runtime library [1] simply initializes one bounds register, BND0, to
> have an upper bound that is set below all safe stacks and above all
> ordinary data.

So you enforce that safe stacks and other data are not intermixed, as you
explain below.
What are the downsides? Performance? Compatibility?

> A pre-isel patch instruments stores that are not authorized to access the
> safe stack by preceding each such instruction with a BNDCU instruction.

My understanding is that BNDCU is the cheapest possible instruction, just
like XOR or ADD,
so the overhead should be relatively small.
Still my guesstimate would be >= 5% since stores are very numerous.
And such overhead will be on top of whatever overhead SafeStack has.
Do you have any measurements to share?

> That checks whether the following store accesses memory that is entirely
> below the upper bound in BND0 [2].  Loads are not instrumented, since the
> purpose of the checks is only to help prevent corruption of the safe
> stacks.  Authorized safe stack accesses are not instrumented, since the
> SafeStack pass is responsible for verifying that such accesses do not
> corrupt the safe stack.  The default handler is used when a bound check
> fails, which results in the program being terminated on the systems where I
> have performed tests.
> To reduce the performance and size overhead from instrumenting the code,
> both the pre-isel patch and a pre-emit patch elide various checks [2, 3].
> The pre-isel patch uses techniques derived from the BoundsChecking pass to
> statically verify that some stores are safe so that the checks for those
> stores can be elided.  The pre-emit patch compares the bound checks in each
> basic block and combines those that are redundant.  The contents of BND0
> are static, so a successful check of a higher address implies that any
> check of a lower address will also succeed.  Thus, if a check of a higher
> address precedes a check of a lower address in a basic block, the latter
> check can be erased.  On the other hand, if a check of a lower address
> precedes a check of a higher address in a basic block, then the latter
> check can still be erased, but it is also necessary to use the higher
> address in the remaining check.  However, my pass is only able to
> statically compare certain addresses, which limits the checks that can be
> combined.  For example, if two addresses use the same base and index
> registers and scale along with a simple displacement, then my pass may be
> able to compare them.  However, if either the base or the index register is
> redefined by an instruction between the two checks, then my pass is
> currently unable to compare the two addresses.

The usual question in such situation: how do we verify that the
optimizations are not too optimistic?
If we remove a check that is not in fact redundant, we will never know,
until clever folks use it for an exploit (and maybe not even then).

> Incidentally, the pre-emit pass uses the getAddressFromInstr routine,
> which needs to be patched to properly handle certain global variable
> addresses [7].  The pre-emit pass also erases checks for addresses that do
> not specify a base or index register as well as those that specify a
> RIP-relative offset with no index register.  I think that the source code
> would need to be quite malformed to corrupt safe stacks using such address
> types.  Additional optimizations may be possible in the future, such as
> lifting checks out of loops or otherwise performing inter-basic block
> analysis to identify additional redundant checks.
> The pre-emit pass also erases bound checks for accesses relative to a
> non-default segment, such as thread-local accesses relative to FS.  Linear
> addresses for thread-local accesses are computed with a non-zero segment
> base address, so it would be necessary to check thread-local effective
> addresses against a bounds register with an upper bound that is adjusted
> down to account for that rather than the bounds register that is used for
> checking other accesses.  However, negative offsets are sometimes used for
> thread-local accesses, which are treated as very large unsigned effective
> addresses.  Checking them would require them to first be added to the base
> of the thread-local storage segment.
> Developers can use the -mseparate-stack-seg flag to enable instrumentation
> of functions that have the SafeStack attribute [4, 6].  That flag also
> causes the runtime library to be linked [5].
> Due to BND0 being treated as per-thread state, the runtime library picks
> an initial BND0 upper bound when the program starts that is arbitrarily set
> to be 256MiB below the base of the initial (safe) stack.  If and when that
> 256MiB space becomes overfilled by safe stacks, the program will crash due
> to a failing CHECK_GE statement in the runtime library.  Without this
> check, an adversary may be able to modify a variable in the runtime library
> recording the address of the most-recently allocated safe stack to cause
> safe stacks to be allocated in vulnerable locations.  An alternative
> approach to avoid that limitation could be to store that variable above the
> bound checked by the instrumented code.  This could help to prevent
> adversaries from forcing safe stacks to be allocated at vulnerable
> locations while still allowing the program to keep running even when its
> safe stacks protrude below the bound.  Of course, the protruding portions
> of the safe stacks would be vulnerable.  Another alternative could be to
> treat the MPX bounds registers as per-program state rather than per-thread
> state.  BND0 could then be adjusted downwards as necessary when new safe
> stacks are allocated.
> The runtime library currently ignores all attribute settings passed to
> pthread_create.  It allocates a safe stack itself at a high address.
> Furthermore, the runtime library currently does not support expansion of
> the safe stacks, nor does it free the safe stacks that it allocates.
> If the BND0 upper bound happens to fall below some ordinary data, then
> attempted accesses to that data by instrumented stores will violate the
> associated bound checks.
> The runtime library checks for MPX support when it initializes, and it
> falls back to the default (ASLR-based) safe stack protections if MPX is
> unavailable.  However, (inactive) bound check instructions in the program
> may still impose size and performance overheads.
> There could conceivably be a situation in which an instrumented program
> passes a function pointer for an instrumented callback to an uninstrumented
> library.  If that library allocates an object on the (safe) stack and
> passes its pointer to the callback, then a bound check violation could
> result.  This is due to an assumption in the pass that instruments the
> code.  It assumes that all pointer arguments except those with the byval or
> readnone attributes point to the unsafe stack.  I think this is a valid
> assumption when all stack frames correspond to instrumented functions.
> However, it is not a valid assumption in the scenario described above.
> Such bound check violations can be avoided by not instrumenting such
> callbacks as well as any functions to which they pass pointers to any
> allocations on the safe stack.
> Comments appreciated.
> Thanks,
> Michael
> [1] [safestack] Add runtime support for MPX-based hardening:
> https://reviews.llvm.org/D29657
> [2] [X86] Add X86SafeStackBoundsChecking pass: https://reviews.llvm.org/
> D29649
> [3] [X86] Add X86SafeStackBoundsCheckingCombiner pass:
> https://reviews.llvm.org/D29652
> [4] [X86] Add -mseparate-stack-seg: https://reviews.llvm.org/D17092
> [5] [X86] Link safestacksepseg runtime: https://reviews.llvm.org/D29655
> [6] [X86] Add separate-stack-seg feature: https://reviews.llvm.org/D29646
> [7] [x86] Fix getAddressFromInstr: https://reviews.llvm.org/D27169
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170207/cde9479f/attachment.html>

More information about the llvm-dev mailing list