[llvm-dev] [RFC] Using Intel MPX to harden SafeStack
LeMay, Michael via llvm-dev
llvm-dev at lists.llvm.org
Wed Feb 8 16:51:24 PST 2017
On 2/7/2017 20:02, Kostya Serebryany wrote:
> On Tue, Feb 7, 2017 at 4:05 PM, LeMay, Michael via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> The runtime library simply initializes one bounds register,
> BND0, to have an upper bound that is set below all safe stacks and
> above all ordinary data.
> So you enforce that safe stacks and other data are not intermixed, as
> you explain below.
> What are the downsides? Performance? Compatibility?
I think the main downside is that only a limited number of threads can
be created before the safe stacks would protrude below the bound.
Extending the proposed runtime library to deallocate safe stacks when
they are no longer needed may help with this. The safe stacks are also
prevented from expanding, since they are allocated contiguously at high
addresses.
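
To make the thread limit concrete, here is a minimal Python sketch of the
arithmetic. It assumes fixed-size safe stacks packed contiguously downward
from the top of a region, and that every stack must lie strictly above the
upper bound in BND0. The addresses, the stack size, and the function name are
hypothetical and are not taken from the proposed runtime library.

```python
# Hypothetical layout parameters; none of these values come from the RFC.
REGION_TOP = 0x7FFF_FFFF_F000        # assumed exclusive top of the safe stack region
BND0_UPPER_BOUND = 0x7FFF_0000_0000  # assumed upper bound held in BND0
SAFE_STACK_SIZE = 8 * 1024 * 1024    # assumed fixed size of each safe stack


def max_safe_stacks(region_top, upper_bound, stack_size):
    """Count the fixed-size stacks that fit strictly above the bound.

    Stacks occupy [region_top - n * stack_size, region_top). The lowest
    byte of the lowest stack must stay above upper_bound, because any
    address at or below the bound passes the upper-bound check.
    """
    usable_bytes = region_top - upper_bound - 1
    return max(usable_bytes // stack_size, 0)


limit = max_safe_stacks(REGION_TOP, BND0_UPPER_BOUND, SAFE_STACK_SIZE)
```

Under these assumptions the limit is fixed once BND0 is initialized, which is
why reclaiming the safe stacks of exited threads would help.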
> A pre-isel patch instruments stores that are not authorized to
> access the safe stack by preceding each such instruction with a
> BNDCU instruction.
> My understanding is that BNDCU is the cheapest possible instruction,
> just like XOR or ADD,
> so the overhead should be relatively small.
> Still my guesstimate would be >= 5% since stores are very numerous.
> And such overhead will be on top of whatever overhead SafeStack has.
> Do you have any measurements to share?
I'm working on getting approval to release some benchmark results.
> That checks whether the following store accesses memory that is
> entirely below the upper bound in BND0. Loads are not
> instrumented, since the purpose of the checks is only to help
> prevent corruption of the safe stacks. Authorized safe stack
> accesses are not instrumented, since the SafeStack pass is
> responsible for verifying that such accesses do not corrupt the
> safe stack. The default handler is used when a bound check fails,
> which results in the program being terminated on the systems where
> I have performed tests.
> To reduce the performance and size overhead from instrumenting the
> code, both the pre-isel patch and a pre-emit patch elide various
> checks [2, 3]. The pre-isel patch uses techniques derived from
> the BoundsChecking pass to statically verify that some stores are
> safe so that the checks for those stores can be elided. The
> pre-emit patch compares the bound checks in each basic block and
> combines those that are redundant. The contents of BND0 are
> static, so a successful check of a higher address implies that any
> check of a lower address will also succeed. Thus, if a check of a
> higher address precedes a check of a lower address in a basic
> block, the latter check can be erased. On the other hand, if a
> check of a lower address precedes a check of a higher address in a
> basic block, then the latter check can still be erased, but it is
> also necessary to use the higher address in the remaining check.
> However, my pass is only able to statically compare certain
> addresses, which limits the checks that can be combined. For
> example, if two addresses use the same base and index registers
> and scale along with a simple displacement, then my pass may be
> able to compare them. However, if either the base or the index
> register is redefined by an instruction between the two checks,
> then my pass is currently unable to compare the two addresses.
> The usual question in such a situation: how do we verify that the
> optimizations are not too optimistic?
> If we remove a check that is not in fact redundant, we will never
> know, until clever folks use it for an exploit (and maybe not even then).
The pre-emit pass is able to verify that some checks are redundant by
inspecting the operands used to specify an address. For example,
consider the following test for the pre-emit pass:
0: %rax = MOVSX64rr32 killed %edi
1: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 4, _
; CHECK: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 8, _
2: MOV32mi _, 8, %rax, @x, _, 0
3: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 8, _
; CHECK-NOT: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 8, _
4: MOV32mi _, 8, killed %rax, @x + 4, _, 0
The pass verifies that the only difference between the memory operands
in instructions 1 and 3 is that they use a different offset from the
global variable, so they can be combined. The pass also tracks register
definitions, so it would know not to combine the checks in this example
if there had been an instruction that redefined %rax between
instructions 1 and 3.
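
The combining rule can be sketched as a small Python model. This is only an
illustration of the rule as described in this thread, not the code of the
pre-emit pass: the Check and Other classes, their fields, and combine_checks
are hypothetical names, and the model only compares addresses that share the
same base, index, scale, and symbol.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Check:
    """A modeled upper-bound check of base + index * scale + symbol + disp."""
    base: Optional[str]
    index: Optional[str]
    scale: int
    symbol: Optional[str]
    disp: int


@dataclass(frozen=True)
class Other:
    """Any other instruction; defs lists the registers it redefines."""
    text: str
    defs: tuple = ()


def combine_checks(block):
    """Combine comparable checks within one basic block."""
    out = []
    live = {}  # address shape -> position in out of the surviving check
    for inst in block:
        if isinstance(inst, Check):
            key = (inst.base, inst.index, inst.scale, inst.symbol)
            if key in live:
                pos = live[key]
                # Erase the later check; keep the higher displacement in
                # the earlier one so that it covers both stores.
                if inst.disp > out[pos].disp:
                    out[pos] = replace(out[pos], disp=inst.disp)
                continue
            live[key] = len(out)
            out.append(inst)
        else:
            # A redefined register makes earlier checks that used it
            # incomparable with any later check.
            for reg in inst.defs:
                live = {k: p for k, p in live.items()
                        if reg not in (k[0], k[1])}
            out.append(inst)
    return out


# Modeled on the test above: two checks that differ only in displacement.
block = [
    Other("movsx", defs=("rax",)),
    Check(None, "rax", 8, "x", 4),
    Other("store to x"),
    Check(None, "rax", 8, "x", 8),
    Other("store to x + 4"),
]
combined = combine_checks(block)
```

In this model, the first check is rewritten to use @x + 8 and the second is
dropped. Placing an instruction that redefines rax between the two checks
leaves both in place.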
On the other hand, some of the optimizations described in the next
couple of paragraphs may be optimistic, so I especially welcome feedback
> The pre-emit pass also erases checks for addresses that do not
> specify a base or index register as well as those that specify a
> RIP-relative offset with no index register. I think that the
> source code would need to be quite malformed to corrupt safe
> stacks using such address types.
> The pre-emit pass also erases bound checks for accesses relative
> to a non-default segment, such as thread-local accesses relative
> to FS. Linear addresses for thread-local accesses are computed
> with a non-zero segment base address, so it would be necessary to
> check thread-local effective addresses against a bounds register
> with an upper bound that is adjusted down to account for that
> rather than the bounds register that is used for checking other
> accesses. However, negative offsets are sometimes used for
> thread-local accesses, which are treated as very large unsigned
> effective addresses. Checking them would require them to first be
> added to the base of the thread-local storage segment.
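
The difficulty with negative thread-local offsets can be illustrated with a
few lines of Python. This is a sketch under stated assumptions: the bound
check is modeled as an unsigned 64-bit comparison of the effective address
(without the segment base), as described above, and the FS base and bound
values are hypothetical.

```python
MASK64 = (1 << 64) - 1

# Hypothetical values chosen only for illustration.
FS_BASE = 0x7F00_0000_1000           # assumed base of the thread-local segment
BND0_UPPER_BOUND = 0x7FFF_0000_0000  # assumed upper bound held in BND0


def effective_address(offset):
    """Offset as the unsigned 64-bit value a bound check would compare."""
    return offset & MASK64


def linear_address(segment_base, offset):
    """Address actually accessed once the segment base is added."""
    return (segment_base + offset) & MASK64


def passes_upper_bound(address, upper_bound):
    """Model of an upper-bound check: succeeds if address <= upper_bound."""
    return address <= upper_bound


# An upper bound adjusted down by the segment base works for positive
# offsets but rejects negative ones, which wrap to huge unsigned values.
adjusted_bound = BND0_UPPER_BOUND - FS_BASE

positive_ok = passes_upper_bound(effective_address(16), adjusted_bound)
negative_ok = passes_upper_bound(effective_address(-8), adjusted_bound)
negative_linear_ok = passes_upper_bound(linear_address(FS_BASE, -8),
                                        BND0_UPPER_BOUND)
```

With these values, the access at offset -8 lands below the bound once the
segment base is added, yet the check against the adjusted bound rejects it.
That is the case that would require computing the linear address first.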