<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 2/7/2017 20:02, Kostya Serebryany wrote:<br>

    <blockquote

cite="mid:CAN=P9pjH4A0GdAT_7pd8YjUV+9T+XZMyEbxjQkQ3GW1qebwAFw@mail.gmail.com"

      type="cite">


      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">On Tue, Feb 7, 2017 at 4:05 PM,

            LeMay, Michael via llvm-dev

            <span dir="ltr">&lt;<a moz-do-not-send="true"
                href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;</span>

            wrote:<br>


          </div>

        </div>

      </div>

    </blockquote>

    ...<br>

    <blockquote

cite="mid:CAN=P9pjH4A0GdAT_7pd8YjUV+9T+XZMyEbxjQkQ3GW1qebwAFw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">


            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              The runtime library [1] simply initializes one bounds

              register, BND0, to have an upper bound that is set below

              all safe stacks and above all ordinary data. 

            </blockquote>

            <div><br>

            </div>

            <div>So you enforce that safe stacks and other data are not

              intermixed, as you explain below. </div>

            <div>What are the downsides? Performance? Compatibility?  <br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    I think the main downside is that only a limited number of threads

    can be created before the safe stacks would protrude below the

    bound.  Extending the proposed runtime library to deallocate safe

    stacks when they are no longer needed may help with this.  The safe

    stacks are also prevented from expanding, since they are allocated

    contiguously at high addresses.<br>
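    <br>
    To make the thread limit concrete, here is a minimal C model of
    that layout.  It is only a sketch under assumptions, not the
    proposed runtime library: safe stacks are taken to have a fixed
    size and to be handed out top-down from a reserved region whose
    lowest usable address is the BND0 upper bound, and the names
    (safe_stack_region, alloc_safe_stack, max_safe_stacks) are
    hypothetical.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: safe stacks are carved out top-down from a reserved
 * region, and no byte of a safe stack may lie below the BND0 upper bound. */
typedef struct {
    uintptr_t bound;    /* upper bound loaded into BND0 */
    uintptr_t next_top; /* top (exclusive) of the next safe stack */
    size_t stack_size;  /* every safe stack has this fixed size */
} safe_stack_region;

void region_init(safe_stack_region *r, uintptr_t top, uintptr_t bound,
                 size_t stack_size) {
    assert(stack_size > 0 && bound < top);
    r->bound = bound;
    r->next_top = top;
    r->stack_size = stack_size;
}

/* Stores the base of a new safe stack in *base and returns true, or
 * returns false if the new stack would protrude below the bound. */
bool alloc_safe_stack(safe_stack_region *r, uintptr_t *base) {
    if (r->next_top - r->bound < r->stack_size)
        return false;
    r->next_top -= r->stack_size;
    *base = r->next_top;
    return true;
}

/* Upper limit on the number of safe stacks, and hence threads. */
size_t max_safe_stacks(uintptr_t top, uintptr_t bound, size_t stack_size) {
    return (top - bound) / stack_size;
}
```

    In this model, at most (top - bound) / stack_size threads can hold
    a safe stack at the same time.  Deallocating stacks when threads
    exit would let later threads reuse them, but would not raise that
    concurrent limit.<br>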

    <br>

    <blockquote

cite="mid:CAN=P9pjH4A0GdAT_7pd8YjUV+9T+XZMyEbxjQkQ3GW1qebwAFw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote"> 

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              A pre-isel patch instruments stores that are not

              authorized to access the safe stack by preceding each such

              instruction with a BNDCU instruction. 

            </blockquote>

            <div><br>

            </div>

            <div>My understanding is that BNDCU is the cheapest possible

              instruction, just like XOR or ADD, </div>

            <div>so the overhead should be relatively small. </div>

            <div>Still, my guesstimate would be &gt;= 5%, since stores are
              very numerous. </div>

            <div>And such overhead will be on top of whatever overhead

              SafeStack has. </div>

            <div>Do you have any measurements to share?  <br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    I'm working on getting approval to release some benchmark results.<br>
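    <br>
    For reference, the check that precedes each instrumented store can
    be modeled in C as below.  This sketch assumes the documented BNDCU
    behavior (a #BR exception when the checked address is above the
    upper bound) and, as in the pre-emit test further down, that the
    checked address is the store address plus the store size;
    bnd_reg, bndcu_passes, and store_allowed are hypothetical names.<br>

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of BND0 as set up by the runtime: only the upper bound matters. */
typedef struct {
    uint64_t upper;
} bnd_reg;

/* BNDCU raises #BR when the checked address is above the upper bound. */
bool bndcu_passes(bnd_reg bnd, uint64_t checked_addr) {
    return checked_addr <= bnd.upper;
}

/* Assumption: the checked address is one past the last byte stored, so a
 * passing check means every byte written lies strictly below the bound.
 * Wraparound of addr + size is ignored in this sketch. */
bool store_allowed(bnd_reg bnd, uint64_t addr, uint64_t size) {
    return bndcu_passes(bnd, addr + size);
}
```

    Because the contents of BND0 are static, a passing check of a
    higher address implies that a check of any lower address would
    also pass, which is what allows redundant checks to be combined.<br>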

    <br>

    <blockquote

cite="mid:CAN=P9pjH4A0GdAT_7pd8YjUV+9T+XZMyEbxjQkQ3GW1qebwAFw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              That checks whether the following store accesses memory

              that is entirely below the upper bound in BND0 [2].  Loads

              are not instrumented, since the purpose of the checks is

              only to help prevent corruption of the safe stacks. 

              Authorized safe stack accesses are not instrumented, since

              the SafeStack pass is responsible for verifying that such

              accesses do not corrupt the safe stack.  When a bound
              check fails, the default handler is invoked; on the
              systems where I have performed tests, it terminates the
              program.<br>

              <br>

              To reduce the performance and size overhead from

              instrumenting the code, both the pre-isel patch and a

              pre-emit patch elide various checks [2, 3].  The pre-isel

              patch uses techniques derived from the BoundsChecking pass

              to statically verify that some stores are safe so that the

              checks for those stores can be elided.  The pre-emit patch

              compares the bound checks in each basic block and combines

              those that are redundant.  The contents of BND0 are

              static, so a successful check of a higher address implies

              that any check of a lower address will also succeed. 

              Thus, if a check of a higher address precedes a check of a

              lower address in a basic block, the latter check can be

              erased.  On the other hand, if a check of a lower address

              precedes a check of a higher address in a basic block,

              then the latter check can still be erased, but it is also

              necessary to use the higher address in the remaining

              check.  However, my pass is only able to statically

              compare certain addresses, which limits the checks that

              can be combined.  For example, if two addresses use the

              same base and index registers and scale along with a

              simple displacement, then my pass may be able to compare

              them.  However, if either the base or the index register

              is redefined by an instruction between the two checks,

              then my pass is currently unable to compare the two

              addresses.  </blockquote>

            <div><br>

            </div>

            <div>The usual question in such a situation: how do we verify
              that the optimizations are not too optimistic? </div>

            <div>If we remove a check that is not in fact redundant, we

              will never know, until clever folks use it for an exploit

              (and maybe not even then). <br>

            </div>

          </div>

        </div>

      </div>

    </blockquote>

    <br>

    The pre-emit pass is able to verify that some checks are redundant

    by inspecting the operands used to specify an address.  For example,

    consider the following test for the pre-emit pass:<br>

    <pre>
  0: %rax = MOVSX64rr32 killed %edi
  1: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 4, _
  ; CHECK: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 8, _
  2: MOV32mi _, 8, %rax, @x, _, 0
  3: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 8, _
  ; CHECK-NOT: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 8, _
  4: MOV32mi _, 8, killed %rax, @x + 4, _, 0
</pre>

    The pass verifies that the only difference between the memory
    operands in instructions 1 and 3 is their offset from the global
    variable, so the two checks can be combined.  The pass also

    tracks register definitions, so it would know not to combine the

    checks in this example if there had been an instruction that

    redefined %rax between instructions 1 and 3.<br>
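    <br>
    The combining step can be sketched in C over a simplified basic
    block.  This is an illustration under assumptions, not the pre-emit
    pass itself: memory operands are reduced to base, index, scale, a
    symbol id and a displacement, register redefinitions appear as
    explicit INSN_DEF entries, and all type and function names are
    hypothetical.<br>

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_REG 0 /* hypothetical: register number 0 means "absent" */

/* Simplified x86 memory operand: base + index * scale + symbol + disp. */
typedef struct {
    int base, index, scale;
    int sym;      /* hypothetical id of a global symbol such as @x */
    int64_t disp; /* constant displacement */
} mem_operand;

typedef enum { INSN_CHECK, INSN_DEF, INSN_OTHER } insn_kind;

typedef struct {
    insn_kind kind;
    mem_operand addr; /* address checked by an INSN_CHECK */
    int def_reg;      /* register redefined by an INSN_DEF */
    bool erased;      /* set when a check is folded into an earlier one */
} insn;

static bool same_shape(const mem_operand *a, const mem_operand *b) {
    return a->base == b->base && a->index == b->index &&
           a->scale == b->scale && a->sym == b->sym;
}

static bool uses_reg(const mem_operand *m, int reg) {
    return reg != NO_REG && (m->base == reg || m->index == reg);
}

/* For each surviving check, fold later checks in the same block that differ
 * only in displacement, keeping the higher displacement in the surviving
 * check. Folding stops at the first redefinition of the check's base or
 * index register, since later operands may then denote other addresses. */
void combine_checks(insn *block, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (block[i].kind != INSN_CHECK || block[i].erased)
            continue;
        for (size_t j = i + 1; j < n; j++) {
            if (block[j].kind == INSN_DEF &&
                uses_reg(&block[i].addr, block[j].def_reg))
                break;
            if (block[j].kind == INSN_CHECK && !block[j].erased &&
                same_shape(&block[i].addr, &block[j].addr)) {
                if (block[j].addr.disp > block[i].addr.disp)
                    block[i].addr.disp = block[j].addr.disp;
                block[j].erased = true;
            }
        }
    }
}
```

    On a model of the test above, the first check ends up with
    displacement 8 (that is, @x + 8) and the second check is erased;
    placing an INSN_DEF of %rax between the two checks leaves both in
    place.<br>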

    <br>

    On the other hand, some of the optimizations described in the next

    couple of paragraphs may be optimistic, so I especially welcome

    feedback on them:<br>

    <br>

    ...<br>

    <br>

    <blockquote

cite="mid:CAN=P9pjH4A0GdAT_7pd8YjUV+9T+XZMyEbxjQkQ3GW1qebwAFw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              The pre-emit pass also erases checks for addresses that
              specify neither a base nor an index register, as well as
              for RIP-relative addresses with no index register.  I
              think that the source code would need to be quite
              malformed to corrupt safe stacks using such address types.<br>

            </blockquote>

          </div>

        </div>

      </div>

    </blockquote>
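    The address forms covered by this rule can be expressed as a small
    predicate.  In this hypothetical C sketch, each register field is
    reduced to three cases: absent, RIP, or any general-purpose
    register.<br>

```c
#include <stdbool.h>

/* Hypothetical register classes for this sketch. */
typedef enum { REG_NONE, REG_RIP, REG_GPR } reg_kind;

typedef struct {
    reg_kind base;
    reg_kind index;
} addr_form;

/* Mirrors the elision rule quoted above: a check is dropped when the
 * address uses no base and no index register, or when it is RIP-relative
 * with no index register. */
bool check_is_elided(addr_form a) {
    if (a.index != REG_NONE)
        return false;
    return a.base == REG_NONE || a.base == REG_RIP;
}
```
<br>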

    ...<br>

    <blockquote

cite="mid:CAN=P9pjH4A0GdAT_7pd8YjUV+9T+XZMyEbxjQkQ3GW1qebwAFw@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <blockquote class="gmail_quote" style="margin:0px 0px 0px

              0.8ex;border-left:1px solid

              rgb(204,204,204);padding-left:1ex">

              The pre-emit pass also erases bound checks for accesses

              relative to a non-default segment, such as thread-local

              accesses relative to FS.  Linear addresses for
              thread-local accesses are computed with a non-zero segment
              base address, so thread-local effective addresses would
              need to be checked against a bounds register whose upper
              bound is adjusted down by that base, rather than against
              the bounds register used for checking other accesses.
              However, thread-local accesses sometimes use negative
              offsets, which are treated as very large unsigned
              effective addresses.  Checking such an access would
              require first adding its offset to the base of the
              thread-local storage segment.<br>

            </blockquote>

          </div>

        </div>

      </div>

    </blockquote>
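    The arithmetic involved can be illustrated in C.  The sketch below
    uses made-up values for the FS base and the bound, and the function
    names are hypothetical; it relies on 64-bit address arithmetic
    wrapping modulo 2^64, which unsigned arithmetic in C reproduces.<br>

```c
#include <stdbool.h>
#include <stdint.h>

/* Linear address of a segment-relative access; the addition wraps
 * modulo 2^64, as x86-64 address arithmetic does. */
uint64_t linear_address(uint64_t segment_base, uint64_t effective_addr) {
    return segment_base + effective_addr;
}

/* True if addr + size <= bound, i.e. every byte in [addr, addr + size)
 * is below bound, computed without overflowing addr + size. */
bool below_bound(uint64_t bound, uint64_t addr, uint64_t size) {
    return addr <= bound && size <= bound - addr;
}

/* The adjusted-bound approach: check the FS-relative effective address
 * against (bound - segment_base) instead of adding the segment base. */
bool tls_check_adjusted(uint64_t bound, uint64_t segment_base,
                        uint64_t effective_addr, uint64_t size) {
    if (segment_base > bound)
        return false;
    return below_bound(bound - segment_base, effective_addr, size);
}
```

    With an FS base of 0x7000 and a bound of 0x8000, an 8-byte access
    at FS:-8 has linear address 0x6FF8 and passes when checked as a
    linear address, but its effective address, 2^64 - 8, fails the
    adjusted-bound check.<br>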

    ...<br>

    <br>

    Thanks,<br>

    Michael<br>

    <br>

  </body>

</html>