<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 1/24/12 3:36 PM, Kostya Serebryany wrote:

    <blockquote

cite="mid:CAN=P9pgD71xxc547kRaQGE6jDLZ17rn3YCbAQf3=Xe82vwxKPA@mail.gmail.com"

      type="cite">


      <br>

      <br>

      <div class="gmail_quote">On Tue, Jan 24, 2012 at 1:08 PM, John

        Criswell <span dir="ltr">&lt;<a moz-do-not-send="true"

            href="mailto:criswell@illinois.edu">criswell@illinois.edu</a>&gt;</span>

        wrote:<br>

        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">

          <div class="im">On 1/24/12 2:31 PM, Duncan Sands wrote:<br>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex">

              Hi Kostya,<br>

              <br>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">    

                As far as I can see the C and C++ standards are not

                relevant.  ASAN works on<br>

                    LLVM IR, not on C or C++.  Lots of different

                languages have LLVM frontends.  I<br>

                    personally turn Ada and Fortran into LLVM IR all the

                time for example.  Clearly<br>

                    the C standard is not relevant to LLVM IR coming

                from such languages.  What<br>

                    matters is how LLVM IR is defined.  As far as I know

                this construct is perfectly<br>

                    valid in LLVM IR.<br>

              </blockquote>

            </blockquote>

            <br>

            <br>

          </div>

          The issue here is that a load that reads data past the end of

          an alloca can occur at the LLVM IR level in one of three ways:<br>

          <br>

          1) Because the program at the original source-code level does

          it and is incorrect.<br>

          2) Because the program at the original source-code level does

          it and is correct (although that must be a pretty wacky

          language).<br>

          3) Load-widening introduces it when processing loads from

          allocas that are properly aligned.<br>

          <br>

          As it is today, an analysis cannot look at the LLVM IR and

          know which condition is causing the load to read data past the

          end of the memory object.  As such, tools like SAFECode and

          ASAN don't know when to relax their run-time checks to permit

          such out-of-bounds reading; they either have to relax it for

          all such loads (in which case a bug in the C source code might

          slip through), or they have to report it all the time (and

          report false positives for correct C programs).<br>

          <br>
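          To make the ambiguity concrete, here is a minimal Python sketch.
          It is not taken from SAFECode or ASAN. The 22-byte object, the
          offsets, and the helper out_of_bounds_bytes are all hypothetical
          and chosen only for illustration. It shows that a widened load and
          a genuinely buggy load can be the same access once they reach the
          IR.<br>
          <br>
<pre>
```python
def out_of_bounds_bytes(object_size, offset, width):
    """Count the bytes of [offset, offset + width) that lie past the object's end.

    Assumes a non-negative offset; all sizes are in bytes.
    """
    return max(0, min(width, offset + width - object_size))


OBJECT_SIZE = 22  # hypothetical alloca size in bytes

source_load = (20, 2)   # what the source asked for: 2 bytes at offset 20
widened_load = (16, 8)  # hypothetical result of load-widening
buggy_load = (16, 8)    # a genuine source-level over-read with the same shape

for name, (offset, width) in [
    ("source", source_load),
    ("widened", widened_load),
    ("buggy", buggy_load),
]:
    print(name, out_of_bounds_bytes(OBJECT_SIZE, offset, width))
```
</pre>
          The widened load and the buggy load overhang the object by the
          same two bytes, so a check that sees only the (offset, width) pair
          has nothing with which to tell them apart.<br>
          <br>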

          I assume Kostya's new attribute is a way to permit the LLVM IR

          to specify whether such an out-of-bounds read is intentional

          or not.<br>

          <br>

          In my opinion, we should not bother with an

          attribute.  Load-widening's behavior does not introduce

          exploitable code into the program on commonly-used machines

          and operating systems(*), and incorrect source code at the C

          source level that exhibits identical behavior isn't

          exploitable, either.  </blockquote>

        <div><br>

        </div>

        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">SAFECode can

          be enhanced so that the run-time checks for loads relax their

          guarantees for aligned allocas that are subject to

          load-widening; I imagine ASAN can be similarly modified.<br>

        </blockquote>

        <div> </div>

        <div>ASAN *can* be modified this way (it will actually make

          instrumentation ~10% cheaper). </div>

        <div>But this mode will miss some bugs that the current mode

          finds. </div>

        <div>I've seen at least a couple of such *real* bugs.</div>

      </div>

    </blockquote>

    <br>

    Yes, I understand.  My question is how many such bugs you have seen

    that involve loads *and* allocas aligned in such a way that the

    load-widening optimization triggers.<br>

    <br>

    <blockquote

cite="mid:CAN=P9pgD71xxc547kRaQGE6jDLZ17rn3YCbAQf3=Xe82vwxKPA@mail.gmail.com"

      type="cite">

      <div class="gmail_quote">

        <div><br>

        </div>

        <div>And these bugs are not only about exploitability, but also

          about correctness. </div>

        <div>If a program reads garbage, there is no simple way to

          statically prove that this garbage does not affect the

          program's behavior. <br>

        </div>

      </div>

    </blockquote>

    <br>

    Hrm.  Actually, by relaxing the safety guarantees, SAFECode and ASAN

    may fail to detect exploitable behavior in the original program, so

    I take back my original comment.  That said, it's a pretty obscure

    attack, so it's pretty low on my list of things to worry about.<br>

    <br>

    For me, the right way to go (barring a change in opinion from Chris)

    is either to disable the load-widening transform, to make the

    allocas larger, or to relax the safety guarantees.  The

    problem with attributes is that they are brittle; you have to make

    sure they get added to the right instructions, then you have to make

    sure they don't get removed by optimizations.<br>

    <br>
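    As a rough sketch of what "relaxing the safety guarantees" could mean,
    the Python model below compares a strict check with a relaxed one. The
    relaxed policy is an assumption made for illustration, not the actual
    SAFECode or ASAN logic: it tolerates a load that starts inside the
    object and ends no later than the alignment padding after it. The names
    round_up and load_is_accepted are hypothetical.<br>
    <br>
<pre>
```python
def round_up(size, alignment):
    """Round size up to the next multiple of alignment (alignment must be positive)."""
    return ((size + alignment - 1) // alignment) * alignment


def load_is_accepted(object_size, alignment, offset, width, relaxed):
    """Decide whether a checker accepts a load of `width` bytes at `offset`."""
    if 0 > offset:
        return False  # starts before the object
    end = offset + width
    if object_size >= end:
        return True  # entirely in bounds
    if not relaxed:
        return False  # strict mode rejects any overhang
    # Relaxed mode (assumed policy): the load must start inside the object
    # and must not run past the alignment padding that follows it.
    starts_inside = object_size > offset
    return starts_inside and round_up(object_size, alignment) >= end


# Hypothetical 22-byte alloca with 16-byte alignment, 8-byte load at offset 16.
print(load_is_accepted(22, 16, 16, 8, relaxed=False))
print(load_is_accepted(22, 16, 16, 8, relaxed=True))
```
</pre>
    The relaxed mode accepts the widened load, but it accepts a source-level
    bug with the same offset and width just as readily, which is the cost
    Kostya describes.<br>
    <br>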

    For SAFECode, I'm alright with transforms that "force" a program to

    have memory safe behavior even if they do not report a bug (such as

    boosting the allocation size of allocas subject to load-widening). 

    ASAN may not be willing to do that (and understandably so).  I'm not

    sure what to suggest.<br>

    <br>
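    For illustration, boosting an alloca could be modelled as rounding its
    size up to a multiple of its alignment. This is a sketch under the
    assumption that a widened load never extends past that alignment
    boundary; boosted_size and fits are hypothetical helpers, and the sizes
    are made up.<br>
    <br>
<pre>
```python
def boosted_size(size, alignment):
    """Round an allocation size up to the next multiple of its alignment."""
    if 0 >= alignment:
        raise ValueError("alignment must be positive")
    return ((size + alignment - 1) // alignment) * alignment


def fits(object_size, offset, width):
    """True if the access [offset, offset + width) stays inside the object."""
    return offset >= 0 and object_size >= offset + width


original = 22                         # hypothetical alloca size in bytes
boosted = boosted_size(original, 16)  # padded out to the 16-byte alignment

# The same hypothetical widened load: 8 bytes at offset 16.
print(fits(original, 16, 8))
print(fits(boosted, 16, 8))
```
</pre>
    After boosting, the widened load is in bounds and needs no special case
    in the check. A source-level read of the padding bytes is in bounds too,
    so it is made memory-safe rather than reported.<br>
    <br>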

    -- John T.<br>

    <br>

    <blockquote

cite="mid:CAN=P9pgD71xxc547kRaQGE6jDLZ17rn3YCbAQf3=Xe82vwxKPA@mail.gmail.com"

      type="cite">

      <div class="gmail_quote">

        <div><br>

        </div>

        <div>--kcc </div>

        <div><br>

        </div>

        <div> </div>

        <blockquote class="gmail_quote" style="margin:0 0 0

          .8ex;border-left:1px #ccc solid;padding-left:1ex">

          <br>

          We won't catch some bugs in C/C++ code, but that's a natural

          consequence of deciding to permit certain out-of-bounds loads

          at the LLVM IR level, IMHO.<br>

          <br>

          My two cents.<br>

          <br>

          -- John T.<br>

          <br>

          (*) All bets are off for unconventional systems, though.<br>

          <br>

          <br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            <div class="im">

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex">

                <br>

                <br>

                Asan will not work for Fortran and Ada anyway (at least,

                out of the box).<br>

                I am not even sure that anything like asan is needed for

                Ada (it has bounds<br>

                checking built-in, the dynamic memory allocation is much

                more restrictive).<br>

                The tool is rather specific to C/C++ (and probably

                Objective-C, although we have<br>

                almost no tests for Objective-C, nor much knowledge of

                it).<br>

                Yes, the IR transformations are done on the LLVM level,

                but the asan run-time<br>

                library heavily depends on the C/C++ semantics and even

                implementation,<br>

                and you can't really separate the asan instrumentation

                pass from the run-time.<br>

              </blockquote>

              It's pretty disappointing to hear that asan is basically

              just for C.  But since<br>

              it is, I won't bother you anymore about this attribute

              (though I still don't<br>

              like it much).<br>

              <br>

              Ciao, Duncan.<br>

            </div>

            <div class="im">

              _______________________________________________<br>

              LLVM Developers mailing list<br>

              <a moz-do-not-send="true"

                href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a>

                      <a moz-do-not-send="true"

                href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

              <a moz-do-not-send="true"

                href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev"

                target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

            </div>

          </blockquote>

          <br>

        </blockquote>

      </div>

      <br>

    </blockquote>

    <br>

  </body>

</html>