<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <p>Hi Ivan,</p>

    <p>Thanks for posting this; I'm excited by this proposal - if we can

      get this kind of support in without making the implementation

      non-trivially-harder to maintain, that would be a positive

      development. As Sean mentioned, I did something along these lines

      to adapt ASan to the IBM BG/Q - an HPC system that uses a

      lightweight operating system. On the BG/Q, the lightweight

      operating system does support virtual memory for some

      special-purpose mappings, but it does not support mapping

      unreserved pages (i.e. MAP_NORESERVE is not supported, and this

      functionality is not supported any other way). As a result, the

      mechanism that the sanitizers use to cover the complete address

      space using shadow memory - by mapping a large region of

      unreserved pages - won't work in this environment. Systems without

      virtual memory at all will obviously have the same problem: All

      shadow memory must be physically backed. I'll also mention that

      many normal Linux HPC environments are configured with overcommit

      turned off, and I believe that using the sanitizers in such

      environments would also currently not work.<br>

    </p>
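    <p>For reference, the kind of reservation I'm referring to looks
      roughly like the following. This is a simplified sketch, not the
      actual runtime code: ReserveShadow and ReleaseShadow are names I
      made up for illustration, and the real runtime also places the
      mapping at a fixed address.</p>

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>

// Reserve `size` bytes of address space without asking the kernel to
// commit backing store up front. Pages become physically backed only
// when they are first touched.
// Returns nullptr on failure, for example when MAP_NORESERVE is not
// honored and the request exceeds what the kernel is willing to commit.
// (Hypothetical helper, not part of the sanitizer runtime.)
inline void *ReserveShadow(std::size_t size) {
  assert(size != 0);  // mmap rejects zero-length mappings
  void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Give the reserved range back to the system.
inline void ReleaseShadow(void *p, std::size_t size) { munmap(p, size); }
```

    <p>On a system where this call cannot succeed for a range large
      enough to shadow the whole address space, the shadow has to be
      sized and placed explicitly.</p>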

    <p>Because all shadow memory must be physically backed, it must be

      allocated judiciously, and the mapping process might need to be

      more complicated than a simple shift/offset. On the BG/Q, there

      were a few distinct regions of virtual memory that needed to be

      mapped into a single shadow region in the part of the address

      space where heap allocations could be made - as a result, I used a

      more-complicated mapping function.<br>

    </p>
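    <p>To illustrate what I mean by a more-complicated mapping function,
      here is a minimal sketch. It is not the BG/Q code: the region
      boundaries, the shadow base address, and the names AppRegion,
      kRegions and MemToShadow are all hypothetical. The point is only
      that several disjoint application regions are packed back to back
      into one physically backed shadow region.</p>

```cpp
#include <cassert>
#include <cstdint>

using uptr = std::uintptr_t;

constexpr uptr kShadowScale = 3;  // 8 application bytes per shadow byte

// One contiguous range of application addresses and the start of its
// shadow. All addresses below are made up for illustration.
struct AppRegion {
  uptr begin;        // first application address in the region
  uptr end;          // one past the last application address
  uptr shadow_base;  // shadow address corresponding to `begin`
};

// Three disjoint regions whose shadows are packed contiguously,
// starting at the hypothetical address 0x40000000.
constexpr AppRegion kRegions[] = {
    {0x00100000, 0x00900000, 0x40000000},  // 8 MiB  -> 1 MiB of shadow
    {0x20000000, 0x21000000, 0x40100000},  // 16 MiB -> 2 MiB of shadow
    {0x7f000000, 0x7f800000, 0x40300000},  // 8 MiB  -> 1 MiB of shadow
};

// Map an application address to its shadow address.
// Returns 0 for addresses that no region covers.
inline uptr MemToShadow(uptr mem) {
  for (const AppRegion &r : kRegions) {
    if (mem >= r.begin && mem < r.end)
      return r.shadow_base + ((mem - r.begin) >> kShadowScale);
  }
  return 0;
}
```

    <p>Compared with a single shift/offset, this costs a few compares
      per access, but the total shadow is only as large as the regions
      that actually need to be covered.</p>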

    <p>In this light, I'm trying to understand your proposal. I see that

      you're proposing to add support for some kind of additional

      translation scheme between virtual addresses and physical

      addresses, but I'm not exactly sure how you propose to use them.

      It might help if you were to provide some hypothetical

      implementation of these translations for a simple system so that

      we can understand the usage model better. I'd also like to better

      understand how the instrumentation works; is the mapping always

      replaced by these __asan_mem_to_vshadow/__asan_mem_to_pshadow

      calls?</p>
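    <p>For concreteness, here is my guess at what a trivial
      implementation of the virtual-to-physical shadow translation might
      look like. This is purely hypothetical and not taken from your
      patch: the page size, the shadow base, the flat page table and the
      name VShadowToPShadow are all my assumptions.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

using u8 = std::uint8_t;
using uptr = std::uintptr_t;

constexpr uptr kPageSize = 4096;           // hypothetical physical page size
constexpr uptr kNumPages = 1024;           // covers 4 MiB of virtual shadow
constexpr uptr kVShadowBase = 0x40000000;  // hypothetical start of virtual shadow

// One slot per virtual shadow page; nullptr means "not allocated yet".
static u8 *g_page_table[kNumPages];

// Translate a virtual shadow address into a pointer to physically
// backed storage, allocating a zero-filled page on first use.
// Note: calloc does not guarantee page-size alignment, which the RFC
// requires; a real version would need an aligned allocator.
inline u8 *VShadowToPShadow(uptr vs) {
  assert(vs >= kVShadowBase && vs - kVShadowBase < kPageSize * kNumPages);
  uptr off = vs - kVShadowBase;
  u8 *&page = g_page_table[off / kPageSize];
  if (!page) {
    page = static_cast<u8 *>(std::calloc(kPageSize, 1));
    if (!page) std::abort();  // out of physical memory for shadow
  }
  return page + off % kPageSize;
}
```

    <p>If this is roughly the model, then each instrumented access
      would seem to need a table lookup in addition to the usual
      shift/offset, which is why I'm asking how the instrumentation
      changes.</p>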

    <p>Finally, I recommend that we layer this support so that we have:</p>

    <p>[regular system] -> [system without (sufficient) unreserved

      pages] -> [system without any mmu]</p>

    <p>I'd like a clear explanation of how these last two differ. It

      looks like you have support for manually zeroing pages for the

      last category. Please explain exactly how this scheme works.</p>

    <p>Thanks,</p>

    <p>Hal<br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 02/23/2017 12:16 PM, Ivan A. Kosarev

      via llvm-dev wrote:<br>

    </div>

    <blockquote

      cite="mid:153b6433-b001-b0fb-6957-1de8c598ce77@accesssoftek.com"

      type="cite">RFC: Generalize means the sanitizers work with memory

      <br>

      <br>

      Overview

      <br>

      ========

      <br>

      <br>

      Currently, LLVM sanitizers, such as Asan and Tsan, are tied to a

      specific

      <br>

      memory model that relies on presence of hardware support for

      virtual memory.

      <br>

      This prevents sanitizers from being used on platforms that lack

      such support,

      <br>

      but otherwise are capable of running sanitized programs. Our

      research

      <br>

      indicates that adding support for such platforms is possible with

      a relatively

      <br>

      small amount of changes to the sanitizers source code and zero

      performance and

      <br>

      size penalty on currently supported systems. We also found that

      these changes

      <br>

      clarify and formalize the functional and performance dependencies

      between

      <br>

      sanitizers and system memory so they can be considered an

      improvement in

      <br>

      terms of design and readability regardless of the added

      capabilities. One can

      <br>

      think of it as a zero-cost abstraction layer.

      <br>

      <br>

      <br>

      The Approach

      <br>

      ============

      <br>

      <br>

      To support platforms that do not have hardware virtual memory

      managers,

      <br>

      we need to introduce the concept of physical memory pages that

      work as the

      <br>

      storage for data that sanitizers currently read and write by

      virtual

      <br>

      addresses. In presence of the concept of physical memory, every

      time we access

      <br>

      virtual memory we have to translate the given virtual address to a

      physical

      <br>

      one. For example, this check:

      <br>

      <br>

         *(u8 *)MEM_TO_SHADOW(allocated) == 0

      <br>

      <br>

      becomes:

      <br>

      <br>

         *MEM_TO_PSHADOW(allocated) == 0

      <br>

      <br>

      where the MEM_TO_PSHADOW(mem) macro is defined as:

      <br>

      <br>

         #define MEM_TO_PSHADOW(mem)

      VSHADOW_TO_PSHADOW(MEM_TO_VSHADOW(mem))

      <br>

         #define MEM_TO_VSHADOW(mem) /* Whatever currently

      MEM_TO_SHADOW() is. */

      <br>

      <br>

      The VSHADOW_TO_PSHADOW(vs) macro returns a pointer to a byte

      within a

      <br>

      physical page that corresponds to the given virtual address and

      allocates this

      <br>

      page if it has not been allocated before. On platforms that

      leverage hardware

      <br>

      virtual memory managers this macro returns the virtual address as

      a physical

      <br>

      one:

      <br>

      <br>

         #define VSHADOW_TO_PSHADOW(vs)

      (reinterpret_cast&lt;u8*&gt;((vs)))

      <br>

      <br>

      Physical pages are required to be aligned by their size. The size

      of physical

      <br>

      pages is a multiple of the shadow memory granularity (8 bytes for

      Asan) and

      <br>

      not less than the size of the widest scalar access we have to

      support (16

      <br>

      bytes). This makes finding page offsets trivial, which we need to

      implement

      <br>

      RTL functions efficiently. This also simplifies handling of

      aligned accesses

      <br>

      to physical memory as they are known to not cross bounds of

      physical pages.

      <br>

      Note that RTL functions have to be fixed to not rely on specific

      size,

      <br>

      location or order of physical pages.

      <br>

      <br>

      In addition to the facilities that allow handling of individual

      accesses to

      <br>

      the virtual memory we also need a set of functions that

      efficiently perform

      <br>

      operations on specified ranges of virtual addresses:

      <br>

      <br>

      // Fills a virtual memory with a given value. May release zeroed

      pages. For

      <br>

      // DFsan we may need a version of this function that takes 16-bit

      values to

      <br>

      // fill with.

      <br>

      void vshadow_memset(uptr vs, u8 value, uptr size);

      <br>

      <br>

      // Similarly to vshadow_memset(), this function fills a range of

      virtual

      <br>

      // memory with a given value and additionally claims that range as

      read-only

      <br>

      // so the memory manager is not required to support modifying

      accesses for

      <br>

      // these addresses.

      <br>

      void fill_rodata_vshadow(uptr vs, u8 value, uptr size);

      <br>

      <br>

      // Copies potentially overlapping memory regions.

      <br>

      void vshadow_memmove(uptr dest, uptr src, uptr size);

      <br>

      <br>

      // Returns the virtual address of the first non-zero byte in a

      given virtual

      <br>

      // address range. Can also be used to test for zeroed regions.

      <br>

      uptr find_non_zero_vshadow_byte(uptr vs, uptr size);

      <br>

      <br>

      // Explicitly releases pages that fit the specified range.

      <br>

      void release_vshadow(uptr vs, uptr size);

      <br>

      <br>

      <br>

      The Proof-of-Concept Patch

      <br>

      ==========================

      <br>

      <br>

      To make sure the approach is feasible we have prepared a patch

      that

      <br>

      fixes the Asan and Tsan RTL and instrumentation parts to translate

      virtual

      <br>

      shadow memory addresses to physical ones and mmap() shadow memory

      as we access

      <br>

      it. This way we simulate a software virtual memory manager that

      allocates

      <br>

      physical storage for shadow memory on-demand.

      <br>

      <br>

      We used that to mock RTL for the sanitizers tests. With this mock

      in place we

      <br>

      pass all Tsan tests and fail on 3 of 610 Asan tests:

      <br>

      <br>

      test/asan/TestCases/Linux/cuda_test.cc

      <br>

      test/asan/TestCases/Linux/nohugepage_test.cc

      <br>

      test/asan/TestCases/Linux/swapcontext_annotation.cc

      <br>

      <br>

      The first two tests rely on specific memory map after

      initialization of the

      <br>

      shadow memory and the latter takes too long to complete. It would

      probably be

      <br>

      acceptable to XFAIL them when run with a software memory manager

      enabled and

      <br>

      then consider ways to adapt them as necessary on a per-test basis.

      <br>

      <br>

      * * *

      <br>

      <br>

      With this paper we propose the changes that make it possible to

      use sanitizers

      <br>

      on platforms that have no MMUs to be part of the mainline. However,

      before

      <br>

      moving further we would like some feedback from the community so

      comments are

      <br>

      much appreciated.

      <br>

      <br>

      If the approach is fine, we will prepare a set of patches shortly.

      <br>

      <br>

      Thank you,

      <br>

      <br>

      <br>

      <fieldset class="mimeAttachmentHeader"></fieldset>

      <br>

      <pre wrap="">_______________________________________________

LLVM Developers mailing list

<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>

<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>

</pre>

    </blockquote>

    <br>

    <pre class="moz-signature" cols="72">-- 

Hal Finkel

Lead, Compiler Technology and Programming Languages

Leadership Computing Facility

Argonne National Laboratory</pre>

  </body>

</html>