[llvm-dev] RFC: Generalize means the sanitizers work with memory

Thu Feb 23 10:16:08 PST 2017

Overview
========

Currently, LLVM sanitizers such as Asan and Tsan are tied to a specific
memory model that relies on the presence of hardware support for virtual
memory. This prevents the sanitizers from being used on platforms that lack
such support but are otherwise capable of running sanitized programs. Our
research indicates that adding support for such platforms is possible with a
relatively small number of changes to the sanitizers' source code and with
zero performance and size penalty on currently supported systems. We also
found that these changes clarify and formalize the functional and performance
dependencies between the sanitizers and system memory, so they can be
considered an improvement in design and readability regardless of the added
capabilities. One can think of it as a zero-cost abstraction layer.

The Approach
============

To support platforms that do not have hardware virtual memory managers, we
need to introduce the concept of physical memory pages that serve as the
storage for the data that sanitizers currently read and write through virtual
addresses. Once physical memory is a separate concept, every access to
virtual memory has to translate the given virtual address to a physical one.
For example, this check:

    *(u8 *)MEM_TO_SHADOW(allocated) == 0

becomes:

    *MEM_TO_PSHADOW(allocated) == 0

where the MEM_TO_PSHADOW(mem) macro is defined as:

    #define MEM_TO_PSHADOW(mem) VSHADOW_TO_PSHADOW(MEM_TO_VSHADOW(mem))
    #define MEM_TO_VSHADOW(mem) /* Whatever currently MEM_TO_SHADOW() is. */

The VSHADOW_TO_PSHADOW(vs) macro returns a pointer to the byte within a
physical page that corresponds to the given virtual address, allocating that
page if it has not been allocated before. On platforms that leverage hardware
virtual memory managers, this macro simply returns the virtual address as the
physical one:

    #define VSHADOW_TO_PSHADOW(vs) (reinterpret_cast<u8*>((vs)))
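
For a platform without an MMU, the same macro could be backed by a software
page table. The following is only a minimal sketch of that idea, not the code
from the attached patches: the SoftShadow class, GlobalShadow(), the
kPhysPageSize value and the hash-map page table are all hypothetical, and the
MEM_TO_VSHADOW() definition is just a placeholder Asan-style mapping. The
sketch also does not align physical pages to their size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

using u8 = std::uint8_t;
using uptr = std::uintptr_t;

// Hypothetical physical page size (assumed to be a power of two).
constexpr uptr kPhysPageSize = 4096;
static_assert((kPhysPageSize & (kPhysPageSize - 1)) == 0, "power of two");

// Hypothetical software page table: virtual page index -> physical storage.
class SoftShadow {
 public:
  // Returns a pointer to the physical byte that backs the virtual shadow
  // address `vs`, allocating a zero-filled page on first touch.
  u8 *Translate(uptr vs) {
    std::unique_ptr<u8[]> &page = pages_[vs / kPhysPageSize];
    if (!page)
      page = std::make_unique<u8[]>(kPhysPageSize);  // value-initialized to 0
    return page.get() + (vs % kPhysPageSize);
  }

  std::size_t AllocatedPages() const { return pages_.size(); }

 private:
  std::unordered_map<uptr, std::unique_ptr<u8[]>> pages_;
};

inline SoftShadow &GlobalShadow() {
  static SoftShadow shadow;
  return shadow;
}

// Placeholder mapping for illustration; the real scale and offset are
// platform-specific.
#define MEM_TO_VSHADOW(mem) ((static_cast<uptr>(mem) >> 3) + 0x7fff8000ULL)
#define VSHADOW_TO_PSHADOW(vs) (GlobalShadow().Translate(vs))
#define MEM_TO_PSHADOW(mem) VSHADOW_TO_PSHADOW(MEM_TO_VSHADOW(mem))
```

With these definitions a check such as `*MEM_TO_PSHADOW(allocated) == 0`
compiles unchanged, and the first access to a shadow page is what allocates
it. A real implementation would need a lookup that is far cheaper than a hash
map on the fast path.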

Physical pages are required to be aligned to their size. The size of a
physical page is a multiple of the shadow memory granularity (8 bytes for
Asan) and not less than the size of the widest scalar access we have to
support (16 bytes). This makes finding page offsets trivial, which we need in
order to implement RTL functions efficiently. It also simplifies the handling
of aligned accesses to physical memory, as they are known not to cross the
boundaries of physical pages. Note that RTL functions have to be fixed so
that they do not rely on a specific size, location or order of physical pages.
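
The page arithmetic this enables can be sketched as follows. The helper names
and the 64-byte page size are hypothetical, and the sketch additionally
assumes that the page size is a power of two, which the text above does not
state explicitly but which makes the offset a simple mask.

```cpp
#include <cassert>
#include <cstdint>

using uptr = std::uintptr_t;

// Hypothetical page size: a multiple of 8, not less than 16, and (by
// assumption) a power of two.
constexpr uptr kPhysPageSize = 64;
static_assert(kPhysPageSize % 8 == 0, "multiple of the shadow granularity");
static_assert(kPhysPageSize >= 16, "holds the widest scalar access");
static_assert((kPhysPageSize & (kPhysPageSize - 1)) == 0, "power of two");

// Index of the page that contains virtual shadow address `vs`.
constexpr uptr PageIndex(uptr vs) { return vs / kPhysPageSize; }

// Offset of `vs` within its page.
constexpr uptr PageOffset(uptr vs) { return vs & (kPhysPageSize - 1); }

// True if the `size` bytes starting at `vs` all lie within a single page.
constexpr bool FitsInOnePage(uptr vs, uptr size) {
  return size != 0 && PageIndex(vs) == PageIndex(vs + size - 1);
}

// A 16-byte access aligned to 16 bytes never straddles a page boundary...
static_assert(FitsInOnePage(48, 16), "aligned access stays in one page");
// ...while an unaligned one can, and then has to be split by the caller.
static_assert(!FitsInOnePage(56, 16), "unaligned access crosses a page");
```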

In addition to the facilities that allow handling of individual accesses to
the virtual memory we also need a set of functions that efficiently perform
operations on specified ranges of virtual addresses:

    // Fills a range of virtual memory with a given value. May release zeroed
    // pages. For DFsan we may need a version of this function that takes
    // 16-bit values to fill with.
    void vshadow_memset(uptr vs, u8 value, uptr size);

    // Similarly to vshadow_memset(), this function fills a range of virtual
    // memory with a given value and additionally claims that range as
    // read-only, so the memory manager is not required to support modifying
    // accesses for these addresses.
    void fill_rodata_vshadow(uptr vs, u8 value, uptr size);

    // Copies potentially overlapping memory regions.
    void vshadow_memmove(uptr dest, uptr src, uptr size);

    // Returns the virtual address of the first non-zero byte in a given
    // virtual address range. Can also be used to test for zeroed regions.
    uptr find_non_zero_vshadow_byte(uptr vs, uptr size);

    // Explicitly releases pages that fit the specified range.
    void release_vshadow(uptr vs, uptr size);
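
To illustrate how such range functions might work page by page, here is a
rough sketch of three of them on top of a software page table. It is not
taken from the attached patches. The g_pages table and the page size are
hypothetical, a missing page is assumed to read as zeros, and returning 0 from
find_non_zero_vshadow_byte() when the range is all zeros is an assumption,
since the declaration above does not specify that case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <memory>
#include <unordered_map>

using u8 = std::uint8_t;
using uptr = std::uintptr_t;

constexpr uptr kPhysPageSize = 4096;  // Hypothetical page size.

// Hypothetical page table; a missing entry is treated as an all-zero page.
static std::unordered_map<uptr, std::unique_ptr<u8[]>> g_pages;

void vshadow_memset(uptr vs, u8 value, uptr size) {
  while (size != 0) {
    const uptr index = vs / kPhysPageSize;
    const uptr offset = vs % kPhysPageSize;
    const uptr chunk = std::min(size, kPhysPageSize - offset);
    if (value == 0 && chunk == kPhysPageSize) {
      g_pages.erase(index);  // A fully zeroed page needs no storage.
    } else {
      auto it = g_pages.find(index);
      if (it == g_pages.end() && value != 0)
        it = g_pages.emplace(index, std::make_unique<u8[]>(kPhysPageSize))
                 .first;
      if (it != g_pages.end())  // Zeroing an absent page is a no-op.
        std::memset(it->second.get() + offset, value, chunk);
    }
    vs += chunk;
    size -= chunk;
  }
}

// Assumption: returns 0 if every byte in [vs, vs + size) is zero.
uptr find_non_zero_vshadow_byte(uptr vs, uptr size) {
  const uptr end = vs + size;
  while (vs < end) {
    const uptr offset = vs % kPhysPageSize;
    const uptr chunk = std::min(end - vs, kPhysPageSize - offset);
    auto it = g_pages.find(vs / kPhysPageSize);
    if (it != g_pages.end()) {  // Absent pages are skipped without a scan.
      const u8 *begin = it->second.get() + offset;
      const u8 *hit =
          std::find_if(begin, begin + chunk, [](u8 b) { return b != 0; });
      if (hit != begin + chunk)
        return vs + static_cast<uptr>(hit - begin);
    }
    vs += chunk;
  }
  return 0;
}

// Releases only the pages that lie entirely inside [vs, vs + size).
void release_vshadow(uptr vs, uptr size) {
  const uptr first = (vs + kPhysPageSize - 1) / kPhysPageSize;
  const uptr last = (vs + size) / kPhysPageSize;  // Exclusive.
  for (uptr index = first; index < last; ++index)
    g_pages.erase(index);
}
```

Working on whole pages lets these functions skip or drop unallocated pages
instead of touching every byte, which is the main reason to provide them
separately from the per-access translation.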

The Proof-of-Concept Patch
==========================

To make sure the approach is feasible, we have prepared a patch that changes
the Asan and Tsan RTL and instrumentation parts to translate virtual shadow
memory addresses to physical ones and to mmap() shadow memory as we access
it. This way we simulate a software virtual memory manager that allocates
physical storage for shadow memory on demand.

We used this patch as a mock RTL for the sanitizer tests. With this mock in
place, we pass all Tsan tests and fail 3 of 610 Asan tests:

test/asan/TestCases/Linux/cuda_test.cc
test/asan/TestCases/Linux/nohugepage_test.cc
test/asan/TestCases/Linux/swapcontext_annotation.cc

The first two tests rely on a specific memory map after initialization of the
shadow memory, and the last one takes too long to complete. It would probably
be acceptable to XFAIL them when running with a software memory manager
enabled and then consider ways to adapt them as necessary on a per-test basis.

* * *

With this paper we propose that the changes that make it possible to use
sanitizers on platforms that have no MMUs become part of the mainline.
However, before moving further we would like some feedback from the
community, so comments are much appreciated.

If the approach is fine, we will prepare a set of patches shortly.

Thank you,

-- 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sanitizers-instrumentation.diff
Type: text/x-patch
Size: 10554 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170223/bc9df223/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sanitizers-rtl.diff
Type: text/x-patch
Size: 83463 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170223/bc9df223/attachment-0003.bin>