[llvm-dev] [MTE] Globals Tagging - Discussion

Thu Sep 17 15:05:18 PDT 2020

Hi folks,

ARM v8.5 introduces the Memory Tagging Extension (MTE), a hardware feature
that allows for detection of memory safety bugs (buffer overflows,
use-after-free, etc.) with low overhead. So far, MTE support is implemented
in the Scudo hardened allocator (compiler-rt/lib/scudo/standalone) for the
heap, and stack tagging is implemented in LLVM/Clang behind
-fsanitize=memtag <https://llvm.org/docs/MemTagSanitizer.html>.

As part of a holistic MTE implementation, global memory should also be
properly tagged. HWASan
<http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html> (a
software-only implementation of MTE) has a scheme that uses static tags,
but these can be trivially determined by an attacker with access to the
ELF file. This would allow attackers with arbitrary read/write to trivially
attack global variables. It would also allow attackers with a semilinear RW
primitive to trivially attack global variables if the offset is
controllable. Dynamic global tags are required to provide the same MTE
mitigation guarantees that are afforded to stack and heap memory.

We've got a plan in mind about how to do MTE globals with fully dynamic
tags, but we'd love to get feedback from the community. In particular,
we'd like to align implementation details with GCC, as the scheme
requires cooperation from the compiler, linker, and loader.

Our current ideas are outlined below. All the compiler features (including
realignment, etc.) would be guarded behind -fsanitize=memtag. Protection of
read-only globals would be enabled by default, but could be disabled at
compile time behind a flag (likely -f(no)sanitize-memtag-ro-globals).

a) Dynamic symbols (int f; extern int f;)

   1. Mark all tagged global data symbols in the dynamic symbol table as
      st_other.STO_TAGGED.
   2. Teach the loader to read the symbol table at load time (and at
      dlopen()) prior to relocations, and apply random memory tags (via
      `irg` -> `stg`) to each STO_TAGGED global.
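The loader pass in (a) can be sketched in C as below. This is a minimal, hypothetical model, not the proposed implementation: real tagging would use `irg`/`stg` on PROT_MTE memory, whereas here a plain array stands in for tag memory so the logic runs anywhere. STO_TAGGED does not exist in <elf.h> yet, so its value (0x40) is a placeholder, and `tag_region`, `load_tag`, and `random_tag` are illustrative names.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: STO_TAGGED is only proposed and is not in <elf.h>.
 * 0x40 is a placeholder value chosen for this sketch. */
#define STO_TAGGED 0x40

#define GRANULE 16
#define SIM_BASE 0x10000u /* start of a simulated writable segment */
#define SIM_SIZE 0x1000u

/* Software stand-in for tag memory: one 4-bit tag per 16-byte granule. */
static uint8_t tag_mem[SIM_SIZE / GRANULE];

/* Stand-in for `irg`: a random nonzero 4-bit tag. */
static uint8_t random_tag(void) { return (uint8_t)(1 + rand() % 15); }

/* Stand-in for a loop of `stg`, one granule at a time. */
static void tag_region(uint64_t addr, uint64_t size, uint8_t tag) {
  for (uint64_t a = addr; a < addr + size; a += GRANULE)
    tag_mem[(a - SIM_BASE) / GRANULE] = tag;
}

/* Stand-in for `ldg`: read back the tag of the granule containing addr. */
static uint8_t load_tag(uint64_t addr) {
  return tag_mem[(addr - SIM_BASE) / GRANULE];
}

/* Before relocations, give every defined STO_TAGGED symbol a fresh random
 * tag. Sizes are assumed to be granule multiples (see step 1 of (c)). */
static void tag_dynamic_globals(const Elf64_Sym *syms, size_t n,
                                uint64_t load_bias) {
  for (size_t i = 0; i < n; i++) {
    const Elf64_Sym *s = &syms[i];
    if (!(s->st_other & STO_TAGGED) || s->st_shndx == SHN_UNDEF)
      continue; /* untagged, or defined (and tagged) by another object */
    tag_region(load_bias + s->st_value, s->st_size, random_tag());
  }
}
```

Undefined symbols are skipped because the object that defines them would tag them during its own load.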

b) Hidden Symbols (static int g; or -fvisibility=hidden)

   1. Have the compiler mark hidden tagged globals in the symbol table as
      st_other.STO_TAGGED.
   2. Have the linker read the symbol table and create a table of
      { unrelocated virtual address, size } pairs for each STO_TAGGED
      hidden global, storing this in a new section (.mteglobtab).
   3. Create a new dynamic entry "DT_MTEGLOBTAB" that points to this table,
      along with "DT_MTEGLOBENT" for the size of each entry and
      "DT_MTEGLOBSZ" for the size (in bytes) of the table.
   4. As with dynamic symbols, teach the loader to read this table and
      apply random memory tags to each global prior to relocations.
   5. Materialization of hidden symbols now fetches and inserts the memory
      tag via `ldg`. On AArch64, this means non-PC-relative loads, stores,
      and address computations (e.g. g = 7;) generate:
        adrp x0, g;
        ldg  x0, [x0, :lo12:g]; // new instruction
        mov  w1, #7;
        str  w1, [x0, :lo12:g]; // g is an int, so store 32 bits

      Note that this materialization sequence means that executables built
      with MTE globals are not able to run on non-MTE hardware.
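A loader walk over .mteglobtab, as described in (b), might look like the hypothetical C sketch below. The 8-byte entry size comes from the post's small-code-model figure, but splitting it into two 32-bit fields is an assumption, as are the names `MteGlobEnt` and `tag_hidden_globals`. Tag memory is again simulated with an array rather than real `irg`/`stg`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GRANULE 16
#define SIM_BASE 0x10000u
#define SIM_SIZE 0x1000u

/* Hypothetical .mteglobtab entry: 8 bytes under the small code model,
 * read here as two 32-bit fields. */
typedef struct {
  uint32_t addr; /* unrelocated virtual address of the hidden global */
  uint32_t size; /* size in bytes, a granule multiple */
} MteGlobEnt;

static uint8_t tag_mem[SIM_SIZE / GRANULE]; /* simulated tag memory */

/* Stand-in for `irg`: a random nonzero 4-bit tag. */
static uint8_t random_tag(void) { return (uint8_t)(1 + rand() % 15); }

/* Stand-in for a loop of `stg`. */
static void tag_region(uint64_t addr, uint64_t size, uint8_t tag) {
  for (uint64_t a = addr; a < addr + size; a += GRANULE)
    tag_mem[(a - SIM_BASE) / GRANULE] = tag;
}

/* Stand-in for `ldg`: the tag of the granule containing addr. */
static uint8_t load_tag(uint64_t addr) {
  return tag_mem[(addr - SIM_BASE) / GRANULE];
}

/* table = load_bias + DT_MTEGLOBTAB, table_sz = DT_MTEGLOBSZ,
 * ent_sz = DT_MTEGLOBENT. Stepping by ent_sz rather than
 * sizeof(MteGlobEnt) lets a future, larger entry format still be walked. */
static void tag_hidden_globals(const unsigned char *table, size_t table_sz,
                               size_t ent_sz, uint64_t load_bias) {
  if (ent_sz < sizeof(MteGlobEnt))
    return; /* malformed table */
  for (size_t off = 0; off + ent_sz <= table_sz; off += ent_sz) {
    MteGlobEnt e;
    memcpy(&e, table + off, sizeof e); /* avoid unaligned access */
    tag_region(load_bias + e.addr, e.size, random_tag());
  }
}
```

The loop runs before any relocation is processed, matching step 4, so that RELATIVE relocations of the GOT can pick up the tags afterwards.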

Note: Some dynamic symbols can be transformed at link time into hidden
symbols if:

   1. The symbol is in an object file that is statically linked into an
      executable and is not referenced in any shared libraries, or
   2. The symbol has its visibility changed with a version script.

These globals always have their addresses derived from a GOT entry, and
thus have their address tag materialized through the RELATIVE relocation of
the GOT entry. However, because they have no dynamic symbol table entry,
their memory would go untagged. The linker must ensure it creates an
MTEGLOBTAB entry for all hidden MTE globals, including those that are
transformed from external to hidden. DSOs linked with -Bsymbolic retain
their dynamic symbol table entries, and thus require no special handling.

c) All symbols

   1. Realign to granule size (16 bytes), and resize to a multiple of the
      granule size (e.g. 40B -> 48B).
   2. Ban data folding (except where contents and size are identical; no
      tail merging).
   3. In the loader, ensure writable segments (and possibly .rodata, see the
      next point) are mapped MAP_ANONYMOUS and PROT_MTE (with the contents of
      the mappings filled from the file), as file-based mappings aren't
      necessarily backed by tag-capable memory. This also requires in-place
      remapping of data segments from the program image (as they're already
      mapped by the kernel before PT_INTERP invokes the loader).
   4. Make .rodata protection optional. When read-only protection is in use,
      the .rodata section should be moved into a separate segment. For Bionic
      libc, the .rodata section takes up 20% of its ALLOC | READ segment, and
      we'd like to be able to maintain page sharing for the remaining 189KiB
      of other read-only data in this segment.
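Steps 1 and 3 of (c) can be illustrated with the C sketch below, written for Linux. It is a hypothetical outline, not the loader's actual code: `granule_round_up` and `remap_anonymous_in_place` are illustrative names, and the caller is assumed to pass PROT_MTE (from the AArch64 Linux headers) in `extra_prot` on MTE-capable systems and 0 elsewhere, which lets the logic run on any Linux machine.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define GRANULE 16

/* Step 1: round a global's size up to a whole number of granules. */
static uint64_t granule_round_up(uint64_t size) {
  return (size + GRANULE - 1) & ~(uint64_t)(GRANULE - 1);
}

/* Step 3: replace a mapping in place with an anonymous one holding the same
 * bytes. extra_prot would be PROT_MTE on AArch64 Linux; pass 0 elsewhere.
 * addr must be page-aligned and currently readable. Returns 0 on success. */
static int remap_anonymous_in_place(void *addr, size_t len, int prot,
                                    int extra_prot) {
  void *saved = malloc(len);
  if (saved == NULL)
    return -1;
  memcpy(saved, addr, len);
  /* MAP_FIXED discards the existing (e.g. file-backed) mapping at addr.
   * Map it writable first so the saved contents can be copied back. */
  void *p = mmap(addr, len, prot | PROT_WRITE | extra_prot,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
  if (p == MAP_FAILED) {
    free(saved);
    return -1;
  }
  memcpy(p, saved, len);
  free(saved);
  /* Drop write access again for read-only segments such as .rodata. */
  if (!(prot & PROT_WRITE) && mprotect(p, len, prot | extra_prot) != 0)
    return -1;
  return 0;
}
```

Copying through a temporary buffer trades a transient allocation for simplicity; it also shows why a separate .rodata segment (step 4) matters, since any page remapped this way stops being shared with other processes.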

d) Relocations

   1. GLOB_DAT, ABS64, and RELATIVE relocations change semantics: they would
      be required to retrieve and insert the memory tag of the symbol into
      the relocated value. For example, the ABS64 relocation becomes:
        sym_addr = get_symbol_address()      // sym_addr = 0x1008
        sym_addr |= get_tag(sym_addr & ~0xf) // 0x1008 & ~0xf == 0x1000
        *r_offset = sym_addr + r_addend;
   2. Introduce a TAGGED_RELATIVE relocation to solve the problem where the
      tag shouldn't be derived from the relocation result, e.g.
        static int array[16] = {};
        // array_end must have the same tag as array[]. array_end is out of
        // bounds w.r.t. array, and may point to a completely different global.
        int *array_end = &array[16];

      TAGGED_RELATIVE stores the untagged symbol value in the place
      (*r_offset == &array[16]), and keeps the address from which the tag
      should be derived in the addend (RELA-only r_addend == &array[0]).

      For derived symbols where the granule-aligned address is in-bounds of
      the tagged global (e.g. for array_end = &array[7], the granule at
      (&array[7] & ~0xf) lies inside array and so carries the same tag as
      &array[0]), we can use a normal RELATIVE relocation.

      The TAGGED_RELATIVE operation looks like:
        *r_offset |= get_tag(r_addend & ~0xf);
   3. ABS64, RELATIVE, and TAGGED_RELATIVE relocations need a slight tweak
      to grab the place's memory tag before use, as the place itself may be
      tagged. So, for example, the TAGGED_RELATIVE operation above actually
      becomes:
        r_offset = ldg(r_offset);
        *r_offset |= get_tag(r_addend & ~0xf);
   4. Introduce an R_AARCH64_LDG_LO9_SHORT_NC relocation for relocating the
      9-bit immediate of the LDG instruction. This isn't MTE-globals
      specific; we just seem to be missing a relocation to encode the 9-bit
      immediate for LDG at bits [12..20]. This would save us an additional
      ADD instruction in the inline-LDG sequence for hidden symbols.
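The ABS64 and TAGGED_RELATIVE semantics from (d) can be simulated in C as follows. This is a hypothetical model only: tag memory is a plain array, `ldg` and `get_tag` are software stand-ins (tags are placed in address bits 59:56, as MTE uses), and the place is an ordinary host variable, so the extra `ldg` on the place from point 3 is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define GRANULE 16
#define TAG_SHIFT 56 /* MTE logical tags live in address bits 59:56 */
#define TAG_MASK (0xfull << TAG_SHIFT)
#define SIM_BASE 0x1000u
#define SIM_SIZE 0x1000u

static uint8_t tag_mem[SIM_SIZE / GRANULE]; /* simulated allocation tags */

/* Stand-in for `ldg`: insert the tag of addr's granule into addr. */
static uint64_t ldg(uint64_t addr) {
  uint64_t untagged = addr & ~TAG_MASK;
  uint64_t tag = tag_mem[(untagged - SIM_BASE) / GRANULE];
  return untagged | (tag << TAG_SHIFT);
}

/* The tag bits (already shifted into place) of the granule at addr. */
static uint64_t get_tag(uint64_t addr) { return ldg(addr) & TAG_MASK; }

/* Point 1: ABS64 takes the tag from the granule containing the symbol. */
static uint64_t reloc_abs64(uint64_t sym_addr, int64_t r_addend) {
  sym_addr |= get_tag(sym_addr & ~(uint64_t)0xf);
  return sym_addr + (uint64_t)r_addend;
}

/* Point 2: the place already holds the untagged result; r_addend names the
 * address whose granule supplies the tag. */
static void reloc_tagged_relative(uint64_t *place, uint64_t r_addend) {
  *place |= get_tag(r_addend & ~(uint64_t)0xf);
}
```

Usage: with `int array[16]` tagged 0x3 at 0x1000 and an unrelated global tagged 0x9 at 0x1040, `reloc_tagged_relative(&array_end, 0x1000)` on an untagged `array_end = 0x1040` yields tag 0x3, whereas deriving the tag from the result's own granule would have given 0x9.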

We considered a few other schemes, including:

   1. Creating a dynamic symbol table entry for all hidden globals and
      giving them the same st_other.STO_TAGGED treatment. These entries would
      not require symbol names, but Elf(Sym) entries are 24 bytes (in
      comparison to 8 bytes for the MTEGLOBTAB scheme under the small code
      model). For an AOSP build, using dynamic symbol entries instead of
      MTEGLOBTAB results in a 2.3MiB code size increase across all DSOs.
   2. Making all hidden symbol accesses go through a local GOT. This
      requires an extra indirection for all local symbols, resulting in
      increased cache pressure (and thus decreased performance) compared to a
      simple `ldg` of the tag (as the dcache and tag cache are going to be
      warmed anyway for the load/store). Unlike the MTEGLOBTAB scheme,
      however, this scheme is backwards compatible, allowing binaries built
      with MTE globals to run on old ARM64 hardware (as no incompatible
      instructions are emitted), the same as heap tagging. Stack tagging
      requires a new ABI, and we expect the MTE globals scheme to be enabled
      in partnership with stack tagging, so we are unconcerned about the ABI
      requirement for the MTEGLOBTAB scheme.
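The size argument in alternative 1 is simple per-entry arithmetic, sketched in C below. The `MteGlobEnt` layout (two 32-bit fields) is an assumption consistent with the 8-byte figure; `dynsym_overhead` is an illustrative helper that counts only the entry tables, so it understates the real cost of extra dynsym entries.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* Hypothetical .mteglobtab entry (8 bytes, matching the small code model
 * figure); the field layout is an assumption. */
typedef struct {
  uint32_t addr;
  uint32_t size;
} MteGlobEnt;

/* Extra bytes spent if each hidden tagged global got an Elf64_Sym in
 * .dynsym instead of an MTEGLOBTAB entry. Only the per-entry tables are
 * counted; hash-table and symbol-version overhead is ignored. */
static uint64_t dynsym_overhead(uint64_t n_hidden_globals) {
  return n_hidden_globals *
         (uint64_t)(sizeof(Elf64_Sym) - sizeof(MteGlobEnt));
}
```

Each hidden global therefore costs at least 16 extra bytes under the dynsym alternative, which is what accumulates into the AOSP-wide figure above.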

Please let us know any feedback you have. We're currently working on an
experimental version and will update with any more details as they arise.

Thanks,

Mitch.