[llvm-dev] [MTE] Globals Tagging - Discussion

Evgenii Stepanov via llvm-dev llvm-dev at lists.llvm.org
Fri Sep 18 12:51:28 PDT 2020


On Fri, Sep 18, 2020 at 12:18 PM Mitch Phillips via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi David,
>
> Does the tagging of these hidden symbols only protect against RW
>> primitives without a similar ldg? If I knew the address of the hidden
>> symbol I could presumably use the same sequence, but I think I'm
>> stretching what memory tagging is supposed to protect against.
>
>
> I might be missing your point here - but don't forget that the local
> globals are always PC-relative direct loads/stores. The `ldg` sequence in
> that example can only be used to get `&g` (and nothing else). There
> shouldn't be any `ldg`'s of arbitrary addresses (unless an attacker already
> has control of the instruction pointer, which means they've already
> bypassed MTE).
>
> Does this mean that the value of array_end must have the same tag as
>> array[]? Then &array_end would have a different tag since it's a
>> different global?
>>
>
> Yes, exactly.
>
> For example you might assign tag 1 to array, then tag 2 to array_end.
>> Which means that array_end has a tag of 2 and so does array[16].
>> (assuming they're sequential)
>> |       array       | array_end/array[16] |
>> |  <1> <1> <1> <1>  |         <2>         |
>>
>
>
> So if we just did a RELATIVE relocation then array_end's value would
>> have a tag of 2, so you couldn't do:
>> for (int* ptr=array; ptr != array_end; ++ptr)
>> Since it's always != due to the tags.
>> Do I have that right?
>
>
>  Yep - you've got it right, this is why we need TAGGED_RELATIVE. For
> clarity, here's the memory layout where array_end is relocated using
> TAGGED_RELATIVE{*r_offset = &array[16], r_addend = &array[0]}:
>                array                 array_end                   (padding)
>   Memory Tag   0x1  0x1              0x2                         0x2
>   Value        0 0 0 0 ...           (0x1 << 56) | &array[16]    0 0
> So the address tags of `array` and `array_end` are the same (only
> `&array_end` has a memory/address tag of 0x2), and thus `for (int*
> ptr=array; ptr != array_end; ++ptr)` works normally.
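>
> In loader pseudo-C (mirroring the get_tag() pseudocode from the original
> post below; the names are illustrative, not a real loader API), the
> relocation is applied roughly as:
>   place = (uintptr_t *)r_offset;       // *place already holds the untagged &array[16]
>   *place |= get_tag(r_addend & ~0xf);  // insert array's tag (0x1), derived from &array[0]
> so array_end ends up carrying array's address tag even though it points one
> granule past array's own tagged memory.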
>
> Also, if you have this same example but the array got rounded up to
>> the nearest granule e.g. (4 byte ints, 16 byte granules)
>> int array[3]; // rounded up to be array[4]
>> int* array_end = &array[3];
>> Would you emit a normal RELATIVE relocation for array_end, because
>> it's within the bounds of the rounded up array. Or a TAGGED_RELATIVE
>> relocation because it's out of bounds of the original size of the
>> array?
>> (I don't think doing the former is a problem but I'm not a linker expert)
>
>
> At this stage, this would generate a TAGGED_RELATIVE. We expect
> TAGGED_RELATIVE to be relatively scarce, and coming up with a more complex
> scheme for the linker to optimise this edge case where it's in bounds of
> the granule padding (but not the symbol itself) seems over the top. That
> said, it's a possibility for later revisions.
>

The plan calls for
> Realign to granule size (16 bytes), resize to multiple of granule size
(e.g. 40B -> 48B).
so this would never happen.

The symbols are resized in order to prevent smaller untagged symbols from
getting into the padding of the 16-byte aligned tagged ones.
I'm not sure if it's desirable to change the symbol size just for this
reason. The linker could always suppress such packing for STO_TAGGED
symbols.

In any case, since all sizes and alignments are known, the compiler should
be allowed to emit RELATIVE in the rounded-up array case.
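
As a sketch of that check (16-byte granules; the helper name is made up for
illustration, not something in the patch):

  // Emit a plain RELATIVE relocation when the target address stays inside
  // the symbol's granule-rounded extent; otherwise use TAGGED_RELATIVE.
  static int needs_tagged_relative(uint64_t sym_addr, uint64_t sym_size,
                                   uint64_t target) {
    uint64_t rounded_size = (sym_size + 15) & ~(uint64_t)15;
    return target < sym_addr || target >= sym_addr + rounded_size;
  }

For `int array[3]` (12 bytes rounded to 16), &array[3] is still inside the
rounded extent, so RELATIVE suffices; one granule further would require
TAGGED_RELATIVE.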


>
> On Fri, Sep 18, 2020 at 4:10 AM David Spickett <david.spickett at linaro.org>
> wrote:
>
>> Hi Mitch,
>>
>> In the intro you say:
>> > It would also allow attackers with a semilinear RW primitive to
>> trivially attack global variables if the offset is controllable. Dynamic
>> global tags are required to provide the same MTE mitigation guarantees that
>> are afforded to stack and heap memory.
>>
>> Then later:
>> > b) Hidden Symbols (static int g; or -fvisibility=hidden)
>> > Materialization of hidden symbols now fetches and inserts the memory tag
>> via `ldg`. On AArch64, this means a non-PC-relative
>> load/store/address-taken use (*g = 7;) generates:
>> >  adrp x0, g;
>> >  ldg x0, [x0, :lo12:g]; // new instruction
>> >  mov x1, #7;
>> >  str x1, [x0, :lo12:g];
>>
>> Does the tagging of these hidden symbols only protect against RW
>> primitives without a similar ldg? If I knew the address of the hidden
>> symbol I could presumably use the same sequence, but I think I'm
>> stretching what memory tagging is supposed to protect against. Mostly
>> wanted to check I understood.
>>
>> Speaking of understanding...
>>
>> > Introduce a TAGGED_RELATIVE relocation - in order to solve the problem
>> where the tag derivation shouldn't be from the relocation result, e.g.
>> > static int array[16] = {};
>> > // array_end must have the same tag as array[]. array_end is out of
>> > // bounds w.r.t. array, and may point to a completely different global.
>> > int *array_end = &array[16];
>>
>> Does this mean that the value of array_end must have the same tag as
>> array[]? Then &array_end would have a different tag since it's a
>> different global?
>>
>> For example you might assign tag 1 to array, then tag 2 to array_end.
>> Which means that array_end has a tag of 2 and so does array[16].
>> (assuming they're sequential)
>> |       array       | array_end/array[16] |
>> |  <1> <1> <1> <1>  |         <2>         |
>>
>> So if we just did a RELATIVE relocation then array_end's value would
>> have a tag of 2, so you couldn't do:
>> for (int* ptr=array; ptr != array_end; ++ptr)
>> Since it's always != due to the tags.
>>
>> Do I have that right?
>>
>> Also, if you have this same example but the array got rounded up to
>> the nearest granule e.g. (4 byte ints, 16 byte granules)
>> int array[3]; // rounded up to be array[4]
>> int* array_end = &array[3];
>>
>> Would you emit a normal RELATIVE relocation for array_end, because
>> it's within the bounds of the rounded up array. Or a TAGGED_RELATIVE
>> relocation because it's out of bounds of the original size of the
>> array?
>> (I don't think doing the former is a problem but I'm not a linker expert)
>>
>> Thanks,
>> David Spickett.
>>
>> On Thu, 17 Sep 2020 at 23:05, Mitch Phillips via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> >
>> > Hi folks,
>> >
>> >
>> > ARM v8.5 introduces the Memory Tagging Extension (MTE), a hardware
>> extension that allows detection of memory safety bugs (buffer overflows,
>> use-after-free, etc.) with low overhead. So far, MTE support is implemented
>> in the Scudo hardened allocator (compiler-rt/lib/scudo/standalone) for the
>> heap, and tagging of stack allocations is implemented in LLVM/Clang behind
>> -fsanitize=memtag.
>> >
>> >
>> > As part of a holistic MTE implementation, global memory should also be
>> properly tagged. HWASan (a software-only implementation of MTE) has a
>> schema that uses static tags; however, these can be trivially determined by
>> an attacker with access to the ELF file. This would allow attackers with
>> arbitrary read/write to trivially attack global variables. It would also
>> allow attackers with a semilinear RW primitive to trivially attack global
>> variables if the offset is controllable. Dynamic global tags are required
>> to provide the same MTE mitigation guarantees that are afforded to stack
>> and heap memory.
>> >
>> >
>> > We've got a plan in mind about how to do MTE globals with fully dynamic
>> tags, but we'd love to get feedback from the community. In particular,
>> we'd like to try to align implementation details with GCC, as the scheme
>> requires cooperation from the compiler, linker, and loader.
>> >
>> >
>> > Our current ideas are outlined below. All the compiler features
>> (including realignment, etc.) would be guarded behind -fsanitize=memtag.
>> Protection of read-only globals would be enabled by default, but can be
>> disabled at compile time behind a flag (likely
>> -f(no)sanitize-memtag-ro-globals).
>> >
>> >
>> > a) Dynamic symbols (int f; extern int f;)
>> >
>> > Mark all tagged global data symbols in the dynamic symbol table as
>> st_other.STO_TAGGED.
>> >
>> > Teach the loader to read the symbol table at load time (and dlopen())
>> prior to relocations, and apply random memory tags (via `irg -> stg`) to
>> each STO_TAGGED-carrying global.
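>> >
>> > A minimal sketch of that loader pass (insert_random_tag / set_memory_tag
>> > are illustrative wrappers around `irg` / `stg`, not an existing API, and
>> > STO_TAGGED is the proposed st_other bit):
>> >   for (size_t i = 0; i < dynsym_count; ++i) {
>> >     const ElfW(Sym) *sym = &dynsym[i];
>> >     if (!(sym->st_other & STO_TAGGED) || sym->st_size == 0)
>> >       continue;
>> >     char *p = (char *)(load_bias + sym->st_value);
>> >     char *tagged = insert_random_tag(p);              // irg
>> >     size_t size = (sym->st_size + 15) & ~(size_t)15;  // granule-rounded
>> >     for (size_t off = 0; off < size; off += 16)
>> >       set_memory_tag(tagged + off);                   // stg, one granule at a time
>> >   }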
>> >
>> > b) Hidden Symbols (static int g; or -fvisibility=hidden)
>> >
>> > Have the compiler mark hidden tagged globals in the symbol table as
>> st_other.STO_TAGGED.
>> >
>> > Have the linker read the symbol table and create a table of {
>> unrelocated virtual address, size } pairs for each STO_TAGGED-carrying
>> hidden global, storing this in a new section (.mteglobtab).
>> >
>> > Create a new dynamic entry "DT_MTEGLOBTAB" that points to this segment,
>> along with "DT_MTEGLOBENT" for the size of each entry and "DT_MTEGLOBSZ"
>> for the size (in bytes) of the table.
>> >
>> > Similar to dynamic symbols, teach the loader to read this table and
>> apply random memory tags to each global prior to relocations.
>> >
>> > Materialization of hidden symbols now fetches and inserts the memory tag
>> via `ldg`. On AArch64, this means a non-PC-relative
>> load/store/address-taken use (*g = 7;) generates:
>> >   adrp x0, g;
>> >   ldg x0, [x0, :lo12:g]; // new instruction
>> >   mov x1, #7;
>> >   str x1, [x0, :lo12:g];
>> >
>> > Note that this materialization sequence means that executables built
>> with MTE globals are not able to run on non-MTE hardware.
>> >
>> > Note: Some dynamic symbols can be transformed at link time into hidden
>> symbols if:
>> >
>> > The symbol is in an object file that is statically linked into an
>> executable and is not referenced in any shared libraries, or
>> >
>> > The symbol has its visibility changed with a version script.
>> >
>> > These globals always have their addresses derived from a GOT entry, and
>> thus have their address tag materialized through the RELATIVE relocation of
>> the GOT entry. Due to the lack of a dynamic symbol table entry, however, the
>> memory would go untagged. The linker must ensure it creates an MTEGLOBTAB
>> entry for all hidden MTE-globals, including those that are transformed from
>> external to hidden. DSO's linked with -Bsymbolic retain their dynamic
>> symbol table entries, and thus require no special handling.
>> >
>> >
>> > c) All symbols
>> >
>> > Realign to granule size (16 bytes), resize to multiple of granule size
>> (e.g. 40B -> 48B).
>> >
>> > Ban data folding (except where contents and size are the same; no tail
>> merging).
>> >
>> > In the loader, ensure writable segments (and possibly .rodata, see next
>> dot point) are mapped MAP_ANONYMOUS and PROT_MTE (with the contents of the
>> mappings filled from the file), as file-based mappings aren't necessarily
>> backed by tag-capable memory. It also requires in-place remapping of data
>> segments from the program image (as they're already mapped by the kernel
>> before PT_INTERP invokes the loader).
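>> >
>> > A rough sketch of that in-place remap (assuming PROT_MTE from the arm64
>> > Linux headers; seg/len are assumed page-aligned; error handling omitted;
>> > the helper name is illustrative):
>> >   #include <string.h>
>> >   #include <sys/mman.h>   // PROT_MTE is provided by the arm64 Linux headers
>> >   static void remap_with_mte(void *seg, size_t len) {
>> >     // Stash the already-loaded contents; the anonymous mapping that
>> >     // replaces the file-backed one starts out zero-filled.
>> >     void *tmp = mmap(NULL, len, PROT_READ | PROT_WRITE,
>> >                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>> >     memcpy(tmp, seg, len);
>> >     mmap(seg, len, PROT_READ | PROT_WRITE | PROT_MTE,
>> >          MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
>> >     memcpy(seg, tmp, len);
>> >     munmap(tmp, len);
>> >   }
>> > A read-only segment handled this way would additionally need an mprotect()
>> > back to PROT_READ | PROT_MTE afterwards.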
>> >
>> > Make .rodata protection optional. When read-only protection is in use,
>> the .rodata section should be moved into a separate segment. For Bionic
>> libc, the rodata section takes up 20% of its ALLOC | READ segment, and we'd
>> like to be able to maintain page sharing for the remaining 189KiB of other
>> read-only data in this segment.
>> >
>> > d) Relocations
>> >
>> > GLOB_DAT, ABS64, and RELATIVE relocations change semantics - they would
>> be required to retrieve and insert the memory tag of the symbol into the
>> relocated value. For example, the ABS64 relocation becomes:
>> >   sym_addr = get_symbol_address() // sym_addr = 0x1008
>> >   sym_addr |= get_tag(sym_addr & ~0xf) // get_tag(0x1008 & ~0xf) == get_tag(0x1000)
>> >   *r_offset = sym_addr + r_addend;
>> >
>> > Introduce a TAGGED_RELATIVE relocation - in order to solve the problem
>> where the tag derivation shouldn't be from the relocation result, e.g.
>> > static int array[16] = {};
>> > // array_end must have the same tag as array[]. array_end is out of
>> > // bounds w.r.t. array, and may point to a completely different global.
>> > int *array_end = &array[16];
>> >
>> > TAGGED_RELATIVE stores the untagged symbol value in the place
>> (*r_offset == &array[16]), and keeps the address where the tag should be
>> derived in the addend (RELA-only r_addend == &array[0]).
>> >
>> > For derived symbols where the granule-aligned address is in-bounds of
>> the tag (e.g. array_end = &array[7] implies the tag can be derived from
>> (&array[0] & ~0xf)), we can use a normal RELATIVE relocation.
>> >
>> > The TAGGED_RELATIVE operation looks like:
>> >   *r_offset |= get_tag(r_addend & ~0xf);
>> >
>> > ABS64, RELATIVE, and TAGGED_RELATIVE relocations need a slight tweak to
>> grab the place's memory tag before use, as the place itself may be tagged.
>> So, for example, the TAGGED_RELATIVE operation above actually becomes:
>> >   r_offset = ldg(r_offset);
>> >   *r_offset |= get_tag(r_addend & ~0xf);
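>> >
>> > Pulling those pieces together, the relevant part of the loader's relocation
>> > loop would look roughly like this (a sketch only; get_tag/ldg/
>> > get_symbol_address are the stand-ins used above, and TAGGED_RELATIVE has no
>> > assigned relocation number yet):
>> >   uint64_t *place = (uint64_t *)ldg((void *)r_offset); // the place may be tagged
>> >   switch (r_type) {
>> >   case R_AARCH64_ABS64: {
>> >     uint64_t sym_addr = get_symbol_address();
>> >     sym_addr |= get_tag(sym_addr & ~0xf);
>> >     *place = sym_addr + r_addend;
>> >     break;
>> >   }
>> >   case R_AARCH64_TAGGED_RELATIVE:
>> >     // The untagged target is already stored in the place; only the tag,
>> >     // taken from the addend's granule, gets inserted.
>> >     *place |= get_tag(r_addend & ~0xf);
>> >     break;
>> >   // GLOB_DAT and RELATIVE are handled like ABS64, inserting the tag of
>> >   // the value being written.
>> >   }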
>> >
>> > Introduce an R_AARCH64_LDG_LO9_SHORT_NC relocation for relocating the
>> 9-bit immediate for the LDG instruction. This isn't MTE-globals specific;
>> we just seem to be missing the relocation to encode the 9-bit immediate for
>> LDG at bits [12..20]. This would save us an additional ADD instruction in
>> the inline-LDG sequence for hidden symbols.
>> >
>> > We considered a few other schemes, including:
>> >
>> > Creating a dynamic symbol table entry for all hidden globals and giving
>> them the same st_other.STO_TAGGED treatment. These entries would not
>> require symbol names, but Elf(Sym) entries are 24 bytes (in comparison to 8
>> bytes for the MTEGLOBTAB schema under the small code model). For an AOSP
>> build, using dynamic symbol entries instead of MTEGLOBTAB results in a
>> 2.3MiB code size increase across all DSO's.
>> >
>> > Making all hidden symbol accesses go through a local-GOT. Requires an
>> extra indirection for all local symbols - resulting in increased cache
>> pressure (and thus decreased performance) over a simple `ldg` of the tag
>> (as the dcache and tag-cache are going to be warmed anyway for the
>> load/store). Unlike the MTEGLOBTAB scheme, however, this scheme is backwards
>> compatible, allowing MTE-globals built binaries to run on old ARM64
>> hardware (as no incompatible instructions are emitted), the same as heap
>> tagging. Stack tagging requires a new ABI - and we expect the MTE globals
>> scheme to be enabled in partnership with stack tagging, thus we are
>> unconcerned about the ABI requirement for the MTEGLOBTAB scheme.
>> >
>> >
>> > Please let us know any feedback you have. We're currently working on an
>> experimental version and will update with any more details as they arise.
>> >
>> >
>> > Thanks,
>> >
>> > Mitch.
>> >
>> >