[llvm-dev] [MTE] Globals Tagging - Discussion

Mitch Phillips via llvm-dev llvm-dev at lists.llvm.org
Fri Sep 18 12:18:13 PDT 2020


Hi David,

Does the tagging of these hidden symbols only protect against RW
> primitives without a similar ldg? If I knew the address of the hidden
> symbol I could presumably use the same sequence, but I think I'm
> stretching what memory tagging is supposed to protect against.


I might be missing your point here - but don't forget that accesses to
local globals are always PC-relative direct loads/stores. The `ldg` sequence in
that example can only be used to get `&g` (and nothing else). There
shouldn't be any `ldg`'s of arbitrary addresses (unless an attacker already
has control of the instruction pointer, which means they've already
bypassed MTE).

Does this mean that the value of array_end must have the same tag as
> array[]. Then &array_end would have a different tag since it's a
> different global?
>

Yes, exactly.

For example you might assign tag 1 to array, then tag 2 to array_end.
> Which means that array_end has a tag of 2 and so does array[16].
> (assuming they're sequential)
> |         array         | array_end/array[16] |
> |  <1>  <1>  <1>  <1>   |         <2>         |
>


So if we just did a RELATIVE relocation then array_end's value would
> have a tag of 2, so you couldn't do:
> for (int* ptr=array; ptr != array_end; ++ptr)
> Since it's always != due to the tags.
> Do I have that right?


Yep - you've got it right, this is why we need TAGGED_RELATIVE. For
clarity, here's the memory layout where array_end is relocated using
TAGGED_RELATIVE{*r_offset = &array[16], r_addend = &array[0]}:

             |      array      |          array_end         | (padding)
  Memory tag |  0x1  0x1  ...  |             0x2            |    0x2
  Value      |  0  0  0  0 ... |  (0x1 << 56) | &array[16]  |    0  0

So the address tags of `array` and `array_end` are the same (only the
granule holding `array_end` - i.e. `&array_end` - has a memory tag of 0x2),
and thus `for (int* ptr=array; ptr != array_end; ++ptr)` works normally.

Also, if you have this same example but the array got rounded up to
> the nearest granule e.g. (4 byte ints, 16 byte granules)
> int array[3]; // rounded up to be array[4]
> int* array_end = &array[3];
> Would you emit a normal RELATIVE relocation for array_end, because
> it's within the bounds of the rounded up array. Or a TAGGED_RELATIVE
> relocation because it's out of bounds of the original size of the
> array?
> (I don't think doing the former is a problem but I'm not a linker expert)


At this stage, this would generate a TAGGED_RELATIVE. We expect
TAGGED_RELATIVE relocations to be relatively scarce, and coming up with a
more complex scheme for the linker to optimise this edge case where the
pointer is in bounds of the granule padding (but not the symbol itself)
seems over-the-top. That said, it's a possibility for later revisions.

On Fri, Sep 18, 2020 at 4:10 AM David Spickett <david.spickett at linaro.org>
wrote:

> Hi Mitch,
>
> In the intro you say:
> > It would also allow attackers with a semilinear RW primitive to
> trivially attack global variables if the offset is controllable. Dynamic
> global tags are required to provide the same MTE mitigation guarantees that
> are afforded to stack and heap memory.
>
> Then later:
> > b) Hidden Symbols (static int g; or -fvisibility=hidden)
> > Materialization of hidden symbols now fetches and inserts the memory tag
> via `ldg`. On aarch64, this means non-PC-relative
> loads/stores/address-taken (*g = 7;) generates:
> >  adrp x0, g;
> >  ldg x0, [x0, :lo12:g]; // new instruction
> >  mov x1, #7;
> >  str x1, [x0, :lo12:g];
>
> Does the tagging of these hidden symbols only protect against RW
> primitives without a similar ldg? If I knew the address of the hidden
> symbol I could presumably use the same sequence, but I think I'm
> stretching what memory tagging is supposed to protect against. Mostly
> wanted to check I understood.
>
> Speaking of understanding...
>
> > Introduce a TAGGED_RELATIVE relocation - in order to solve the problem
> where the tag derivation shouldn't be from the relocation result, e.g.
> > static int array[16] = {};
> > // array_end must have the same tag as array[]. array_end is out of
> > // bounds w.r.t. array, and may point to a completely different global.
> > int *array_end = &array[16];
>
> Does this mean that the value of array_end must have the same tag as
> array[]. Then &array_end would have a different tag since it's a
> different global?
>
> For example you might assign tag 1 to array, then tag 2 to array_end.
> Which means that array_end has a tag of 2 and so does array[16].
> (assuming they're sequential)
> |         array         | array_end/array[16] |
> |  <1>  <1>  <1>  <1>   |         <2>         |
>
> So if we just did a RELATIVE relocation then array_end's value would
> have a tag of 2, so you couldn't do:
> for (int* ptr=array; ptr != array_end; ++ptr)
> Since it's always != due to the tags.
>
> Do I have that right?
>
> Also, if you have this same example but the array got rounded up to
> the nearest granule e.g. (4 byte ints, 16 byte granules)
> int array[3]; // rounded up to be array[4]
> int* array_end = &array[3];
>
> Would you emit a normal RELATIVE relocation for array_end, because
> it's within the bounds of the rounded up array. Or a TAGGED_RELATIVE
> relocation because it's out of bounds of the original size of the
> array?
> (I don't think doing the former is a problem but I'm not a linker expert)
>
> Thanks,
> David Spickett.
>
> On Thu, 17 Sep 2020 at 23:05, Mitch Phillips via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> > Hi folks,
> >
> >
> > ARM v8.5 introduces the Memory Tagging Extension (MTE), a hardware
> extension that allows for detection of memory safety bugs (buffer overflows,
> use-after-free, etc) with low overhead. So far, MTE support is implemented
> in the Scudo hardened allocator (compiler-rt/lib/scudo/standalone) for
> heap, and stack allocation is implemented in LLVM/Clang behind
> -fsanitize=memtag.
> >
> >
> > As part of a holistic MTE implementation, global memory should also be
> properly tagged. HWASan (a software-only implementation of MTE) has a
> schema that uses static tags; however, these can be trivially determined by
> an attacker with access to the ELF file. This would allow attackers with
> arbitrary read/write to trivially attack global variables. It would also
> allow attackers with a semilinear RW primitive to trivially attack global
> variables if the offset is controllable. Dynamic global tags are required
> to provide the same MTE mitigation guarantees that are afforded to stack
> and heap memory.
> >
> >
> > We've got a plan in mind about how to do MTE globals with fully dynamic
> tags, but we'd love to get feedback from the community. In particular -
> we'd like to try and align implementation details with GCC as the scheme
> requires cooperation from the compiler, linker, and loader.
> >
> >
> > Our current ideas are outlined below. All the compiler features
> (including realignment, etc.) would be guarded behind -fsanitize=memtag.
> Protection of read-only globals would be enabled-by-default, but can be
> disabled at compile time behind a flag (likely
> -f(no)sanitize-memtag-ro-globals).
> >
> >
> > a) Dynamic symbols (int f; extern int f;)
> >
> > Mark all tagged global data symbols in the dynamic symbol table as
> st_other.STO_TAGGED.
> >
> > Teach the loader to read the symbol table at load time (and dlopen())
> prior to relocations, and apply random memory tags (via `irg -> stg`) to
> each STO_TAGGED carrying global.
> >
> > b) Hidden Symbols (static int g; or -fvisibility=hidden)
> >
> > Have the compiler mark hidden tagged globals in the symbol table as
> st_other.STO_TAGGED.
> >
> > Have the linker read the symbol table and create a table of {
> unrelocated virtual address, size } pairs for each STO_TAGGED carrying
> hidden global, storing this in a new section (.mteglobtab).
> >
> > Create a new dynamic entry "DT_MTEGLOBTAB" that points to this segment,
> along with "DT_MTEGLOBENT" for the size of each entry and "DT_MTEGLOBSZ"
> for the size (in bytes) of the table.
> >
> > Similar to dynamic symbols, teach the loader to read this table and
> apply random memory tags to each global prior to relocations.
> >
> > Materialization of hidden symbols now fetches and inserts the memory tag
> via `ldg`. On aarch64, this means non-PC-relative
> loads/stores/address-taken (*g = 7;) generates:
> >   adrp x0, g;
> >   ldg x0, [x0, :lo12:g]; // new instruction
> >   mov x1, #7;
> >   str x1, [x0, :lo12:g];
> >
> > Note that this materialization sequence means that executables built
> with MTE globals are not able to run on non-MTE hardware.
> >
> > Note: Some dynamic symbols can be transformed at link time into hidden
> symbols if:
> >
> > The symbol is in an object file that is statically linked into an
> executable and is not referenced in any shared libraries, or
> >
> > The symbol has its visibility changed with a version script.
> >
> > These globals always have their addresses derived from a GOT entry, and
> thus have their address tag materialized through the RELATIVE relocation of
> the GOT entry. Due to the lack of dynamic symbol table entry however, the
> memory would go untagged. The linker must ensure it creates an MTEGLOBTAB
> entry for all hidden MTE-globals, including those that are transformed from
> external to hidden. DSO's linked with -Bsymbolic retain their dynamic
> symbol table entries, and thus require no special handling.
> >
> >
> > c) All symbols
> >
> > Realign to granule size (16 bytes), resize to multiple of granule size
> (e.g. 40B -> 48B).
> >
> > Ban data folding (except where contents and size are same, no tail
> merging).
> >
> > In the loader, ensure writable segments (and possibly .rodata, see next
> dot point) are mapped MAP_ANONYMOUS and PROT_MTE (with the contents of the
> mappings filled from the file), as file-based mappings aren't necessarily
> backed by tag-capable memory. It also requires in-place remapping of data
> segments from the program image (as they're already mapped by the kernel
> before PT_INTERP invokes the loader).
> >
> > Make .rodata protection optional. When read-only protection is in use,
> the .rodata section should be moved into a separate segment. For Bionic
> libc, the rodata section takes up 20% of its ALLOC | READ segment, and we'd
> like to be able to maintain page sharing for the remaining 189KiB of other
> read-only data in this segment.
> >
> > d) Relocations
> >
> > GLOB_DAT, ABS64, and RELATIVE relocations change semantics - they would
> be required to retrieve and insert the memory tag of the symbol into the
> relocated value. For example, the ABS64 relocation becomes:
> >   sym_addr = get_symbol_address() // sym_addr = 0x1008
> >   sym_addr |= get_tag(sym_addr & ~0xf) // get_tag(0x1008 & ~0xf == 0x1000)
> >   *r_offset = sym_addr + r_addend;
> >
> > Introduce a TAGGED_RELATIVE relocation - in order to solve the problem
> where the tag derivation shouldn't be from the relocation result, e.g.
> > static int array[16] = {};
> > // array_end must have the same tag as array[]. array_end is out of
> > // bounds w.r.t. array, and may point to a completely different global.
> > int *array_end = &array[16];
> >
> > TAGGED_RELATIVE stores the untagged symbol value in the place (*r_offset
> == &array[16]), and keeps the address where the tag should be derived in
> the addend (RELA-only r_addend == &array[0]).
> >
> > For derived symbols where the granule-aligned address is in bounds of
> the symbol (e.g. array_end = &array[7] implies the tag can be derived from
> (&array[7] & ~0xf)), we can use a normal RELATIVE relocation.
> >
> > The TAGGED_RELATIVE operation looks like:
> >   *r_offset |= get_tag(r_addend & ~0xf);
> >
> > ABS64, RELATIVE, and TAGGED_RELATIVE relocations need a slight tweak to
> grab the place's memory tag before use, as the place itself may be tagged.
> So, for example, the TAGGED_RELATIVE operation above actually becomes:
> >   r_offset = ldg(r_offset);
> >   *r_offset |= get_tag(r_addend & ~0xf);
> >
> > Introduce an R_AARCH64_LDG_LO9_SHORT_NC relocation for relocating the
> 9-bit immediate for the LDG instruction. This isn't MTE-globals specific,
> we just seem to be missing the relocation to encode the 9-bit immediate for
> LDG at bits [12..20]. This would save us an additional ADD instruction in
> the inline-LDG sequence for hidden symbols.
> >
> > We considered a few other schemes, including:
> >
> > Creating a dynamic symbol table entry for all hidden globals and giving
> them the same st_other.STO_TAGGED treatment. These entries would not
> require symbol names, but Elf(Sym) entries are 24 bytes (in comparison to 8
> bytes for the MTEGLOBTAB schema under the small code model). For an AOSP
> build, using dynamic symbol entries instead of MTEGLOBTAB results in a
> 2.3MiB code size increase across all DSO's.
> >
> > Making all hidden symbol accesses go through a local-GOT. Requires an
> extra indirection for all local symbols - resulting in increased cache
> pressure (and thus decreased performance) over a simple `ldg` of the tag
> (as the dcache and tag-cache are going to be warmed anyway for the
> load/store). Unlike the MTEGLOBTAB scheme, however, this scheme is backwards
> compatible, allowing MTE-globals built binaries to run on old ARM64
> hardware (as no incompatible instructions are emitted), the same as heap
> tagging. Stack tagging requires a new ABI - and we expect the MTE globals
> scheme to be enabled in partnership with stack tagging, thus we are
> unconcerned about the ABI requirement for the MTEGLOBTAB scheme.
> >
> >
> > Please let us know any feedback you have. We're currently working on an
> experimental version and will update with any more details as they arise.
> >
> >
> > Thanks,
> >
> > Mitch.
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>