[llvm-dev] [MTE] Globals Tagging - Discussion

Tue Sep 22 09:56:38 PDT 2020

Hi Jessica,

Thanks for the info. I'm assuming that the CHERI-on-Morello scheme is going
to require its own relocation types and instructions in order to make
different use of MTE.

Is there anything in our specification that is cross-applicable under
Arm+CHERI? I'm assuming the symbol tagging scheme might be useful, but not
the TAGGED_RELATIVE relocation as it's designed for spatial and temporal
safety. Would you recommend any changes here to allow Arm+CHERI to take
advantage?

On Mon, Sep 21, 2020 at 3:28 PM Jessica Clarke via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> > On 21 Sep 2020, at 15:05, David Spickett via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >
> >> I might be missing your point here - but don't forget that the local
> globals are always PC-relative direct loads/stores.
> >
> > I did forget! Thanks for clarifying, now I understand.
>
> I think it's worth pointing out this is only true on ABIs that implement
> pointers using integer addresses. On CHERI[1], and thus Arm's upcoming
> Morello
> research prototype[2,3], we use a pure capability ABI where every C
> language
> pointer is a bounded capability with associated permissions, but the same
> is
> also true for all the sub-language-level pointers such as the program
> counter,
> meaning that a PC-relative pointer has read and execute permission but not
> write permission. Thus, pointers to local globals still use a GOT (except
> containing capabilities, not addresses). It might be wise to pick a
> sufficiently-flexible scheme such that it would compose properly with
> CHERI.
>
> On the other hand, however, MTE on CHERI would be used for a very different
> purpose, as by having our capabilities be bounded we already enforce
> spatial
> memory safety and a notion of pointer provenance in a non-probabilistic
> manner,
> so there is no need to make use of the probabilistic protection that MTE
> can
> provide. One of our interests is using MTE to provide versioning of memory
> in
> order to be able to reuse the same memory multiple times in a
> temporally-safe
> way without having to perform revocation sweeps; anyone interested should
> take
> a look at §D.9 of CHERI ISAv7[4] (ISAv8 will be released within a few
> weeks and
> has a little more detail).
>
> Jess
>
> [1] https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/
> [2]
> https://developer.arm.com/architectures/cpu-architecture/a-profile/morello
> [3] https://www.morello-project.org
> [4] https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-927.pdf
>
> > On Fri, 18 Sep 2020 at 20:51, Evgenii Stepanov <eugenis at google.com>
> wrote:
> >>
> >>
> >>
> >> On Fri, Sep 18, 2020 at 12:18 PM Mitch Phillips via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >>>
> >>> Hi David,
> >>>
> >>>> Does the tagging of these hidden symbols only protect against RW
> >>>> primitives without a similar ldg? If I knew the address of the hidden
> >>>> symbol I could presumably use the same sequence, but I think I'm
> >>>> stretching what memory tagging is supposed to protect against.
> >>>
> >>>
> >>> I might be missing your point here - but don't forget that the local
> globals are always PC-relative direct loads/stores. The `ldg` sequence in
> that example can only be used to get `&g` (and nothing else). There
> shouldn't be any `ldg`'s of arbitrary addresses (unless an attacker already
> has control of the instruction pointer, which means they've already
> bypassed MTE).
> >>>
> >>>> Does this mean that the value of array_end must have the same tag as
> >>>> array[]. Then &array_end would have a different tag since it's a
> >>>> different global?
> >>>
> >>>
> >>> Yes, exactly.
> >>>
> >>>> For example you might assign tag 1 to array, then tag 2 to array_end.
> >>>> Which means that array_end has a tag of 2 and so does array[16].
> >>>> (assuming they're sequential)
> >>>> |            array            | array_end/array[16] |
> >>>> | < 1> <1> <1> <1>  |            <2>               |
> >>>>
> >>>>
> >>>>
> >>>> So if we just did a RELATIVE relocation then array_end's value would
> >>>> have a tag of 2, so you couldn't do:
> >>>> for (int* ptr=array; ptr != array_end; ++ptr)
> >>>> Since it's always != due to the tags.
> >>>> Do I have that right?
> >>>
> >>>
> >>> Yep - you've got it right, this is why we need TAGGED_RELATIVE. For
> clarity, here's the memory layout where array_end is relocated using
> TAGGED_RELATIVE{*r_offset = &array[16], r_addend = &array[0]}:
> >>>              array      array_end                  (padding)
> >>> Memory Tag   0x1 0x1    0x2                        0x2
> >>> Value        0 0 0 0    (0x1 << 56) | &array[16]   0 0
> >>>
> >>> So the address tags of `array` and `array_end` are the same (only
> `&array_end` has a memory/address tag of 0x2), and thus `for (int*
> ptr=array; ptr != array_end; ++ptr)` works normally.
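
To make the comparison concrete, here is a minimal sketch in plain C that models tagged pointers as 64-bit integers. The helper names (with_tag, steps_until_equal) and the addresses are made up for illustration, and the loop is capped so that a tag mismatch shows up as a failure instead of running forever:

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SHIFT 56 /* MTE keeps the logical tag in address bits [59:56] */

/* Hypothetical helper: combine a 4-bit tag with an untagged address. */
uint64_t with_tag(uint64_t addr, uint64_t tag) {
    return (addr & ~(0xfULL << TAG_SHIFT)) | ((tag & 0xf) << TAG_SHIFT);
}

/* Models `for (int *p = begin; p != end; ++p)` on tagged pointer values.
 * Returns the number of iterations, or -1 if `end` was not reached within
 * `limit` steps (which is what a tag mismatch looks like). */
int steps_until_equal(uint64_t begin, uint64_t end, int limit) {
    assert(limit > 0);
    int steps = 0;
    for (uint64_t p = begin; p != end; p += sizeof(int32_t)) {
        if (++steps > limit)
            return -1;
    }
    return steps;
}
```

With array at a made-up address 0x1000 and tag 0x1, an end pointer carrying tag 0x1 (the TAGGED_RELATIVE result) is reached after 16 steps, while an end pointer carrying tag 0x2 (what a plain RELATIVE would produce here) is never equal to the iterator.
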
> >>>
> >>>> Also, if you have this same example but the array got rounded up to
> >>>> the nearest granule e.g. (4 byte ints, 16 byte granules)
> >>>> int array[3]; // rounded up to be array[4]
> >>>> int* array_end = &array[3];
> >>>> Would you emit a normal RELATIVE relocation for array_end, because
> >>>> it's within the bounds of the rounded up array. Or a TAGGED_RELATIVE
> >>>> relocation because it's out of bounds of the original size of the
> >>>> array?
> >>>> (I don't think doing the former is a problem but I'm not a linker
> expert)
> >>>
> >>>
> >>> At this stage, this would generate a TAGGED_RELATIVE. We expect
> TAGGED_RELATIVE to be relatively scarce, and coming up with a more complex
> scheme for the linker to optimise this edge case where it's in bounds of
> the granule padding (but not the symbol itself) seems over-the-top. In
> saying that, it's a possibility for later revisions.
> >>
> >>
> >> The plan calls to
> >>> Realign to granule size (16 bytes), resize to multiple of granule size
> (e.g. 40B -> 48B).
> >> so this would never happen.
> >>
> >> The symbols are resized in order to prevent smaller untagged symbols
> from getting into the padding of the 16-byte aligned tagged ones.
> >> I'm not sure if it's desirable to change the symbol size just for this
> reason. The linker could always suppress such packing for STO_TAGGED
> symbols.
> >>
> >> In any case, since all sizes and alignments are known, the compiler
> should be allowed to emit RELATIVE in the rounded-up array case.
> >>
> >>>
> >>>
> >>> On Fri, Sep 18, 2020 at 4:10 AM David Spickett <
> david.spickett at linaro.org> wrote:
> >>>>
> >>>> Hi Mitch,
> >>>>
> >>>> In the intro you say:
> >>>>> It would also allow attackers with a semilinear RW primitive to
> trivially attack global variables if the offset is controllable. Dynamic
> global tags are required to provide the same MTE mitigation guarantees that
> are afforded to stack and heap memory.
> >>>>
> >>>> Then later:
> >>>>> b) Hidden Symbols (static int g; or -fvisibility=hidden)
> >>>>> Materialization of hidden symbols now fetches and inserts the memory
> tag via `ldg`. On aarch64, this means non PC-relative
> loads/stores/address-taken (*g = 7;) generates:
> >>>>> adrp x0, g;
> >>>>> ldg x0, [x0, :lo12:g]; // new instruction
> >>>>> mov x1, #7;
> >>>>> str x1, [x0, :lo12:g];
> >>>>
> >>>> Does the tagging of these hidden symbols only protect against RW
> >>>> primitives without a similar ldg? If I knew the address of the hidden
> >>>> symbol I could presumably use the same sequence, but I think I'm
> >>>> stretching what memory tagging is supposed to protect against. Mostly
> >>>> wanted to check I understood.
> >>>>
> >>>> Speaking of understanding...
> >>>>
> >>>>> Introduce a TAGGED_RELATIVE relocation - in order to solve the
> problem where the tag derivation shouldn't be from the relocation result,
> e.g.
> >>>>> static int array[16] = {};
> >>>>> // array_end must have the same tag as array[]. array_end is out of
> >>>>> // bounds w.r.t. array, and may point to a completely different
> global.
> >>>>> int *array_end = &array[16];
> >>>>
> >>>> Does this mean that the value of array_end must have the same tag as
> >>>> array[]. Then &array_end would have a different tag since it's a
> >>>> different global?
> >>>>
> >>>> For example you might assign tag 1 to array, then tag 2 to array_end.
> >>>> Which means that array_end has a tag of 2 and so does array[16].
> >>>> (assuming they're sequential)
> >>>> |            array            | array_end/array[16] |
> >>>> | < 1> <1> <1> <1>  |            <2>               |
> >>>>
> >>>> So if we just did a RELATIVE relocation then array_end's value would
> >>>> have a tag of 2, so you couldn't do:
> >>>> for (int* ptr=array; ptr != array_end; ++ptr)
> >>>> Since it's always != due to the tags.
> >>>>
> >>>> Do I have that right?
> >>>>
> >>>> Also, if you have this same example but the array got rounded up to
> >>>> the nearest granule e.g. (4 byte ints, 16 byte granules)
> >>>> int array[3]; // rounded up to be array[4]
> >>>> int* array_end = &array[3];
> >>>>
> >>>> Would you emit a normal RELATIVE relocation for array_end, because
> >>>> it's within the bounds of the rounded up array. Or a TAGGED_RELATIVE
> >>>> relocation because it's out of bounds of the original size of the
> >>>> array?
> >>>> (I don't think doing the former is a problem but I'm not a linker
> expert)
> >>>>
> >>>> Thanks,
> >>>> David Spickett.
> >>>>
> >>>> On Thu, 17 Sep 2020 at 23:05, Mitch Phillips via llvm-dev
> >>>> <llvm-dev at lists.llvm.org> wrote:
> >>>>>
> >>>>> Hi folks,
> >>>>>
> >>>>>
> >>>>> ARM v8.5 introduces the Memory Tagging Extension (MTE), a hardware
> feature that allows for detection of memory safety bugs (buffer overflows,
> use-after-free, etc) with low overhead. So far, MTE support is implemented
> in the Scudo hardened allocator (compiler-rt/lib/scudo/standalone) for
> heap, and stack allocation is implemented in LLVM/Clang behind
> -fsanitize=memtag.
> >>>>>
> >>>>>
> >>>>> As part of a holistic MTE implementation, global memory should also
> be properly tagged. HWASan (a software-only implementation of MTE) has a
> schema that uses static tags, however these can be trivially determined by
> an attacker with access to the ELF file. This would allow attackers with
> arbitrary read/write to trivially attack global variables. It would also
> allow attackers with a semilinear RW primitive to trivially attack global
> variables if the offset is controllable. Dynamic global tags are required
> to provide the same MTE mitigation guarantees that are afforded to stack
> and heap memory.
> >>>>>
> >>>>>
> >>>>> We've got a plan in mind about how to do MTE globals with fully
> dynamic tags, but we'd love to get feedback from the community. In
> particular - we'd like to try and align implementation details with GCC as
> the scheme requires cooperation from the compiler, linker, and loader.
> >>>>>
> >>>>>
> >>>>> Our current ideas are outlined below. All the compiler features
> (including realignment, etc.) would be guarded behind -fsanitize=memtag.
> Protection of read-only globals would be enabled-by-default, but can be
> disabled at compile time behind a flag (likely
> -f(no)sanitize-memtag-ro-globals).
> >>>>>
> >>>>>
> >>>>> a) Dynamic symbols (int f; extern int f;)
> >>>>>
> >>>>> Mark all tagged global data symbols in the dynamic symbol table as
> st_other.STO_TAGGED.
> >>>>>
> >>>>> Teach the loader to read the symbol table at load time (and
> dlopen()) prior to relocations, and apply random memory tags (via `irg ->
> stg`) to each STO_TAGGED carrying global.
> >>>>>
> >>>>> b) Hidden Symbols (static int g; or -fvisibility=hidden)
> >>>>>
> >>>>> Have the compiler mark hidden tagged globals in the symbol table as
> st_other.STO_TAGGED.
> >>>>>
> >>>>> Have the linker read the symbol table and create a table of {
> unrelocated virtual address, size } pairs for each STO_TAGGED carrying
> hidden global, storing this in a new section (.mteglobtab).
> >>>>>
> >>>>> Create a new dynamic entry "DT_MTEGLOBTAB" that points to this
> section, along with "DT_MTEGLOBENT" for the size of each entry and
> "DT_MTEGLOBSZ" for the size (in bytes) of the table.
> >>>>>
> >>>>> Similar to dynamic symbols, teach the loader to read this table and
> apply random memory tags to each global prior to relocations.
> >>>>>
> >>>>> Materialization of hidden symbols now fetches and inserts the memory
> tag via `ldg`. On aarch64, this means non PC-relative
> loads/stores/address-taken (*g = 7;) generates:
> >>>>>  adrp x0, g;
> >>>>>  ldg x0, [x0, :lo12:g]; // new instruction
> >>>>>  mov x1, #7;
> >>>>>  str x1, [x0, :lo12:g];
> >>>>>
> >>>>> Note that this materialization sequence means that executables built
> with MTE globals are not able to run on non-MTE hardware.
> >>>>>
> >>>>> Note: Some dynamic symbols can be transformed at link time into
> hidden symbols if:
> >>>>>
> >>>>> The symbol is in an object file that is statically linked into an
> executable and is not referenced in any shared libraries, or
> >>>>>
> >>>>> The symbol has its visibility changed with a version script.
> >>>>>
> >>>>> These globals always have their addresses derived from a GOT entry,
> and thus have their address tag materialized through the RELATIVE
> relocation of the GOT entry. Due to the lack of dynamic symbol table entry
> however, the memory would go untagged. The linker must ensure it creates an
> MTEGLOBTAB entry for all hidden MTE-globals, including those that are
> transformed from external to hidden. DSO's linked with -Bsymbolic retain
> their dynamic symbol table entries, and thus require no special handling.
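
As a rough illustration of the loader side, here is a sketch in plain C that walks an MTEGLOBTAB-like table and tags each listed global. Everything here is an assumption for illustration: the 8-byte entry layout, the names, the tag_mem array standing in for real tag storage (one slot per 16-byte granule, with the load base at granule 0), and counter_tag standing in for `irg`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE 16u /* MTE tag granule size in bytes */

/* Hypothetical 8-byte entry: { unrelocated virtual address, size }. */
struct mteglobtab_entry {
    uint32_t vaddr;
    uint32_t size;
};

/* Deterministic stand-in for `irg`; a real loader would use random tags. */
uint8_t counter_tag(void) {
    static uint8_t n = 0;
    n = (uint8_t)((n % 15) + 1);
    return n;
}

/* Walk the table (table_bytes ~ DT_MTEGLOBSZ, entry_bytes ~ DT_MTEGLOBENT)
 * and give every granule of each listed global the same tag. */
void tag_hidden_globals(const void *tab, size_t table_bytes, size_t entry_bytes,
                        uint8_t *tag_mem, uint8_t (*next_tag)(void)) {
    assert(entry_bytes >= sizeof(struct mteglobtab_entry));
    const unsigned char *bytes = tab;
    for (size_t off = 0; off + entry_bytes <= table_bytes; off += entry_bytes) {
        struct mteglobtab_entry e;
        memcpy(&e, bytes + off, sizeof e); /* avoids alignment assumptions */
        uint8_t tag = next_tag() & 0xf;
        for (uint32_t a = e.vaddr; a < e.vaddr + e.size; a += GRANULE)
            tag_mem[a / GRANULE] = tag; /* models one `stg` per granule */
    }
}
```

This must run before relocations are processed, since the relocations described in (d) read the tags back out of memory.
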
> >>>>>
> >>>>>
> >>>>> c) All symbols
> >>>>>
> >>>>> Realign to granule size (16 bytes), resize to multiple of granule
> size (e.g. 40B -> 48B).
> >>>>>
> >>>>> Ban data folding (except where contents and size are same, no tail
> merging).
> >>>>>
> >>>>> In the loader, ensure writable segments (and possibly .rodata, see
> next dot point) are mapped MAP_ANONYMOUS and PROT_MTE (with the contents of
> the mappings filled from the file), as file-based mappings aren't
> necessarily backed by tag-capable memory. It also requires in-place
> remapping of data segments from the program image (as they're already
> mapped by the kernel before PT_INTERP invokes the loader).
> >>>>>
> >>>>> Make .rodata protection optional. When read-only protection is in
> use, the .rodata section should be moved into a separate segment. For
> Bionic libc, the rodata section takes up 20% of its ALLOC | READ segment,
> and we'd like to be able to maintain page sharing for the remaining 189KiB
> of other read-only data in this segment.
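
For the resize step in (c), the arithmetic is just a round-up to the next multiple of the granule size. A minimal sketch in C, where the function and macro names are illustrative only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MTE_GRANULE 16u /* MTE tag granule size in bytes */

/* Round a symbol size up to the next multiple of the granule size. */
size_t mte_round_up(size_t size) {
    assert(size <= SIZE_MAX - (MTE_GRANULE - 1)); /* guard against overflow */
    return (size + MTE_GRANULE - 1) & ~(size_t)(MTE_GRANULE - 1);
}
```

So a 40-byte global becomes 48 bytes, and a size that is already a multiple of 16 is left unchanged.
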
> >>>>>
> >>>>> d) Relocations
> >>>>>
> >>>>> GLOB_DAT, ABS64, and RELATIVE relocations change semantics - they
> would be required to retrieve and insert the memory tag of the symbol into
> the relocated value. For example, the ABS64 relocation becomes:
> >>>>>  sym_addr = get_symbol_address() // sym_addr = 0x1008
> >>>>>  sym_addr |= get_tag(sym_addr & ~0xf) // get_tag(0x1008 & ~0xf == 0x1000)
> >>>>>  *r_offset = sym_addr + r_addend;
> >>>>>
> >>>>> Introduce a TAGGED_RELATIVE relocation - in order to solve the
> problem where the tag derivation shouldn't be from the relocation result,
> e.g.
> >>>>> static int array[16] = {};
> >>>>> // array_end must have the same tag as array[]. array_end is out of
> >>>>> // bounds w.r.t. array, and may point to a completely different
> global.
> >>>>> int *array_end = &array[16];
> >>>>>
> >>>>> TAGGED_RELATIVE stores the untagged symbol value in the place
> (*r_offset == &array[16]), and keeps the address where the tag should be
> derived in the addend (RELA-only r_addend == &array[0]).
> >>>>>
> >>>>> For derived symbols where the granule-aligned address is in-bounds
> of the tag (e.g. array_end = &array[7] implies the tag can be derived from
> (&array[0] & ~0xf)), we can use a normal RELATIVE relocation.
> >>>>>
> >>>>> The TAGGED_RELATIVE operand looks like:
> >>>>>  *r_offset |= get_tag(r_addend & ~0xf);
> >>>>>
> >>>>> ABS64, RELATIVE, and TAGGED_RELATIVE relocations need a slight tweak
> to grab the place's memory tag before use, as the place itself may be
> tagged. So, for example, the TAGGED_RELATIVE operation above actually
> becomes:
> >>>>>  r_offset = ldg(r_offset);
> >>>>>  *r_offset |= get_tag(r_addend & ~0xf);
> >>>>>
> >>>>> Introduce an R_AARCH64_LDG_LO9_SHORT_NC relocation for relocating
> the 9-bit immediate for the LDG instruction. This isn't MTE-globals
> specific, we just seem to be missing the relocation to encode the 9-bit
> immediate for LDG at bits [12..20]. This would save us an additional ADD
> instruction in the inline-LDG sequence for hidden symbols.
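
The difference between the ABS64-style and TAGGED_RELATIVE computations can be sketched in plain C. This is a model under assumptions, not loader code: tag_mem stands in for real tag storage (one slot per 16-byte granule), get_tag stands in for an `ldg`, the function names are invented, and the extra step of fetching the place's own tag before the store is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE   16u
#define TAG_SHIFT 56 /* logical tag lives in pointer bits [59:56] */

/* Tag of the granule containing addr, shifted into pointer position. */
uint64_t get_tag(const uint8_t *tag_mem, uint64_t addr) {
    assert(tag_mem != NULL);
    return (uint64_t)(tag_mem[addr / GRANULE] & 0xf) << TAG_SHIFT;
}

/* ABS64-style: the tag is derived from the symbol's own granule. */
uint64_t reloc_abs64(const uint8_t *tag_mem, uint64_t sym_addr, int64_t addend) {
    uint64_t tagged = sym_addr | get_tag(tag_mem, sym_addr & ~0xfULL);
    return tagged + (uint64_t)addend;
}

/* TAGGED_RELATIVE: place_value is the untagged target stored at r_offset;
 * the tag is derived from the address carried in r_addend instead. */
uint64_t reloc_tagged_relative(const uint8_t *tag_mem, uint64_t place_value,
                               uint64_t r_addend) {
    return place_value | get_tag(tag_mem, r_addend & ~0xfULL);
}
```

With array at a made-up address 0x1000 tagged 0x1 and the following granule at 0x1040 tagged 0x2, reloc_tagged_relative with place_value 0x1040 and r_addend 0x1000 yields a pointer tagged 0x1, whereas deriving the tag from 0x1040 itself would yield 0x2.
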
> >>>>>
> >>>>> We considered a few other schemes, including:
> >>>>>
> >>>>> Creating a dynamic symbol table entry for all hidden globals and
> giving them the same st_other.STO_TAGGED treatment. These entries would not
> require symbol names, but Elf(Sym) entries are 24 bytes (in comparison to 8
> bytes for the MTEGLOBTAB schema under the small code model). For an AOSP
> build, using dynamic symbol entries instead of MTEGLOBTAB results in a
> 2.3MiB code size increase across all DSO's.
> >>>>>
> >>>>> Making all hidden symbol accesses go through a local-GOT. Requires
> an extra indirection for all local symbols - resulting in increased cache
> pressure (and thus decreased performance) over a simple `ldg` of the tag
> (as the dcache and tag-cache are going to be warmed anyway for the
> load/store). Unlike the MTEGLOBTAB scheme however, this scheme is backwards
> compatible, allowing MTE-globals built binaries to run on old ARM64
> hardware (as no incompatible instructions are emitted), the same as heap
> tagging. Stack tagging requires a new ABI - and we expect the MTE globals
> scheme to be enabled in partnership with stack tagging, thus we are
> unconcerned about the ABI requirement for the MTEGLOBTAB scheme.
> >>>>>
> >>>>>
> >>>>> Please let us know any feedback you have. We're currently working on
> an experimental version and will update with any more details as they arise.
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Mitch.
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> LLVM Developers mailing list
> >>>>> llvm-dev at lists.llvm.org
> >>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev