<div dir="ltr"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">note: these bits are not really reserved for os or processor<br>specific use in ELF. in practice they are processor specific<br>so it will be STO_AARCH64_TAGGED.<br></blockquote><div><br></div><div>Correct.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">note2: undefined symbol references will need correct marking<br>too if objects may get copy relocated into the main exe and<br>linkers should check if definitions match references.</blockquote><div><br></div><div>Yep - at this point I expect that resolving an untagged reference with a tagged symbol (or vice versa) should result in a link-time error, but I don't feel particularly strongly about this. Downgrading to untagged should always be safe - but I think this subverts the object files' desire to have tagged globals. This also affects linking object files that are some tagged, some untagged into the same DSO.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">it would be better to discuss on a linux abi or arm abi forum</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">than on llvm-dev</blockquote><div><br></div><div>If you have any recommendations, that would be much appreciated. We have some ARM ELF folks on the line here, but it's probably not as broad as I would like.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">are object sizes reliable in the dynamic symbol table?<br>is this why there is a need for per symbol marking? 
</blockquote><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">so it can be a completely dynamic linker internal decision</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">what globals to tag and how. (it is also backward compat<br>with existing binaries, but it might make sense to have<br>an opt-in mechanism for such tagging.)</blockquote><div><br></div><div>Object sizes are reliable - but marking symbols explicitly allows us to have mixed tagged and untagged symbols in the same segment (think of a symbol we know is being used by non-compliant assembly; we can mark it with __attribute__((nosanitize("mte")))).</div><div><br></div><div>IMO marking symbols in the dynamic symbol table gives us greater flexibility than indiscriminately tagging granule-aligned symbols that fall in the right segments.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">i think a design that prevents sharing is not acceptable.<br></blockquote><div><br></div><div>Unfortunately shared memory isn't required to be tag capable (<a href="https://www.kernel.org/doc/Documentation/filesystems/dax.txt">DAX</a> is an example) - so any PROT_MTE mappings must be anonymous. 
That's why we'd like to carve out rodata into its own segment, to continue to allow page sharing for the remaining 80% of that segment.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> static int a[8];</blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">static int *p = a - 5;<br>...<br>        p[10] = 1;<br>should work (even if it's not valid in c it can be valid as<br>a c extension or written in asm, so ELF should support it).</blockquote><div><br></div><div>IMO this is exactly the kind of thing that MTE is trying to <i>prevent</i>. I don't see why we would want to support something like this.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">i think tls needs some thought too, arrays are probably<br>not common there, but some protection may be possible in<br>some cases..<br></blockquote><div><br></div><div>Definitely agreed - we haven't fleshed out a TLS story yet, but we're considering it for later iterations. </div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Oct 8, 2020 at 11:42 AM Szabolcs Nagy via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">* Mitch Phillips via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> [2020-09-17 15:05:18 -0700]:<br>

> ARM v8.5 introduces the Memory Tagging Extension (MTE), a hardware feature that<br>

> allows for detection of memory safety bugs (buffer overflows,<br>

> use-after-free, etc) with low overhead. So far, MTE support is implemented<br>

> in the Scudo hardened allocator (compiler-rt/lib/scudo/standalone) for<br>

> heap, and stack allocation is implemented in LLVM/Clang behind<br>

> -fsanitize=memtag <<a href="https://llvm.org/docs/MemTagSanitizer.html" rel="noreferrer" target="_blank">https://llvm.org/docs/MemTagSanitizer.html</a>>.<br>

> <br>

> As part of a holistic MTE implementation, global memory should also be<br>

> properly tagged. HWASan<br>

> <<a href="http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html" rel="noreferrer" target="_blank">http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html</a>> (a<br>

> software-only implementation of MTE) has a schema that uses static tags,<br>

> however these can be trivially determined by an attacker with access to the<br>

> ELF file. This would allow attackers with arbitrary read/write to trivially<br>

> attack global variables. It would also allow attackers with a semilinear RW<br>

> primitive to trivially attack global variables if the offset is<br>

> controllable. Dynamic global tags are required to provide the same MTE<br>

> mitigation guarantees that are afforded to stack and heap memory.<br>

> <br>

> We've got a plan in mind about how to do MTE globals with fully dynamic<br>

> tags, but we'd love to get feedback from the community. In particular -<br>

> we'd like to try and align implementation details with GCC as the scheme<br>

> requires cooperation from the compiler, linker, and loader.<br>

> <br>

> Our current ideas are outlined below. All the compiler features (including<br>

> realignment, etc.) would be guarded behind -fsanitize=memtag. Protection of<br>

> read-only globals would be enabled-by-default, but can be disabled at<br>

> compile time behind a flag (likely -f(no)sanitize-memtag-ro-globals).<br>

<br>

i think -fsanitize is not appropriate for an mte abi.<br>

<br>

(i mean you can have an -fsanitize for it, but mte can be<br>

a proper abi between libc, linkers and compilers that<br>

several toolchains can implement independently, not an<br>

llvm vs compiler-rt internal design or android only design.)<br>

<br>

> <br>

> a) Dynamic symbols (int f; extern int f;)<br>

> <br>

>    1.<br>

> <br>

>    Mark all tagged global data symbols in the dynamic symbol table as<br>

>    st_other.STO_TAGGED.<br>

<br>

note: these bits are not really reserved for os or processor<br>

specific use in ELF. in practice they are processor specific<br>

so it will be STO_AARCH64_TAGGED.<br>

<br>

note2: undefined symbol references will need correct marking<br>

too if objects may get copy relocated into the main exe and<br>

linkers should check if definitions match references.<br>

<br>

this will require an ABI bump (otherwise old tools will<br>

silently ignore the new STO flag).<br>

<br>

but i'm not convinced yet that per symbol marking is needed.<br>

<br>

it would be better to discuss on a linux abi or arm abi forum<br>

than on llvm-dev (at least in my experience unsubscribed mail<br>

gets dropped or significantly delayed here and many linux or<br>

arm abi folks are not subscribed)<br>

<br>

>    2.<br>

> <br>

>    Teach the loader to read the symbol table at load time (and dlopen())<br>

>    prior to relocations, and apply random memory tags (via `irg -> stg`) to<br>

>    each STO_TAGGED carrying global.<br>

<br>

are object sizes reliable in the dynamic symbol table?<br>

is this why there is a need for per symbol marking?<br>

<br>

> <br>

> b) Hidden Symbols (static int g; or -fvisibility=hidden)<br>

> <br>

>    1.<br>

> <br>

>    Have the compiler mark hidden tagged globals in the symbol table as<br>

>    st_other.STO_TAGGED.<br>

>    2.<br>

> <br>

>    Have the linker read the symbol table and create a table of {<br>

>    unrelocated virtual address, size } pairs for each STO_TAGGED carrying<br>

>    hidden global, storing this in a new section (.mteglobtab).<br>

>    3.<br>

> <br>

>    Create a new dynamic entry "DT_MTEGLOBTAB" that points to this section,<br>

>    along with "DT_MTEGLOBENT" for the size of each entry and "DT_MTEGLOBSZ"<br>

>    for the size (in bytes) of the table.<br>

>    4.<br>

> <br>

>    Similar to dynamic symbols, teach the loader to read this table and<br>

>    apply random memory tags to each global prior to relocations.<br>

<br>

for static linking it's possible to make a static exe self<br>

relocating like how static pie handles RELATIVE relocs, but<br>

this sounds a bit nasty (and will need to use rcrt1.o or a<br>

new *crt1.o entry that guarantees such self relocation).<br>

<br>

>    5.<br>

> <br>

>    Materialization of hidden symbols now fetches and inserts the memory tag<br>

>    via `ldg`. On aarch64, this means non PC-relative<br>

>    loads/stores/address-taken (*g = 7;) generate:<br>

>      adrp x0, g;<br>

>      ldg x0, [x0, :lo12:g]; // new instruction<br>

>      mov x1, #7;<br>

>      str x1, [x0, :lo12:g];<br>

> <br>

>    Note that this materialization sequence means that executables built<br>

>    with MTE globals are not able to run on non-MTE hardware.<br>

<br>

i need to think about this, i think a compiler may transform<br>

<br>

static int a[8];<br>

<br>

void f(int i)<br>

{<br>

        a[i-5] = 0;<br>

}<br>

<br>

to<br>

<br>

        (a-5)[i] = 0;<br>

<br>

i.e. instead of offsetting i, compute the address of a-5 with<br>

adrp and then fewer instructions can be used for indexing.<br>

<br>

but then ldg on the computed address is not valid.<br>

(this is likely not a performance concern, but implies that<br>

there may be code generation troubles if we assume anything<br>

that is computed with adrp can be fixed up with ldg.)<br>
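<br>
For the MTEGLOBTAB table in (b) 2-4 above, a minimal sketch of the loader-side walk may help make the proposal concrete. Everything here is an assumption for illustration: the entry layout (two 32-bit fields, matching the 8-byte figure quoted for the small code model), the names mteglobtab_entry, next_tag and tag_globals, and the use of a byte array and a small LCG as stand-ins for real tag memory and `irg`/`stg`.<br>

```c
#include <stddef.h>
#include <stdint.h>

#define MTE_GRANULE 16u

/* Hypothetical 8-byte table entry: offset from the load base, and size. */
struct mteglobtab_entry {
    uint32_t offset;
    uint32_t size;
};

/* Stand-in for irg: a small LCG that yields a 4-bit tag (0..15). */
uint8_t next_tag(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return (uint8_t)(*state >> 28);
}

/*
 * Walk the table and give every granule of each listed global one tag.
 * tab_bytes and ent_bytes play the role of DT_MTEGLOBSZ and DT_MTEGLOBENT;
 * this sketch assumes ent_bytes == sizeof(struct mteglobtab_entry).
 * granule_tags simulates tag memory, one byte per 16-byte granule.
 */
void tag_globals(const struct mteglobtab_entry *tab, size_t tab_bytes,
                 size_t ent_bytes, uint8_t *granule_tags, uint32_t seed)
{
    size_t count = tab_bytes / ent_bytes;
    for (size_t i = 0; i < count; i++) {
        uint8_t tag = next_tag(&seed);
        size_t first = tab[i].offset / MTE_GRANULE;
        size_t end = ((size_t)tab[i].offset + tab[i].size + MTE_GRANULE - 1)
                     / MTE_GRANULE;
        for (size_t g = first; g < end; g++)
            granule_tags[g] = tag; /* stand-in for stg */
    }
}
```

In this sketch a 40-byte global at offset 16 covers three granules, all of which receive the same tag, which is why the proposal also pads sizes to a granule multiple.<br>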

<br>

> <br>

> Note: Some dynamic symbols can be transformed at link time into hidden<br>

> symbols if:<br>

> <br>

>    1.<br>

> <br>

>    The symbol is in an object file that is statically linked into an<br>

>    executable and is not referenced in any shared libraries, or<br>

>    2.<br>

> <br>

>    The symbol has its visibility changed with a version script.<br>

> <br>

> These globals always have their addresses derived from a GOT entry, and<br>

> thus have their address tag materialized through the RELATIVE relocation of<br>

> the GOT entry. Due to the lack of dynamic symbol table entry however, the<br>

> memory would go untagged. The linker must ensure it creates an MTEGLOBTAB<br>

> entry for all hidden MTE-globals, including those that are transformed from<br>

> external to hidden. DSO's linked with -Bsymbolic retain their dynamic<br>

> symbol table entries, and thus require no special handling.<br>

> <br>

> c) All symbols<br>

> <br>

>    1.<br>

> <br>

>    Realign to granule size (16 bytes), resize to multiple of granule size<br>

>    (e.g. 40B -> 48B).<br>

>    2.<br>

> <br>

>    Ban data folding (except where contents and size are same, no tail<br>

>    merging).<br>

>    3.<br>

> <br>

>    In the loader, ensure writable segments (and possibly .rodata, see next<br>

>    dot point) are mapped MAP_ANONYMOUS and PROT_MTE (with the contents of the<br>

>    mappings filled from the file), as file-based mappings aren't necessarily<br>

>    backed by tag-capable memory. It also requires in-place remapping of data<br>

>    segments from the program image (as they're already mapped by the kernel<br>

>    before PT_INTERP invokes the loader).<br>

<br>

copying file data is a bit ugly but i think this is ok.<br>

<br>

>    4.<br>

> <br>

>    Make .rodata protection optional. When read-only protection is in use,<br>

>    the .rodata section should be moved into a separate segment. For Bionic<br>

>    libc, the rodata section takes up 20% of its ALLOC | READ segment, and we'd<br>

>    like to be able to maintain page sharing for the remaining 189KiB of other<br>

>    read-only data in this segment.<br>

<br>

i think a design that prevents sharing is not acceptable.<br>
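<br>
On item (c) 1 above, the realign/resize rule can be sketched as plain arithmetic. The helper names mte_padded_size and mte_granule_count are hypothetical and only illustrate the 40B -> 48B example from the proposal.<br>

```c
#include <stddef.h>

#define MTE_GRANULE 16u

/* Round a global's size up to a whole number of 16-byte granules. */
size_t mte_padded_size(size_t size)
{
    return (size + MTE_GRANULE - 1) & ~(size_t)(MTE_GRANULE - 1);
}

/* Number of granules (and so of tag stores) the padded object occupies. */
size_t mte_granule_count(size_t size)
{
    return mte_padded_size(size) / MTE_GRANULE;
}
```

With these definitions a 40-byte object pads to 48 bytes and occupies three granules, so no other global can share its last granule.<br>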

<br>

> <br>

> d) Relocations<br>

> <br>

>    1.<br>

> <br>

>    GLOB_DAT, ABS64, and RELATIVE relocations change semantics - they would<br>

>    be required to retrieve and insert the memory tag of the symbol into the<br>

>    relocated value. For example, the ABS64 relocation becomes:<br>

>      sym_addr = get_symbol_address() // sym_addr = 0x1008<br>

>      sym_addr |= get_tag(sym_addr & ~0xf) // get_tag(0x1008 & ~0xf == 0x1000)<br>

>      *r_offset = sym_addr + r_addend;<br>

>    2.<br>

> <br>

>    Introduce a TAGGED_RELATIVE relocation - in order to solve the problem<br>

>    where the tag derivation shouldn't be from the relocation result, e.g.<br>

>    static int array[16] = {};<br>

>    // array_end must have the same tag as array[]. array_end is out of<br>

>    // bounds w.r.t. array, and may point to a completely different global.<br>

>    int *array_end = &array[16];<br>

> <br>

>    TAGGED_RELATIVE stores the untagged symbol value in the place (*r_offset<br>

>    == &array[16]), and keeps the address where the tag should be derived in<br>

>    the addend (RELA-only r_addend == &array[0]).<br>

> <br>

>    For derived symbols where the granule-aligned address is in-bounds of<br>

>    the tag (e.g. array_end = &array[7] implies the tag can be derived<br>

>    from (&array[0] & ~0xf)), we can use a normal RELATIVE relocation.<br>

> <br>

>    The TAGGED_RELATIVE operation looks like:<br>

>      *r_offset |= get_tag(r_addend & ~0xf);<br>

>    3.<br>

> <br>

>    ABS64, RELATIVE, and TAGGED_RELATIVE relocations need a slight tweak to<br>

>    grab the place's memory tag before use, as the place itself may be tagged.<br>

>    So, for example, the TAGGED_RELATIVE operation above actually becomes:<br>

>      r_offset = ldg(r_offset);<br>

>      *r_offset |= get_tag(r_addend & ~0xf);<br>

>    4.<br>

> <br>

>    Introduce an R_AARCH64_LDG_LO9_SHORT_NC relocation for relocating the<br>

>    9-bit immediate for the LDG instruction. This isn't MTE-globals specific,<br>

>    we just seem to be missing the relocation to encode the 9-bit immediate for<br>

>    LDG at bits [12..20]. This would save us an additional ADD instruction in<br>

>    the inline-LDG sequence for hidden symbols.<br>
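<br>
The relocation semantics in (d) 1-2 above can be simulated on a host without MTE. This is only a sketch under stated assumptions: sim_tags, SIM_BASE, reloc_abs64 and reloc_tagged_relative are hypothetical names, get_tag here is a table lookup standing in for `ldg`, and the tag is placed in pointer bits [59:56]. The place-tag adjustment from (d) 3 is left out.<br>

```c
#include <stdint.h>

#define TAG_SHIFT 56
#define GRANULE_MASK (~(uint64_t)0xf)

/* Simulated tag memory for addresses 0x1000..0x10ff, one tag per granule. */
#define SIM_BASE 0x1000u
#define SIM_GRANULES 16
uint8_t sim_tags[SIM_GRANULES];

/* Stand-in for ldg: tag of the granule holding addr, moved to bits [59:56]. */
uint64_t get_tag(uint64_t addr)
{
    uint64_t granule = ((addr & GRANULE_MASK) - SIM_BASE) / 16;
    return (uint64_t)(sim_tags[granule] & 0xf) << TAG_SHIFT;
}

/* ABS64 as in (d) 1: the tag comes from the symbol's own granule. */
uint64_t reloc_abs64(uint64_t sym_addr, int64_t addend)
{
    uint64_t tagged = sym_addr | get_tag(sym_addr & GRANULE_MASK);
    return tagged + (uint64_t)addend;
}

/*
 * TAGGED_RELATIVE as in (d) 2: the place already holds the untagged value
 * (e.g. &array[16]); the tag is derived from the address in the addend
 * (e.g. &array[0]).
 */
uint64_t reloc_tagged_relative(uint64_t place_value, uint64_t tag_source)
{
    return place_value | get_tag(tag_source & GRANULE_MASK);
}
```

If array sits at 0x1000 with tag 3 and the next global at 0x1040 has tag 9, deriving the tag from the relocation result 0x1040 would pick 9, while reloc_tagged_relative(0x1040, 0x1000) keeps tag 3 for array_end.<br>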

> <br>

> We considered a few other schemes, including:<br>

> <br>

>    1.<br>

> <br>

>    Creating a dynamic symbol table entry for all hidden globals and giving<br>

>    them the same st_other.STO_TAGGED treatment. These entries would not<br>

>    require symbol names, but Elf(Sym) entries are 24 bytes (in comparison to 8<br>

>    bytes for the MTEGLOBTAB schema under the small code model). For an AOSP<br>

>    build, using dynamic symbol entries instead of MTEGLOBTAB results in a<br>

>    2.3MiB code size increase across all DSO's.<br>

>    2.<br>

> <br>

>    Making all hidden symbol accesses go through a local-GOT. Requires an<br>

>    extra indirection for all local symbols - resulting in increased cache<br>

>    pressure (and thus decreased performance) over a simple `ldg` of the tag<br>

>    (as the dcache and tag-cache are going to be warmed anyway for the<br>

>    load/store). Unlike the MTEGLOBTAB scheme however, this scheme is backwards<br>

>    compatible, allowing MTE-globals built binaries to run on old ARM64<br>

>    hardware (as no incompatible instructions are emitted), the same as heap<br>

>    tagging. Stack tagging requires a new ABI - and we expect the MTE globals<br>

>    scheme to be enabled in partnership with stack tagging, thus we are<br>

>    unconcerned about the ABI requirement for the MTEGLOBTAB scheme.<br>

<br>

if object access goes via symbolic dynamic relocation<br>

(GOT, ABS) then there is no need to do anything special:<br>

<br>

- pointer representation is controlled by the dynamic<br>

  linker via the relocs<br>

<br>

- location of object is known (object bounds and<br>

  if it's in a PROT_MTE segment)<br>

<br>

so it can be a completely dynamic linker internal decision<br>

what globals to tag and how. (it is also backward compat<br>

with existing binaries, but it might make sense to have<br>

an opt-in mechanism for such tagging.)<br>

<br>

<br>

new abi is needed to protect local accesses, i'm not yet<br>

sure about the proposed design with two RELATIVE relocs.<br>

i think RELATIVE reloc should not assume that the computed<br>

pointer can be dereferenced, this is not just for the array<br>

end case but for other oob computed pointers too. e.g.<br>

<br>

static int a[8];<br>

static int *p = a - 5;<br>

...<br>

        p[10] = 1;<br>

<br>

should work (even if it's not valid in c it can be valid as<br>

a c extension or written in asm, so ELF should support it).<br>

<br>

e.g. the r_info field in the RELATIVE reloc can index the<br>

MTEGLOBTAB and use the object bounds from there for ldg<br>

(and if r_info==0 means untagged this falls back to normal<br>

RELATIVE reloc processing), but i don't yet know what is<br>

the best solution here.<br>
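<br>
A rough sketch of the r_info idea above, to show how an out-of-bounds pointer such as a - 5 could still pick up the tag of a. All names here (glob_bounds, reloc_relative_indexed, granule_tags, tags_base) are hypothetical, the table layout is an assumption, and a byte array stands in for real tag memory and `ldg`.<br>

```c
#include <stdint.h>

/* Hypothetical table entry: unrelocated address and size of one global. */
struct glob_bounds {
    uint64_t addr;
    uint64_t size; /* used by the loader when tagging, not by the reloc */
};

/*
 * RELATIVE processing where a non-zero index selects a table entry.
 * index == 0 means untagged and falls back to the normal computation.
 * Otherwise the tag is read at the object's own address from the table,
 * not at the computed pointer, so the result may point out of bounds.
 * granule_tags holds one tag per 16-byte granule starting at tags_base.
 */
uint64_t reloc_relative_indexed(uint64_t load_base, uint64_t addend,
                                uint32_t index,
                                const struct glob_bounds *tab,
                                const uint8_t *granule_tags,
                                uint64_t tags_base)
{
    uint64_t value = load_base + addend;
    if (index == 0)
        return value; /* plain RELATIVE */
    uint64_t obj = load_base + tab[index - 1].addr;
    uint8_t tag = granule_tags[(obj - tags_base) / 16] & 0xf;
    return value | ((uint64_t)tag << 56);
}
```

With a at 0x2000 carrying tag 7, p = a - 5 relocates to 0x1fec with tag 7 in the top byte, so p[10] lands inside a with a matching tag.<br>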

<br>

i think tls needs some thought too, arrays are probably<br>

not common there, but some protection may be possible in<br>

some cases..<br>

<br>

> <br>

> <br>

> Please let us know any feedback you have. We're currently working on an<br>

> experimental version and will update with any more details as they arise.<br>

> <br>

> Thanks,<br>

> <br>

> Mitch.<br>

<br>

> _______________________________________________<br>

> LLVM Developers mailing list<br>

> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

<br>


</blockquote></div>