[llvm-dev] Emitting aligned nlist_64 structures for Mach-O in MC

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Tue May 26 15:55:10 PDT 2020


+Lang for Darwin linker things - Lang, any ideas if this is correct in
ld64? Should be fixed there (for back-compat) or fixed forward (because
things seem to have been going OK so far... )?

On Tue, May 26, 2020 at 3:24 PM James Y Knight <jyknight at google.com> wrote:

> Yes, those OSReadLittleInt* functions are *quite* the attractive
> nuisance. The interface is highly suggestive that it should work on values
> of arbitrary alignments...but it doesn't.
>
> In fact, nearly every use I've seen of this family of functions has been
> an error of this kind. We had to fix a bunch of that internally to Google a
> while back (e.g.
> https://github.com/protocolbuffers/protobuf/commit/a0eb83f5904757cc9690fec7867c04b8c2c95249).
> I'm going to bet that the ld64 developers were under the same
> misapprehension about this function's contract as everyone else was. :)
>
>
> On Tue, May 26, 2020 at 6:12 PM Shoaib Meenai via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> I looked into this further. ld64 has a macho_nlist abstraction over the
>> various underlying nlist structures [1]. On x86-64, the P::getP referenced
>> in n_value will resolve to [2], which in turn goes to [3], which calls
>> OSReadLittleInt64. On a little endian machine, OSReadLittleEndian just
>> calls _OSReadInt64 [4], which in turn does a pointer arithmetic and cast
>> and then dereferences the pointer [5]. As far as I can see, this will
>> invoke UB for an unaligned int64, unless ld64 is built against a different
>> version of OSReadLittleEndian, or unless I’m missing something.
>>
>>
>>
>> [1]
>> https://github.com/apple-opensource/ld64/blob/fd3feabb0a1eb18ab5d7910f3c3a5eed99cef6ab/src/abstraction/MachOFileAbstraction.hpp#L1283-L1304
>>
>> [2]
>> https://github.com/apple-opensource/ld64/blob/fd3feabb0a1eb18ab5d7910f3c3a5eed99cef6ab/src/abstraction/FileAbstraction.hpp#L137
>>
>> [3]
>> https://github.com/apple-opensource/ld64/blob/fd3feabb0a1eb18ab5d7910f3c3a5eed99cef6ab/src/abstraction/FileAbstraction.hpp#L96
>>
>> [4]
>> https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24aa7443/libkern/libkern/OSByteOrder.h#L246
>>
>> [5]
>> https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24aa7443/libkern/libkern/OSByteOrder.h#L109-L117
>>
>>
>>
>> *From: *David Blaikie <dblaikie at gmail.com>
>> *Date: *Tuesday, May 26, 2020 at 11:25 AM
>> *To: *Shoaib Meenai <smeenai at fb.com>
>> *Cc: *"llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org>
>> *Subject: *Re: [llvm-dev] Emitting aligned nlist_64 structures for
>> Mach-O in MC
>>
>>
>>
>> I'd have figured if ld64 is able to handle this correctly (are you sure
>> ld64 has UB in this situation when the lists are underaligned? perhaps it
>> has utilities for doing underaligned reads, etc?) it's probably important
>> for lld to handle them too - old object files in archives, things built by
>> not-llvm, etc.
>>
>>
>>
>> On Tue, May 26, 2020 at 11:01 AM Shoaib Meenai via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> As part of our work on LLD for Mach-O, we’ve observed that the object
>> files produced by LLVM don’t always have aligned nlist_64 entries. For
>> context, nlist_64 is the 64-bit symbol table entry structure for Mach-O,
>> and the symbol table is an array of nlist_64 entries. nlist_64 has an 8
>> byte member, so it should be 8-byte aligned, but we’ve seen object files
>> where the symbol table only has a 4-byte alignment.
>>
>> I don't know if Mach-O mandates an alignment for the symbol table (I
>> haven't found any such requirements in the documentation), but it would be
>> convenient for MC to emit an 8-byte aligned symbol table. We parse an input
>> file in LLD by memory mapping it and using casts to access the various
>> structures. When the symbol table isn't 8-byte aligned, we're accessing the
>> nlist_64 entries in an unaligned fashion, which makes UBSAN (rightly)
>> unhappy. As far as I can see, ld64 also does pointer arithmetic and casting
>> to parse the symbol table, so it would run into the same issue. Does anyone
>> see any problems with making MC always emit an 8-byte aligned symbol table
>> for 64-bit Mach-O files, to avoid this issue?
>>
>> Of course, we should still handle object files we see in the wild without
>> the 8-byte alignment. From what I understand, the blessed way to handle a
>> potentially unaligned access is with a memcpy, and the compiler should be
>> able to optimize out the memcpy on architectures that support unaligned
>> accesses. A coworker is implementing that approach in
>> https://reviews.llvm.org/D80414
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__reviews.llvm.org_D80414&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=o3kDXzdBUE3ljQXKeTWOMw&m=Ux8t6Su4JVUd45UmVgQRCU2YZHX4HTWB1742jKxnhJo&s=Lg9_RU6QSNSHF88ry4t5DTpADTj7oQ_ur3L5j-mCIf8&e=>,
>> but before we go ahead with that, I wanted to confirm (a) how reliable the
>> memcpy optimization is across all our supported compilers (in particular,
>> I've seen MSVC struggle with this optimization in the past), because we
>> really don't want to be paying the cost of an actual memcpy in this
>> codepath, and (b) if there's other good ways to handle this that we could
>> explore.
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> <https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwMFaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=o3kDXzdBUE3ljQXKeTWOMw&m=Ux8t6Su4JVUd45UmVgQRCU2YZHX4HTWB1742jKxnhJo&s=ya3X_moM7YSEY6nBjT5Aov0B4JH04p7MJHwNSJiCPDw&e=>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200526/be6ca075/attachment.html>


More information about the llvm-dev mailing list