[PATCH] D16599: ELF: Define another entry point.

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 2 11:07:55 PST 2016

On Tue, Feb 2, 2016 at 10:59 AM, Rui Ueyama <ruiu at google.com> wrote:

> On Tue, Feb 2, 2016 at 8:44 AM, David Blaikie <dblaikie at gmail.com> wrote:
>> On Mon, Feb 1, 2016 at 11:05 PM, Sean Silva via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>> On Mon, Feb 1, 2016 at 12:27 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>> Even if a file is technically sane, you can craft a malicious one; for
>>>> example, you can probably crash the linker by OOM by setting a very large
>>>> number as an alignment requirement for each section so that the size of
>>>> output becomes huge. It is easily doable using assembly. So my answer
>>>> is "any clang or gcc produced .o not including inline asm". (It does not
>>>> mean that we do not try to recover from errors caused by bad assembly code,
>>>> but we don't/can't guarantee 100% recovery.)
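To make the alignment point concrete, here is a minimal sketch of the arithmetic, assuming a simplified layout that places sections back to back. `SectionSketch`, `alignUp`, and `outputSize` are hypothetical names for illustration, not lld's actual types or functions, and overflow of the running offset is deliberately ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified section description (not lld's real type).
struct SectionSketch {
  uint64_t Size;      // bytes of content
  uint64_t Alignment; // required alignment, must be a power of two
};

// Round Offset up to the next multiple of Alignment (a power of two).
uint64_t alignUp(uint64_t Offset, uint64_t Alignment) {
  assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0);
  return (Offset + Alignment - 1) & ~(Alignment - 1);
}

// Lay the sections out back to back and return the total output size.
// Overflow of Offset is not handled in this sketch.
uint64_t outputSize(const std::vector<SectionSketch> &Sections) {
  uint64_t Offset = 0;
  for (const SectionSketch &S : Sections) {
    Offset = alignUp(Offset, S.Alignment);
    Offset += S.Size;
  }
  return Offset;
}
```

With three one-byte sections that each ask for 1 GiB alignment, this layout already needs a little over 2 GiB of output, so a small input can demand an arbitrarily large result.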
>>> You can probably find some way to set the alignment using an attribute
>>> or whatever even from clang (and without inlineasm).
>>> I don't think there is a platonically-ideal answer for this. It's more
>>> about goals:
>>> - as a command line tool, we don't want legitimate users to see us
>>> crashing during normal use (if a user is intentionally trying to kill LLD,
>>> it is not as embarrassing though, so we don't need to worry much about that
>>> case).
>>> - we want to be useful (someday) as a library that can be safely used
>>> in-process, so we need to provide certain guarantees (but these are not
>>> hugely constraining, because we can assume that the calling code is
>>> programmatically generating the file in good faith).
>> I don't think this is a valid assumption for all programmatic users (&
>> indeed Clang and LLVM both have ways of accepting untrusted inputs - the
>> assumption in LLVM is "if it's not already in the in-memory representation,
>> it's not trusted" (parsing bitcode, reading files, etc) and I think the
>> same would probably be reasonable in lld - callers with object contents in
>> memory (or even a higher level representation - the same as the difference
>> between LLVM IR and LLVM bitcode in a memory buffer) can choose to have lld
>> assume validity (if they produced it from an API they trust/are willing to
>> bugfix if it's ever wrong) or ask for verification (if they got the object
>> over a network connection or other untrusted source (perhaps read it out of
>> a compressed archive, etc))). An API integration of LLD into the Clang
>> driver wouldn't be a sound place to make this assumption - some objects may
>> be passed to Clang (not generated by it) from some other compilation or
>> source, for example.
> The difference is we do not have an in-memory representation of object
> files, or we are using mmap'ed ELF files as the internal representation.
> So, if files are not trustworthy, you cannot make any assumption about
> the data you are handling throughout the program's execution. That's
> probably too hostile an environment, and checking for errors along the way
> would be error-prone, slow, or would complicate the code.

I'm not sure I believe that's the case (that it's necessarily
slow/complicated/error-prone) any more than it is for Clang - it has
untrusted inputs & has to handle all the possible ways people can write
incorrect source code. (& LLVM too, but, yes - it often gets trusted input
in-memory, but once it goes to disk, LTO for example, it verifies it every
time - in the same way I would expect a linker to do so for object files
off disk)

> If we use an analogy of Clang and LLVM, we probably want to have a
> separate verifier for object files which you can run on object files from
> untrusted source before passing it to the link() function (so, although the
> two are in the same format, untrusted ELF files are "external
> representation", and verified ELF files are "internal representation").

*nod* but I'm suggesting if it's from disk it's untrusted (at least that's
how LTO, LLVM, and Clang work) & since that's the majority case for a
linker, that it's likely to be the case we care about for API use and for
performance. LLVM's JIT is the sort of case I imagine having "trusted"
inputs - generated in memory by a trusted API, any time the generation and
consumption disagree on validity it would be considered a programmer error
and fixed as a bug in the program as a whole (by fixing producer or
consumer). (such a JIT would also have untrusted inputs it would read from
the filesystem too, no doubt - predefined libraries to link in, etc)
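One way to picture the split discussed above (untrusted bytes from disk versus trusted in-memory input) is to make "verified" a distinct type that only a verifier or an explicit opt-out can produce. This is only a sketch under assumptions: `RawBytes`, `VerifiedObject`, `verify`, `assumeValid`, and `linkSketch` are all hypothetical names, none of them exist in lld, and the check shown (minimum size plus ELF magic) is a stand-in for real verification.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types; none of these names exist in lld.
using RawBytes = std::vector<uint8_t>;

// Can only be created through verify() or assumeValid().
class VerifiedObject {
public:
  const RawBytes &bytes() const { return Data; }

private:
  explicit VerifiedObject(RawBytes D) : Data(std::move(D)) {}
  RawBytes Data;

  friend std::optional<VerifiedObject> verify(RawBytes Raw, std::string &Err);
  friend VerifiedObject assumeValid(RawBytes Raw);
};

// Untrusted path (e.g. bytes read from disk): check before use.
// Only the ELF identification is checked here, as a stand-in.
std::optional<VerifiedObject> verify(RawBytes Raw, std::string &Err) {
  constexpr std::size_t IdentSize = 16; // EI_NIDENT
  if (Raw.size() < IdentSize) {
    Err = "file too small to hold an ELF identification";
    return std::nullopt;
  }
  if (Raw[0] != 0x7f || Raw[1] != 'E' || Raw[2] != 'L' || Raw[3] != 'F') {
    Err = "bad ELF magic";
    return std::nullopt;
  }
  return VerifiedObject(std::move(Raw));
}

// Trusted path (e.g. a JIT that produced the bytes itself): no checks.
// A mismatch here would be a bug in the producer or the consumer.
VerifiedObject assumeValid(RawBytes Raw) {
  return VerifiedObject(std::move(Raw));
}

// Stand-in for link(): accepts only verified inputs, returns total bytes.
std::size_t linkSketch(const std::vector<VerifiedObject> &Inputs) {
  std::size_t Total = 0;
  for (const VerifiedObject &O : Inputs)
    Total += O.bytes().size();
  return Total;
}
```

The point of the design is that the caller, not the linker, decides which path each input takes, and the type system records that decision.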

- David

>>> -- Sean Silva
>>>> On Mon, Feb 1, 2016 at 12:11 PM, Rafael Espíndola <
>>>> rafael.espindola at gmail.com> wrote:
>>>>> On 1 February 2016 at 15:06, Rui Ueyama <ruiu at google.com> wrote:
>>>>> > On Mon, Feb 1, 2016 at 11:57 AM, Rafael Espíndola
>>>>> > <rafael.espindola at gmail.com> wrote:
>>>>> >>
>>>>> >> On 1 February 2016 at 14:46, Sean Silva <chisophugis at gmail.com> wrote:
>>>>> >> > I think one of the main use cases that has been requested is to be
>>>>> >> > able to programmatically call the linker with "known good" object
>>>>> >> > files (i.e. produced by the compiler). That simplifies things a lot.
>>>>> >> > Rui's recent patches that are thread_local'izing existing globals
>>>>> >> > seem like a satisfactory approach. Or am I missing something?
>>>>> >>
>>>>> >> Yes, known good files are a lot easier to handle. We just have to be
>>>>> >> clear what "known good" is.
>>>>> >>
>>>>> >> > The R_X86_64_REX_GOTPCRELX situation can probably be likened to
>>>>> >> > someone giving clang a piece of source code with an inline asm that has:
>>>>> >> >
>>>>> >> > .text
>>>>> >> > .byte <some garbage>
>>>>> >> >
>>>>> >> > in it. We don't guarantee that the output "makes sense" because
>>>>> >> > there's really no way for us to know what "makes sense" in a precise
>>>>> >> > way (i.e., a way that we can program).
>>>>> >>
>>>>> >> Would we still be required to check the offsets so we don't crash? An
>>>>> >> assembly file can contain
>>>>> >>
>>>>> >> .reloc 0, R_X86_64_REX_GOTPCRELX, foo
>>>>> >> .long 4
>>>>> >>
>>>>> >> which would put that relocation in an invalid location. In general, is
>>>>> >> an arbitrary assembly file to be considered "known good"? Is that true
>>>>> >> even for things like
>>>>> >>
>>>>> >> .section .eh_frame, ....
>>>>> >> garbage
>>>>> >>
>>>>> >> that the linker has to parse?
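The offset check asked about above can be sketched as a small bounds test. This assumes the linker wants to look at a few bytes before the relocated field (for example, to inspect the instruction encoding before relaxing it) and then patch a fixed-width field. `relocInBounds` and its parameters are hypothetical names, not lld's API.

```cpp
#include <cstdint>

// Hypothetical helper (not lld's API). Returns true if a relocation at
// Offset may read BytesBefore bytes preceding it and patch FieldWidth
// bytes starting at it, all inside a section of SectionSize bytes.
bool relocInBounds(uint64_t Offset, uint64_t BytesBefore,
                   uint64_t FieldWidth, uint64_t SectionSize) {
  // Reading backwards must not run off the start of the section.
  if (Offset < BytesBefore)
    return false;
  // Written so that Offset + FieldWidth is never computed (no overflow).
  return FieldWidth <= SectionSize && Offset <= SectionSize - FieldWidth;
}
```

Under that assumption, a relocation at offset 0 in a 4-byte section fails the test as soon as any preceding bytes are needed, which is the situation the `.reloc 0, ...` example creates.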
>>>>> >
>>>>> >
>>>>> > I think the answer is case-by-case, but I don't think we have to
>>>>> > guarantee to recover from errors caused by carefully-crafted malicious
>>>>> > object files.
>>>>> > (Is there anyone who disagrees with that?)
>>>>> It is definitely not a use case *I* have an interest in. I just want
>>>>> there to be an agreement on which use case we want to support at the moment.
>>>>> Is it "any .o file", "any llvm-mc or gas produced .o", "any clang or
>>>>> gcc produced .o not including inline asm"?
>>>>> Cheers,
>>>>> Rafael
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits