[PATCH] D16599: ELF: Define another entry point.

Tue Feb 2 10:59:54 PST 2016

On Tue, Feb 2, 2016 at 8:44 AM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Mon, Feb 1, 2016 at 11:05 PM, Sean Silva via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>>
>>
>> On Mon, Feb 1, 2016 at 12:27 PM, Rui Ueyama <ruiu at google.com> wrote:
>>
>>> Even if a file is technically sane, you can craft a malicious one; for
>>> example, you can probably crash the linker by OOM by setting a very large
>>> number as an alignment requirement for each section so that the size of
>>> output becomes huge. It is easily doable using assembly. So my answer
>>> is "any clang or gcc produced .o not including inline asm". (It does not
>>> mean that we do not try to recover from errors caused by bad assembly code,
>>> but we don't/can't guarantee 100% recovery.)
>>>
>>
>> You can probably find some way to set the alignment using an attribute or
>> whatever even from clang (and without inlineasm).
>>
>> I don't think there is a platonically-ideal answer for this. It's more
>> about goals:
>> - as a command line tool, we don't want legitimate users to see us
>> crashing during normal use (if a user is intentionally trying to kill LLD,
>> it is not as embarrassing though, so we don't need to worry much about that
>> case).
>> - we want to be useful (someday) as a library that can be safely used
>> in-process, so we need to provide certain guarantees (but these are not
>> hugely constraining, because we can assume that the calling code is
>> programmatically generating the file in good faith).
>>
>
> I don't think this is a valid assumption for all programmatic users (&
> indeed Clang and LLVM both have ways of accepting untrusted inputs - the
> assumption in LLVM is "if it's not already in the in-memory representation,
> it's not trusted" (parsing bitcode, reading files, etc) and I think the
> same would probably be reasonable in lld - callers with object contents in
> memory (or even a higher level representation - the same as the difference
> between LLVM IR and LLVM bitcode in a memory buffer) can choose to have lld
> assume validity (if they produced it from an API they trust/are willing to
> bugfix if it's ever wrong) or ask for verification (if they got the object
> over a network connection or other untrusted source (perhaps read it out of
> a compressed archive, etc))). An API integration of LLD into the Clang
> driver wouldn't be a sound place to make this assumption - some objects may
> be passed to Clang (not generated by it) from some other compilation or
> source, for example.
>

The difference is that we do not have a separate in-memory representation of
object files; rather, we use the mmap'ed ELF files themselves as the internal
representation. So, if files are not trustworthy, you cannot make any
assumptions about the data you are handling at any point during program
execution. That is probably too hostile an environment, and checking for
errors everywhere along the way would be error-prone and slow, and would
complicate the code. If we use the analogy of Clang and LLVM, we probably
want a separate verifier for object files, which you can run on object files
from untrusted sources before passing them to the link() function (so,
although the two are in the same format, untrusted ELF files are the
"external representation", and verified ELF files are the "internal
representation").
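
To make the idea concrete, here is a minimal sketch of the kind of checks such
a verifier might run before link() is called. It is only an illustration under
stated assumptions: SectionHeader, verifySections, relocationInBounds and the
kMaxAlign limit are hypothetical names and policy choices, not existing LLD
code. It covers the two cases raised in this thread: absurdly large alignment
requirements, and relocations placed where the bytes the linker wants to
inspect fall outside the section.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of one ELF64 section header. The fields
// mirror sh_type, sh_offset, sh_size and sh_addralign; this is not LLD's
// own data structure.
struct SectionHeader {
  uint32_t type;
  uint64_t offset;
  uint64_t size;
  uint64_t addralign;
};

constexpr uint32_t kShtNoBits = 8;        // SHT_NOBITS: occupies no file space
constexpr uint64_t kMaxAlign = 1ull << 20; // assumed policy limit, not from any spec

// Returns an empty string if every section passes, otherwise a description
// of the first problem found.
std::string verifySections(const std::vector<SectionHeader> &sections,
                           uint64_t fileSize) {
  for (std::size_t i = 0; i < sections.size(); ++i) {
    const SectionHeader &s = sections[i];
    const std::string where = "section " + std::to_string(i) + ": ";

    // sh_addralign must be 0 or a power of two.
    if ((s.addralign & (s.addralign - 1)) != 0)
      return where + "alignment is not a power of two";

    // Reject alignments large enough to blow up the output size.
    if (s.addralign > kMaxAlign)
      return where + "alignment exceeds the verifier's limit";

    // Section contents must lie inside the file; written to avoid overflow.
    if (s.type != kShtNoBits &&
        (s.offset > fileSize || s.size > fileSize - s.offset))
      return where + "contents extend past the end of the file";
  }
  return "";
}

// Checks that a relocation at `offset` leaves room for `bytesBefore` bytes
// the linker wants to inspect ahead of it and for the `fieldSize`-byte field
// itself, all within a section of `sectionSize` bytes.
bool relocationInBounds(uint64_t offset, uint64_t bytesBefore,
                        uint64_t fieldSize, uint64_t sectionSize) {
  return offset >= bytesBefore && offset <= sectionSize &&
         fieldSize <= sectionSize - offset;
}
```

With checks like these, a caller that received an object from an untrusted
source would run the verifier first and only call link() on success, while a
caller that generated the object itself could skip the step. The relocation
helper would reject the ".reloc 0, R_X86_64_REX_GOTPCRELX, foo" example quoted
below if the linker needs to look at instruction bytes preceding the offset.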

>

>> -- Sean Silva
>>
>>
>>>
>>> On Mon, Feb 1, 2016 at 12:11 PM, Rafael Espíndola <
>>> rafael.espindola at gmail.com> wrote:
>>>
>>>> On 1 February 2016 at 15:06, Rui Ueyama <ruiu at google.com> wrote:
>>>> > On Mon, Feb 1, 2016 at 11:57 AM, Rafael Espíndola
>>>> > <rafael.espindola at gmail.com> wrote:
>>>> >>
>>>> >> On 1 February 2016 at 14:46, Sean Silva <chisophugis at gmail.com> wrote:
>>>> >> > I think one of the main use cases that has been requested is to be
>>>> >> > able to programmatically call the linker with "known good" object
>>>> >> > files (i.e. produced by the compiler). That simplifies things a lot.
>>>> >> > Rui's recent patches that are thread_local'izing existing globals
>>>> >> > seem like a satisfactory approach. Or am I missing something?
>>>> >>
>>>> >> Yes, known good files are a lot easier to handle. We just have to be
>>>> >> clear what "known good" is.
>>>> >>
>>>> >> > The R_X86_64_REX_GOTPCRELX situation can probably be likened to
>>>> >> > someone giving clang a piece of source code with an inline asm that
>>>> >> > has:
>>>> >> >
>>>> >> > .text
>>>> >> > .byte <some garbage>
>>>> >> >
>>>> >> > in it. We don't guarantee that the output "makes sense" because
>>>> >> > there's really no way for us to know what "makes sense" in a precise
>>>> >> > way (i.e., a way that we can program).
>>>> >>
>>>> >> Would we still be required to check the offsets so we don't crash? An
>>>> >> assembly file can contain
>>>> >>
>>>> >> .reloc 0, R_X86_64_REX_GOTPCRELX, foo
>>>> >> .long 4
>>>> >>
>>>> >> which would put that relocation in an invalid location. In general,
>>>> >> is an arbitrary assembly file to be considered "known good"? Is that
>>>> >> true even for things like
>>>> >>
>>>> >> .section .eh_frame, ....
>>>> >> garbage
>>>> >>
>>>> >> that the linker has to parse?
>>>> >
>>>> >
>>>> > I think the answer is case-by-case, but I don't think we have to
>>>> > guarantee to recover from errors caused by carefully-crafted malicious
>>>> > object files. (Is there anyone who disagrees with that?)
>>>>
>>>> It is definitely not a use case *I* have an interest in. I just want
>>>> there to be an agreement on what use case we want to support at the
>>>> moment. Is it "any .o file", "any llvm-mc or gas produced .o", "any
>>>> clang or gcc produced .o not including inline asm"?
>>>>
>>>> Cheers,
>>>> Rafael
>>>>
>>>
>>>
>>
>