[PATCH] D16599: ELF: Define another entry point.

Tue Feb 2 15:18:50 PST 2016

On Tue, Feb 2, 2016 at 2:25 PM, Rui Ueyama <ruiu at google.com> wrote:

> On Tue, Feb 2, 2016 at 1:42 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Tue, Feb 2, 2016 at 8:44 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>
>>>
>>>
>>> On Mon, Feb 1, 2016 at 11:05 PM, Sean Silva via llvm-commits <
>>> llvm-commits at lists.llvm.org> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Feb 1, 2016 at 12:27 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>
>>>>> Even if a file is technically sane, you can craft a malicious one; for
>>>>> example, you can probably crash the linker by OOM by setting a very large
>>>>> number as an alignment requirement for each section so that the size of
>>>>> output becomes huge. It is easily doable using assembly. So my answer
>>>>> is "any clang or gcc produced .o not including inline asm". (It does not
>>>>> mean that we do not try to recover from errors caused by bad assembly code,
>>>>> but we don't/can't guarantee 100% recovery.)
>>>>>
>>>>
>>>> You can probably find some way to set the alignment using an attribute
>>>> or whatever even from clang (and without inlineasm).
>>>>
>>>> I don't think there is a platonically-ideal answer for this. It's more
>>>> about goals:
>>>> - as a command line tool, we don't want legitimate users to see us
>>>> crashing during normal use (if a user is intentionally trying to kill LLD,
>>>> it is not as embarrassing though, so we don't need to worry much about that
>>>> case).
>>>> - we want to be useful (someday) as a library that can be safely used
>>>> in-process, so we need to provide certain guarantees (but these are not
>>>> hugely constraining, because we can assume that the calling code is
>>>> programmatically generating the file in good faith).
>>>>
>>>
>>> I don't think this is a valid assumption for all programmatic users (&
>>> indeed Clang and LLVM both have ways of accepting untrusted inputs - the
>>> assumption in LLVM is "if it's not already in the in-memory representation,
>>> it's not trusted" (parsing bitcode, reading files, etc) and I think the
>>> same would probably be reasonable in lld - callers with object contents in
>>> memory (or even a higher level representation - the same as the difference
>>> between LLVM IR and LLVM bitcode in a memory buffer) can choose to have lld
>>> assume validity (if they produced it from an API they trust/are willing to
>>> bugfix if it's ever wrong) or ask for verification (if they got the object
>>> over a network connection or other untrusted source (perhaps read it out of
>>> a compressed archive, etc))). An API integration of LLD into the Clang
>>> driver wouldn't be a sound place to make this assumption - some objects may
>>> be passed to Clang (not generated by it) from some other compilation or
>>> source, for example.
>>>
>>
>> I think these can serve as a baseline that we can document / elaborate on
>> down the road though.
>> For the moment, we can document our current intentions/policies. That way
>> people can either a) concretely file bug reports against us for violating
>> our intentions or b) we can have a concrete discussion on llvm-dev about
>> changing those documented policies/intentions.
>>
>
> Good point. We need to document the current policy, whatever it is. And the
> current policy after I submit these pending patches is that "the linker
> doesn't crash or exit (or it is a bug) as long as you don't give
> corrupted/malicious object files." I will write that in the Driver file,
> which everyone who wants to use it will see.
>

README.txt is probably a more appropriate location for a "project policy".

-- Sean Silva

>
>> It seems our current situation is that any time anything related to this
>> comes up, everybody and their dog start talking about different
>> hypothetical situations that nobody is actively working on using LLD for
>> (since there are other, higher priorities right now). These may or may not
>> be true, or the parallels to clang/LLVM may or may not be true, but
>> currently we don't have a starting point for a useful discussion. It is all
>> ad-hoc. We need a fixed point of reference for future discussion and what I
>> posted (in this thread and others) seems like a sweet spot to start with;
>> it provides reasonable guarantees and avoids overcommitting our development
>> effort at an early stage.
>>
>> I actually have points to make in response to what you said, but an
>> llvm-commits discussion is not the right place to discuss them.
>>
>> -- Sean Silva
>>
>>
>>>
>>>> -- Sean Silva
>>>>
>>>>
>>>>>
>>>>> On Mon, Feb 1, 2016 at 12:11 PM, Rafael Espíndola <
>>>>> rafael.espindola at gmail.com> wrote:
>>>>>
>>>>>> On 1 February 2016 at 15:06, Rui Ueyama <ruiu at google.com> wrote:
>>>>>> > On Mon, Feb 1, 2016 at 11:57 AM, Rafael Espíndola
>>>>>> > <rafael.espindola at gmail.com> wrote:
>>>>>> >>
>>>>>> >> On 1 February 2016 at 14:46, Sean Silva <chisophugis at gmail.com> wrote:
>>>>>> >> > I think one of the main use cases that has been requested is to be able to
>>>>>> >> > programmatically call the linker with "known good" object files (i.e.
>>>>>> >> > produced by the compiler). That simplifies things a lot. Rui's recent
>>>>>> >> > patches that are thread_local'izing existing globals seem like a
>>>>>> >> > satisfactory approach. Or am I missing something?
>>>>>> >>
>>>>>> >> Yes, known good files are a lot easier to handle. We just have to be
>>>>>> >> clear what "known good" is.
>>>>>> >>
>>>>>> >> > The R_X86_64_REX_GOTPCRELX situation can probably be likened to someone
>>>>>> >> > giving clang a piece of source code with an inline asm that has:
>>>>>> >> >
>>>>>> >> > .text
>>>>>> >> > .byte <some garbage>
>>>>>> >> >
>>>>>> >> > in it. We don't guarantee that the output "makes sense" because there's
>>>>>> >> > really no way for us to know what "makes sense" in a precise way (i.e., a
>>>>>> >> > way that we can program).
>>>>>> >>
>>>>>> >> Would we still be required to check the offsets so we don't crash? An
>>>>>> >> assembly file can contain
>>>>>> >>
>>>>>> >> .reloc 0, R_X86_64_REX_GOTPCRELX, foo
>>>>>> >> .long 4
>>>>>> >>
>>>>>> >> which would put that relocation in an invalid location. In general, is
>>>>>> >> an arbitrary assembly file to be considered "known good"? Is that true
>>>>>> >> even for things like
>>>>>> >>
>>>>>> >> .section .eh_frame, ....
>>>>>> >> garbage
>>>>>> >>
>>>>>> >> that the linker has to parse?
>>>>>> >
>>>>>> >
>>>>>> > I think the answer is case-by-case, but I don't think we have to guarantee
>>>>>> > to recover from errors caused by carefully-crafted malicious object files.
>>>>>> > (Is there anyone who disagrees with that?)
>>>>>>
>>>>>> It is definitely not a use case *I* have an interest in. I just want there
>>>>>> to be an agreement on what use case we want to support at the moment.
>>>>>> Is it "any .o file", "any llvm-mc or gas produced .o", "any clang or
>>>>>> gcc produced .o not including inline asm"?
>>>>>>
>>>>>> Cheers,
>>>>>> Rafael
>>>>>>
>>>>>
>>>>>
>>>>