[PATCH] D16599: ELF: Define another entry point.

Tue Feb 2 22:03:19 PST 2016

On Tue, Feb 2, 2016 at 3:39 PM, Rui Ueyama <ruiu at google.com> wrote:

> And looks like r259597 was the last patch that had to be submitted to
> realize this policy. So, you can call the linker's entry point if you want
> and expect it to return in normal use cases. I hope that satisfies the needs
> of people who were looking for an alternative to the old linker's main function!
>

Congratulations! I'm glad there is a sweet spot of
maintainability/complexity and safe library use.

And thanks for taking the time to implement all the changes needed for this!

-- Sean Silva

>
> On Tue, Feb 2, 2016 at 3:28 PM, Rui Ueyama <ruiu at google.com> wrote:
>
>> Thank you for the advice. I refined it a bit.
>>
>> diff --git a/ELF/README.md b/ELF/README.md
>> index 49b8167..b71faf4 100644
>> --- a/ELF/README.md
>> +++ b/ELF/README.md
>> @@ -19,3 +19,16 @@ Achieving good performance is one of our goals. It's too early to reach a
>>  conclusion, but we are optimistic about that as it currently seems to be faster
>>  than GNU gold. It will be interesting to compare when we are close to feature
>>  parity.
>> +
>> +Library Use
>> +-----------
>> +
>> +You can embed LLD in your program by linking against it and calling the linker's
>> +entry point function lld::elf2::link.
>> +
>> +The current policy is that it is your responsibility to give trustworthy object
>> +files. The function is guaranteed to return as long as you do not pass corrupted
>> +or malicious object files. A corrupted file could cause a fatal error or SEGV.
>> +That being said, you don't need to worry too much about it if you create object
>> +files in a usual way and give them to the linker (it is naturally expected to
>> +work, or otherwise it's a linker bug).
>>
>> On Tue, Feb 2, 2016 at 3:21 PM, Sean Silva <chisophugis at gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, Feb 2, 2016 at 3:04 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>
>>>> I'm going to add this comment to the linker.
>>>>
>>>> +// Entry point of the ELF linker. Returns true on success. It is
>>>> +// guaranteed to return as long as you do not pass corrupted or malicious
>>>> +// object files. A corrupted file could cause a fatal error or SEGV.
>>>> +// That being said, you don't need to worry too much about it if you
>>>> +// create object files in a usual way and feed them to the linker
>>>> +// (it is naturally expected to work, or otherwise that's a linker bug).
>>>>  bool link(ArrayRef<const char *> Args, llvm::raw_ostream &Error = llvm::errs());
>>>>
>>>>
>>>>
>>> That sounds fine to me. I would consider adding it to README.txt
>>> instead, and to phrase it as "this is our current policy" instead of casual
>>> advice (otherwise it is difficult to use as a starting point for discussion
>>> IMO). Whatever you think makes sense though.
>>>
>>> -- Sean Silva
>>>
>>>
>>>> On Tue, Feb 2, 2016 at 2:25 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>
>>>>> On Tue, Feb 2, 2016 at 1:42 PM, Sean Silva <chisophugis at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 2, 2016 at 8:44 AM, David Blaikie <dblaikie at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 1, 2016 at 11:05 PM, Sean Silva via llvm-commits <
>>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 1, 2016 at 12:27 PM, Rui Ueyama <ruiu at google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Even if a file is technically sane, you can craft a malicious one;
>>>>>>>>> for example, you can probably crash the linker by OOM by setting a very
>>>>>>>>> large number as an alignment requirement for each section so that the size
>>>>>>>>> of output becomes huge. It is easily doable using assembly. So my answer
>>>>>>>>> is "any clang or gcc produced .o not including inline asm". (It does not
>>>>>>>>> mean that we do not try to recover from errors caused by bad assembly code,
>>>>>>>>> but we don't/can't guarantee 100% recovery.)
>>>>>>>>>
>>>>>>>>
>>>>>>>> You can probably find some way to set the alignment using an
>>>>>>>> attribute or whatever even from clang (and without inlineasm).
>>>>>>>>
>>>>>>>> I don't think there is a platonically-ideal answer for this. It's
>>>>>>>> more about goals:
>>>>>>>> - as a command line tool, we don't want legitimate users to see us
>>>>>>>> crashing during normal use (if a user is intentionally trying to kill LLD,
>>>>>>>> it is not as embarrassing though, so we don't need to worry much about that
>>>>>>>> case).
>>>>>>>> - we want to be useful (someday) as a library that can be safely
>>>>>>>> used in-process, so we need to provide certain guarantees (but these are
>>>>>>>> not hugely constraining, because we can assume that the calling code is
>>>>>>>> programmatically generating the file in good faith).
>>>>>>>>
>>>>>>>
>>>>>>> I don't think this is a valid assumption for all programmatic users.
>>>>>>> Indeed, Clang and LLVM both have ways of accepting untrusted inputs. The
>>>>>>> assumption in LLVM is "if it's not already in the in-memory representation,
>>>>>>> it's not trusted" (parsing bitcode, reading files, etc.), and I think the
>>>>>>> same would probably be reasonable in lld. Callers with object contents in
>>>>>>> memory (or even a higher-level representation, analogous to the difference
>>>>>>> between LLVM IR and LLVM bitcode in a memory buffer) can choose to have lld
>>>>>>> assume validity (if they produced it from an API they trust or are willing
>>>>>>> to bugfix if it's ever wrong) or ask for verification (if they got the
>>>>>>> object over a network connection or another untrusted source, perhaps
>>>>>>> reading it out of a compressed archive, etc.). An API integration of LLD
>>>>>>> into the Clang driver wouldn't be a sound place to make this assumption:
>>>>>>> some objects may be passed to Clang (not generated by it) from some other
>>>>>>> compilation or source, for example.
>>>>>>>
>>>>>>
>>>>>> I think these can serve as a baseline that we can document /
>>>>>> elaborate on down the road though.
>>>>>> For the moment, we can document our current intentions/policies. That
>>>>>> way people can either a) concretely file bug reports against us for
>>>>>> violating our intentions or b) we can have a concrete discussion on
>>>>>> llvm-dev about changing those documented policies/intentions.
>>>>>>
>>>>>
>>>>> Good point. We need to document the current policy whatever it is. And
>>>>> the current policy after I submit these pending patches is that "the linker
>>>>> doesn't crash or exit (or it is a bug) as long as you don't give
>>>>> corrupted/malicious object files." I will write that in the Driver file,
>>>>> which everyone who wants to use it will see.
>>>>>
>>>>>> It seems our current situation is that any time anything related to
>>>>>> this comes up, everybody and their dog start talking about different
>>>>>> hypothetical situations that nobody is actively working on using LLD for
>>>>>> (since there are other, higher priorities right now). These may or may not
>>>>>> be true, or the parallels to clang/LLVM may or may not be true, but
>>>>>> currently we don't have a starting point for a useful discussion. It is all
>>>>>> ad-hoc. We need a fixed point of reference for future discussion and what I
>>>>>> posted (in this thread and others) seems like a sweet spot to start with;
>>>>>> it provides reasonable guarantees and avoids overcommitting our development
>>>>>> effort at an early stage.
>>>>>>
>>>>>> I actually have points to say in response to what you said, but here
>>>>>> in an llvm-commits discussion is not the right place to discuss it.
>>>>>>
>>>>>> -- Sean Silva
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> -- Sean Silva
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Feb 1, 2016 at 12:11 PM, Rafael Espíndola <
>>>>>>>>> rafael.espindola at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> On 1 February 2016 at 15:06, Rui Ueyama <ruiu at google.com> wrote:
>>>>>>>>>> > On Mon, Feb 1, 2016 at 11:57 AM, Rafael Espíndola
>>>>>>>>>> > <rafael.espindola at gmail.com> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> On 1 February 2016 at 14:46, Sean Silva <chisophugis at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>> >> > I think one of the main use cases that has been requested is
>>>>>>>>>> to be able
>>>>>>>>>> >> > to
>>>>>>>>>> >> > programmatically call the linker with "known good" object
>>>>>>>>>> files (i.e.
>>>>>>>>>> >> > produced by the compiler). That simplifies things a lot.
>>>>>>>>>> Rui's recent
>>>>>>>>>> >> > patches that are thread_local'izing existing globals seem
>>>>>>>>>> like a
>>>>>>>>>> >> > satisfactory approach. Or am I missing something?
>>>>>>>>>> >>
>>>>>>>>>> >> Yes, known good files are a lot easier to handle. We just have
>>>>>>>>>> to be
>>>>>>>>>> >> clear what "known good" is.
>>>>>>>>>> >>
>>>>>>>>>> >> > The R_X86_64_REX_GOTPCRELX situation can probably be likened
>>>>>>>>>> to someone
>>>>>>>>>> >> > giving clang a piece of source code with an inline asm that
>>>>>>>>>> has:
>>>>>>>>>> >> >
>>>>>>>>>> >> > .text
>>>>>>>>>> >> > .byte <some garbage>
>>>>>>>>>> >> >
>>>>>>>>>> >> > in it. We don't guarantee that the output "makes sense"
>>>>>>>>>> because there's
>>>>>>>>>> >> > really no way for us to know what "makes sense" in a precise
>>>>>>>>>> way (i.e.,
>>>>>>>>>> >> > a
>>>>>>>>>> >> > way that we can program).
>>>>>>>>>> >>
>>>>>>>>>> >> Would we still be required to check the offsets so we don't
>>>>>>>>>> crash? An
>>>>>>>>>> >> assembly file can contain
>>>>>>>>>> >>
>>>>>>>>>> >> .reloc 0, R_X86_64_REX_GOTPCRELX, foo
>>>>>>>>>> >> .long 4
>>>>>>>>>> >>
>>>>>>>>>> >> which would put that relocation in an invalid location. In
>>>>>>>>>> general, is
>>>>>>>>>> >> an arbitrary assembly file to be considered "known good"? Is
>>>>>>>>>> that true
>>>>>>>>>> >> even for things like
>>>>>>>>>> >>
>>>>>>>>>> >> .section .eh_frame, ....
>>>>>>>>>> >> garbage
>>>>>>>>>> >>
>>>>>>>>>> >> that the linker has to parse?
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > I think the answer is case-by-case, but I don't think we have
>>>>>>>>>> to guarantee
>>>>>>>>>> > to recover from errors caused by carefully-crafted malicious
>>>>>>>>>> object files.
>>>>>>>>>> > (Is there anyone who disagrees with that?)
>>>>>>>>>>
>>>>>>>>>> It is definitely not a use case *I* have an interest in. I just
>>>>>>>>>> want there to be an agreement on what use case we want to support
>>>>>>>>>> at the moment.
>>>>>>>>>> Is it "any .o file", "any llvm-mc or gas produced .o", "any clang
>>>>>>>>>> or
>>>>>>>>>> gcc produced .o not including inline asm"?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Rafael
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>