[PATCH] D16599: ELF: Define another entry point.

Tue Feb 2 15:39:21 PST 2016

It looks like r259597 was the last patch that had to be submitted to
realize this policy. So you can now call the linker's entry point and
expect it to return in normal use cases. I hope that satisfies the needs
of people who were looking for an alternative to the old linker's main function!
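
Since the policy leaves it to the caller to supply trustworthy object files, a caller that gets objects from somewhere it does not fully control might run a cheap structural check before handing them to the entry point. The following is only a sketch under stated assumptions: `looksLikeElf64` is a hypothetical helper, not part of LLD; it handles only 64-bit little-endian ELF; and passing it does not make a malicious file safe to link.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, NOT part of LLD: cheap structural checks on a
// 64-bit little-endian ELF image held in memory.
bool looksLikeElf64(const std::vector<unsigned char> &Buf) {
  const std::size_t HeaderSize = 64; // size of the ELF64 file header
  if (Buf.size() < HeaderSize)
    return false;
  // e_ident: magic bytes, then EI_CLASS (2 = 64-bit), EI_DATA (1 = LSB).
  if (std::memcmp(Buf.data(), "\177ELF", 4) != 0)
    return false;
  if (Buf[4] != 2 || Buf[5] != 1)
    return false;

  // Read little-endian fields byte by byte so host endianness is irrelevant.
  auto read = [&](std::size_t Off, std::size_t N) {
    std::uint64_t V = 0;
    for (std::size_t I = 0; I < N; ++I)
      V |= std::uint64_t(Buf[Off + I]) << (8 * I);
    return V;
  };
  std::uint64_t ShOff = read(0x28, 8);     // e_shoff
  std::uint64_t ShEntSize = read(0x3A, 2); // e_shentsize
  std::uint64_t ShNum = read(0x3C, 2);     // e_shnum

  // The section header table must lie entirely inside the buffer.
  // Both factors are 16-bit values, so the product cannot overflow.
  std::uint64_t TableSize = ShEntSize * ShNum;
  return ShOff <= Buf.size() && TableSize <= Buf.size() - ShOff;
}
```

A check like this only rejects obviously truncated or mislabeled input. It says nothing about relocations, .eh_frame contents, or section alignments, which are the cases discussed further down the thread.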

On Tue, Feb 2, 2016 at 3:28 PM, Rui Ueyama <ruiu at google.com> wrote:

> Thank you for the advice. I refined it a bit.
>
> diff --git a/ELF/README.md b/ELF/README.md
> index 49b8167..b71faf4 100644
> --- a/ELF/README.md
> +++ b/ELF/README.md
> @@ -19,3 +19,16 @@ Achieving good performance is one of our goals. It's too early to reach a
>  conclusion, but we are optimistic about that as it currently seems to be faster
>  than GNU gold. It will be interesting to compare when we are close to feature
>  parity.
> +
> +Library Use
> +-----------
> +
> +You can embed LLD to your program by linking against it and calling the linker's
> +entry point function lld::elf2::link.
> +
> +The current policy is that it is your responsibility to give trustworthy object
> +files. The function is guaranteed to return as long as you do not pass corrupted
> +or malicious object files. A corrupted file could cause a fatal error or SEGV.
> +That being said, you don't need to worry too much about it if you create object
> +files in a usual way and give it to the linker (it is naturally expected to
> +work, or otherwise it's a linker's bug.)
>
> On Tue, Feb 2, 2016 at 3:21 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Tue, Feb 2, 2016 at 3:04 PM, Rui Ueyama <ruiu at google.com> wrote:
>>
>>> I'm going to add this to the linker.
>>>
>>> +// Entry point of the ELF linker. Returns true on success. It is
>>> +// guaranteed to return as long as you do not pass corrupted or malicious
>>> +// object files. A corrupted file could cause a fatal error or SEGV.
>>> +// That being said, you don't need to worry too much about it if you
>>> +// create object files in a usual way and feed it to the linker
>>> +// (it is naturally expected to work, or otherwise that's a linker's bug.)
>>>  bool link(ArrayRef<const char *> Args, llvm::raw_ostream &Error = llvm::errs());
>>>
>>>
>>>
>> That sounds fine to me. I would consider adding it to README.txt instead,
>> and to phrase it as "this is our current policy" instead of casual advice
>> (otherwise it is difficult to use as a starting point for discussion IMO).
>> Whatever you think makes sense though.
>>
>> -- Sean Silva
>>
>>
>>> On Tue, Feb 2, 2016 at 2:25 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>
>>>> On Tue, Feb 2, 2016 at 1:42 PM, Sean Silva <chisophugis at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 2, 2016 at 8:44 AM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 1, 2016 at 11:05 PM, Sean Silva via llvm-commits <
>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 1, 2016 at 12:27 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>>>>
>>>>>>>> Even if a file is technically sane, you can craft a malicious one;
>>>>>>>> for example, you can probably crash the linker by OOM by setting a very
>>>>>>>> large number as an alignment requirement for each section so that the size
>>>>>>>> of output becomes huge. It is easily doable using assembly. So my answer
>>>>>>>> is "any clang or gcc produced .o not including inline asm". (It does not
>>>>>>>> mean that we do not try to recover from errors caused by bad assembly code,
>>>>>>>> but we don't/can't guarantee 100% recovery.)
>>>>>>>>
>>>>>>>
>>>>>>> You can probably find some way to set the alignment using an
>>>>>>> attribute or whatever even from clang (and without inlineasm).
>>>>>>>
>>>>>>> I don't think there is a platonically-ideal answer for this. It's
>>>>>>> more about goals:
>>>>>>> - as a command line tool, we don't want legitimate users to see us
>>>>>>> crashing during normal use (if a user is intentionally trying to kill LLD,
>>>>>>> it is not as embarrassing though, so we don't need to worry much about that
>>>>>>> case).
>>>>>>> - we want to be useful (someday) as a library that can be safely
>>>>>>> used in-process, so we need to provide certain guarantees (but these are
>>>>>>> not hugely constraining, because we can assume that the calling code is
>>>>>>> programmatically generating the file in good faith).
>>>>>>>
>>>>>>
>>>>>> I don't think this is a valid assumption for all programmatic users
>>>>>> (& indeed Clang and LLVM both have ways of accepting untrusted inputs - the
>>>>>> assumption in LLVM is "if it's not already in the in-memory representation,
>>>>>> it's not trusted" (parsing bitcode, reading files, etc) and I think the
>>>>>> same would probably be reasonable in lld - callers with object contents in
>>>>>> memory (or even a higher level representation - the same as the difference
>>>>>> between LLVM IR and LLVM bitcode in a memory buffer) can choose to have lld
>>>>>> assume validity (if they produced it from an API they trust/are willing to
>>>>>> bugfix if it's ever wrong) or ask for verification (if they got the object
>>>>>> over a network connection or other untrusted source (perhaps read it out of
>>>>>> a compressed archive, etc))). An API integration of LLD into the Clang
>>>>>> driver wouldn't be a sound place to make this assumption - some objects may
>>>>>> be passed to Clang (not generated by it) from some other compilation or
>>>>>> source, for example.
>>>>>>
>>>>>
>>>>> I think these can serve as a baseline that we can document / elaborate
>>>>> on down the road though.
>>>>> For the moment, we can document our current intentions/policies. That
>>>>> way people can either a) concretely file bug reports against us for
>>>>> violating our intentions or b) we can have a concrete discussion on
>>>>> llvm-dev about changing those documented policies/intentions.
>>>>>
>>>>
>>>> Good point. We need to document the current policy whatever it is. And
>>>> the current policy after I submit these pending patches is that "the linker
>>>> doesn't crash or exit (or it is a bug) as long as you don't give
>>>> corrupted/malicious object files." I will write that in the Driver file,
>>>> which everyone who wants to use the linker will see.
>>>>
>>>>> It seems our current situation is that any time anything related to
>>>>> this comes up, everybody and their dog start talking about different
>>>>> hypothetical situations that nobody is actively working on using LLD for
>>>>> (since there are other, higher priorities right now). These may or may not
>>>>> be true, or the parallels to clang/LLVM may or may not be true, but
>>>>> currently we don't have a starting point for a useful discussion. It is all
>>>>> ad-hoc. We need a fixed point of reference for future discussion and what I
>>>>> posted (in this thread and others) seems like a sweet spot to start with;
>>>>> it provides reasonable guarantees and avoids overcommitting our development
>>>>> effort at an early stage.
>>>>>
>>>>> I actually have points to make in response to what you said, but an
>>>>> llvm-commits discussion is not the right place to discuss them.
>>>>>
>>>>> -- Sean Silva
>>>>>
>>>>>
>>>>>>
>>>>>>> -- Sean Silva
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 1, 2016 at 12:11 PM, Rafael Espíndola <
>>>>>>>> rafael.espindola at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> On 1 February 2016 at 15:06, Rui Ueyama <ruiu at google.com> wrote:
>>>>>>>>> > On Mon, Feb 1, 2016 at 11:57 AM, Rafael Espíndola
>>>>>>>>> > <rafael.espindola at gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> On 1 February 2016 at 14:46, Sean Silva <chisophugis at gmail.com> wrote:
>>>>>>>>> >> > I think one of the main use cases that has been requested is to be able to
>>>>>>>>> >> > programmatically call the linker with "known good" object files (i.e.
>>>>>>>>> >> > produced by the compiler). That simplifies things a lot. Rui's recent
>>>>>>>>> >> > patches that are thread_local'izing existing globals seem like a
>>>>>>>>> >> > satisfactory approach. Or am I missing something?
>>>>>>>>> >>
>>>>>>>>> >> Yes, known good files are a lot easier to handle. We just have to be
>>>>>>>>> >> clear what "known good" is.
>>>>>>>>> >>
>>>>>>>>> >> > The R_X86_64_REX_GOTPCRELX situation can probably be likened to someone
>>>>>>>>> >> > giving clang a piece of source code with an inline asm that has:
>>>>>>>>> >> >
>>>>>>>>> >> > .text
>>>>>>>>> >> > .byte <some garbage>
>>>>>>>>> >> >
>>>>>>>>> >> > in it. We don't guarantee that the output "makes sense" because there's
>>>>>>>>> >> > really no way for us to know what "makes sense" in a precise way (i.e., a
>>>>>>>>> >> > way that we can program).
>>>>>>>>> >>
>>>>>>>>> >> Would we still be required to check the offsets so we don't crash? An
>>>>>>>>> >> assembly file can contain
>>>>>>>>> >>
>>>>>>>>> >> .reloc 0, R_X86_64_REX_GOTPCRELX, foo
>>>>>>>>> >> .long 4
>>>>>>>>> >>
>>>>>>>>> >> which would put that relocation in an invalid location. In general, is
>>>>>>>>> >> an arbitrary assembly file to be considered "known good"? Is that true
>>>>>>>>> >> even for things like
>>>>>>>>> >>
>>>>>>>>> >> .section .eh_frame, ....
>>>>>>>>> >> garbage
>>>>>>>>> >>
>>>>>>>>> >> that the linker has to parse?
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > I think the answer is case-by-case, but I don't think we have to guarantee
>>>>>>>>> > to recover from errors caused by carefully-crafted malicious object files.
>>>>>>>>> > (Is there anyone who disagrees with that?)
>>>>>>>>>
>>>>>>>>> It is definitely not a use case *I* have an interest in. I just want
>>>>>>>>> there to be an agreement on what use case we want to support at the
>>>>>>>>> moment.
>>>>>>>>> Is it "any .o file", "any llvm-mc or gas produced .o", "any clang or
>>>>>>>>> gcc produced .o not including inline asm"?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Rafael
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>