[PATCH] D16599: ELF: Define another entry point.

Sean Silva via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 2 13:42:59 PST 2016

On Tue, Feb 2, 2016 at 8:44 AM, David Blaikie <dblaikie at gmail.com> wrote:

> On Mon, Feb 1, 2016 at 11:05 PM, Sean Silva via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>> On Mon, Feb 1, 2016 at 12:27 PM, Rui Ueyama <ruiu at google.com> wrote:
>>> Even if a file is technically sane, you can craft a malicious one; for
>>> example, you can probably crash the linker with an OOM by setting a very large
>>> number as an alignment requirement for each section so that the size of
>>> output becomes huge. It is easily doable using assembly. So my answer
>>> is "any clang or gcc produced .o not including inline asm". (It does not
>>> mean that we do not try to recover from errors caused by bad assembly code,
>>> but we don't/can't guarantee 100% recovery.)
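Rui's alignment example can be made concrete. The sketch below is illustrative only and is not LLD code: the function name `advanceOffset` and the 1 GiB cap `kMaxAlign` are assumptions. It shows one way a linker could reject an alignment that is not a power of two (ELF requires sh_addralign to be 0 or a power of two), that exceeds a cap, or that overflows the output offset, instead of attempting an enormous allocation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical cap on a single section's alignment (an assumption, not an LLD value).
constexpr uint64_t kMaxAlign = uint64_t(1) << 30;  // 1 GiB

// Places a section of `size` bytes with alignment `align` after output offset
// `cur` and returns the offset just past it, or std::nullopt if rejected.
std::optional<uint64_t> advanceOffset(uint64_t cur, uint64_t align, uint64_t size) {
  constexpr uint64_t kMax = std::numeric_limits<uint64_t>::max();
  if (align == 0)
    align = 1;  // in ELF, sh_addralign 0 and 1 both mean "no constraint"
  if ((align & (align - 1)) != 0 || align > kMaxAlign)
    return std::nullopt;  // not a power of two, or suspiciously large
  uint64_t mask = align - 1;
  if (cur > kMax - mask)
    return std::nullopt;  // rounding up would wrap around
  uint64_t start = (cur + mask) & ~mask;
  if (size > kMax - start)
    return std::nullopt;  // section would run past the end of the address space
  return start + size;
}
```

A cap like this only bounds one section; many sections each aligned at the cap can still yield a huge output, which fits Rui's point that full recovery from crafted input cannot be guaranteed.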
>> You can probably find some way to set the alignment using an attribute or
>> whatever even from clang (and without inline asm).
>> I don't think there is a platonically-ideal answer for this. It's more
>> about goals:
>> - as a command line tool, we don't want legitimate users to see us
>> crashing during normal use (if a user is intentionally trying to kill LLD,
>> it is not as embarrassing though, so we don't need to worry much about that
>> case).
>> - we want to be useful (someday) as a library that can be safely used
>> in-process, so we need to provide certain guarantees (but these are not
>> hugely constraining, because we can assume that the calling code is
>> programmatically generating the file in good faith).
> I don't think this is a valid assumption for all programmatic users (&
> indeed Clang and LLVM both have ways of accepting untrusted inputs - the
> assumption in LLVM is "if it's not already in the in-memory representation,
> it's not trusted" (parsing bitcode, reading files, etc) and I think the
> same would probably be reasonable in lld - callers with object contents in
> memory (or even a higher level representation - the same as the difference
> between LLVM IR and LLVM bitcode in a memory buffer) can choose to have lld
> assume validity (if they produced it from an API they trust/are willing to
> bugfix if it's ever wrong) or ask for verification (if they got the object
> over a network connection or other untrusted source (perhaps read it out of
> a compressed archive, etc))). An API integration of LLD into the Clang
> driver wouldn't be a sound place to make this assumption - some objects may
> be passed to Clang (not generated by it) from some other compilation or
> source, for example.
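David's trust split (caller-vouched in-memory objects versus bytes from an untrusted source) could surface in a library API roughly as follows. This is a hypothetical sketch: `Trust`, `InputObject`, `verifyObject` and `linkObjects` are invented names, not part of LLD, and the check shown (ELF magic plus the 64-byte ELF64 header minimum) is only a stand-in for real structural verification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical: does the caller vouch for this input, or must it be checked?
enum class Trust { Trusted, Untrusted };

struct InputObject {
  std::vector<uint8_t> bytes;
  Trust trust;
};

// Hypothetical stand-in for real verification: ELF magic and a buffer at
// least as large as an ELF64 file header (64 bytes).
bool verifyObject(const InputObject& obj) {
  static const uint8_t kMagic[4] = {0x7f, 'E', 'L', 'F'};
  if (obj.bytes.size() < 64)
    return false;
  for (std::size_t i = 0; i < 4; ++i)
    if (obj.bytes[i] != kMagic[i])
      return false;
  return true;
}

// Returns an error message, or an empty string on success. Trusted inputs
// skip verification: the caller produced them and takes responsibility.
std::string linkObjects(const std::vector<InputObject>& inputs) {
  for (std::size_t i = 0; i < inputs.size(); ++i)
    if (inputs[i].trust == Trust::Untrusted && !verifyObject(inputs[i]))
      return "input " + std::to_string(i) + ": malformed object";
  return "";  // symbol resolution and output would follow here
}
```

Putting the choice on each input, rather than on the whole link, matches the Clang-driver case David raises, where some objects are freshly generated and others come from elsewhere.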

I think these can serve as a baseline that we can document / elaborate on
down the road though.
For the moment, we can document our current intentions/policies. That way
either a) people can concretely file bug reports against us for violating
our intentions, or b) we can have a concrete discussion on llvm-dev about
changing those documented policies/intentions.

It seems our current situation is that any time anything related to this
comes up, everybody and their dog start talking about different
hypothetical situations that nobody is actively working on using LLD for
(since there are other, higher priorities right now). These scenarios may or
may not be realistic, and the parallels to clang/LLVM may or may not hold, but
currently we don't have a starting point for a useful discussion. It is all
ad-hoc. We need a fixed point of reference for future discussion and what I
posted (in this thread and others) seems like a sweet spot to start with;
it provides reasonable guarantees and avoids overcommitting our development
effort at an early stage.

I actually have some points to make in response to what you said, but an
llvm-commits review thread is not the right place to discuss them.

-- Sean Silva

>> -- Sean Silva
>>> On Mon, Feb 1, 2016 at 12:11 PM, Rafael Espíndola <
>>> rafael.espindola at gmail.com> wrote:
>>>> On 1 February 2016 at 15:06, Rui Ueyama <ruiu at google.com> wrote:
>>>> > On Mon, Feb 1, 2016 at 11:57 AM, Rafael Espíndola
>>>> > <rafael.espindola at gmail.com> wrote:
>>>> >>
>>>> >> On 1 February 2016 at 14:46, Sean Silva <chisophugis at gmail.com> wrote:
>>>> >> > I think one of the main use cases that has been requested is to be
>>>> >> > able to programmatically call the linker with "known good" object
>>>> >> > files (i.e. produced by the compiler). That simplifies things a lot.
>>>> >> > Rui's recent patches that are thread_local'izing existing globals
>>>> >> > seem like a satisfactory approach. Or am I missing something?
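The thread_local approach Sean mentions can be illustrated in miniature. This is a sketch under assumptions, not the contents of Rui's patches: `Configuration`, `Config`, `reportError`, `runLink` and `runTwoLinks` are hypothetical names. Because `Config` is declared `thread_local`, each thread running a link sees its own pointer, so two links on different threads keep separate error counts.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <utility>

// Hypothetical per-link state that would otherwise be a plain global.
struct Configuration {
  std::string outputFile;
  int errorCount = 0;
};

// thread_local: every thread has its own copy of this pointer, so links
// running concurrently on different threads do not overwrite each other's.
thread_local Configuration* Config = nullptr;

// Code deep inside the linker reaches the state through the global.
void reportError() { ++Config->errorCount; }

// Hypothetical entry point: installs fresh state for the calling thread,
// simulates `errors` diagnostics, and returns how many were counted.
int runLink(const std::string& output, int errors) {
  Configuration cfg;
  cfg.outputFile = output;
  Config = &cfg;
  for (int i = 0; i < errors; ++i)
    reportError();
  int counted = Config->errorCount;
  Config = nullptr;  // do not leave a dangling pointer to `cfg`
  return counted;
}

// Runs two links on separate threads and returns both error counts.
std::pair<int, int> runTwoLinks() {
  int a = 0, b = 0;
  std::thread t1([&] { a = runLink("a.out", 3); });
  std::thread t2([&] { b = runLink("b.out", 5); });
  t1.join();
  t2.join();
  return {a, b};
}
```

The appeal is that existing code keeps using a global-looking name without threading a context object through every function; the limitation is that two links on the same thread still cannot be interleaved.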
>>>> >>
>>>> >> Yes, known good files are a lot easier to handle. We just have to be
>>>> >> clear what "known good" is.
>>>> >>
>>>> >> > The R_X86_64_REX_GOTPCRELX situation can probably be likened to
>>>> >> > someone giving clang a piece of source code with an inline asm that
>>>> >> > has:
>>>> >> >
>>>> >> > .text
>>>> >> > .byte <some garbage>
>>>> >> >
>>>> >> > in it. We don't guarantee that the output "makes sense" because
>>>> >> > there's really no way for us to know what "makes sense" in a precise
>>>> >> > way (i.e., a way that we can program).
>>>> >>
>>>> >> Would we still be required to check the offsets so we don't crash? An
>>>> >> assembly file can contain
>>>> >>
>>>> >> .reloc 0, R_X86_64_REX_GOTPCRELX, foo
>>>> >> .long 4
>>>> >>
>>>> >> which would put that relocation in an invalid location. In general, is
>>>> >> an arbitrary assembly file to be considered "known good"? Is that true
>>>> >> even for things like
>>>> >>
>>>> >> .section .eh_frame, ....
>>>> >> garbage
>>>> >>
>>>> >> that the linker has to parse?
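Rafael's `.reloc 0, R_X86_64_REX_GOTPCRELX, foo` example is dangerous because relaxing this relocation means inspecting the bytes in front of the offset (the REX prefix, opcode and ModRM of a `mov foo@GOTPCREL(%rip), %reg`), and at offset 0 those bytes lie before the start of the section. A minimal bounds-checked sketch follows. `tryRelaxMovToLea` is a hypothetical name, this is not LLD's implementation, and other preconditions a real linker checks before relaxing (such as whether the symbol can be bound locally) are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// r_offset points at the 4-byte displacement; the REX prefix, opcode and
// ModRM of the instruction occupy the 3 bytes before it.
constexpr uint64_t kBytesBefore = 3;
constexpr uint64_t kDispSize = 4;

// Hypothetical helper: rewrites `mov foo@GOTPCREL(%rip), %reg` (opcode 0x8b)
// into `lea foo(%rip), %reg` (opcode 0x8d) only if every byte it touches is
// inside the section and looks like the expected instruction.
bool tryRelaxMovToLea(std::vector<uint8_t>& sec, uint64_t off) {
  if (off < kBytesBefore || off > sec.size() || sec.size() - off < kDispSize)
    return false;  // no room for such an instruction around this offset
  uint8_t rex = sec[off - 3], op = sec[off - 2], modrm = sec[off - 1];
  bool isRex = (rex & 0xf0) == 0x40;
  bool ripRelative = (modrm & 0xc7) == 0x05;  // mod=00, rm=101
  if (!isRex || op != 0x8b || !ripRelative)
    return false;  // not the expected mov; leave the bytes untouched
  sec[off - 2] = 0x8d;  // mov -> lea; the displacement is fixed up separately
  return true;
}
```

Returning false rather than asserting lets the caller fall back to the unrelaxed GOT-based form or report a diagnostic, depending on how much recovery the project decides to promise.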
>>>> >
>>>> >
>>>> > I think the answer is case-by-case, but I don't think we have to
>>>> > guarantee recovery from errors caused by carefully-crafted malicious
>>>> > object files. (Is there anyone who disagrees with that?)
>>>> It is definitely not a use case *I* have an interest in. I just want
>>>> there to be an agreement on what use case we want to support at the moment.
>>>> Is it "any .o file", "any llvm-mc or gas produced .o", "any clang or
>>>> gcc produced .o not including inline asm"?
>>>> Cheers,
>>>> Rafael