[PATCH] D16599: ELF: Define another entry point.

Tue Feb 2 11:44:22 PST 2016

On Tue, Feb 2, 2016 at 11:36 AM, Rui Ueyama <ruiu at google.com> wrote:

> On Tue, Feb 2, 2016 at 11:07 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Tue, Feb 2, 2016 at 10:59 AM, Rui Ueyama <ruiu at google.com> wrote:
>>
>>> On Tue, Feb 2, 2016 at 8:44 AM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Feb 1, 2016 at 11:05 PM, Sean Silva via llvm-commits <
>>>> llvm-commits at lists.llvm.org> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 1, 2016 at 12:27 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>>
>>>>>> Even if a file is technically sane, you can craft a malicious one;
>>>>>> for example, you can probably crash the linker by OOM by setting a very
>>>>>> large number as an alignment requirement for each section so that the size
>>>>>> of output becomes huge. It is easily doable using assembly. So my answer
>>>>>> is "any clang or gcc produced .o not including inline asm". (It does not
>>>>>> mean that we do not try to recover from errors caused by bad assembly code,
>>>>>> but we don't/can't guarantee 100% recovery.)
>>>>>>
>>>>>
>>>>> You can probably find some way to set the alignment using an attribute
>>>>> or whatever even from clang (and without inlineasm).
>>>>>
>>>>> I don't think there is a platonically-ideal answer for this. It's more
>>>>> about goals:
>>>>> - as a command line tool, we don't want legitimate users to see us
>>>>> crashing during normal use (if a user is intentionally trying to kill LLD,
>>>>> it is not as embarrassing though, so we don't need to worry much about that
>>>>> case).
>>>>> - we want to be useful (someday) as a library that can be safely used
>>>>> in-process, so we need to provide certain guarantees (but these are not
>>>>> hugely constraining, because we can assume that the calling code is
>>>>> programmatically generating the file in good faith).
>>>>>
>>>>
>>>> I don't think this is a valid assumption for all programmatic users (&
>>>> indeed Clang and LLVM both have ways of accepting untrusted inputs - the
>>>> assumption in LLVM is "if it's not already in the in-memory representation,
>>>> it's not trusted" (parsing bitcode, reading files, etc) and I think the
>>>> same would probably be reasonable in lld - callers with object contents in
>>>> memory (or even a higher level representation - the same as the difference
>>>> between LLVM IR and LLVM bitcode in a memory buffer) can choose to have lld
>>>> assume validity (if they produced it from an API they trust/are willing to
>>>> bugfix if it's ever wrong) or ask for verification (if they got the object
>>>> over a network connection or other untrusted source (perhaps read it out of
>>>> a compressed archive, etc))). An API integration of LLD into the Clang
>>>> driver wouldn't be a sound place to make this assumption - some objects may
>>>> be passed to Clang (not generated by it) from some other compilation or
>>>> source, for example.
>>>>
>>>
>>> The difference is that we do not have a separate in-memory representation
>>> of object files; we use the mmap'ed ELF files themselves as the internal
>>> representation. So, if files are not trustworthy, you cannot make any
>>> assumption about the data you are handling at any point during program
>>> execution. That is probably too hostile an environment, and doing error
>>> checks along the way would be error-prone, slow, or would complicate the
>>> code.
>>>
>>
>> I'm not sure I believe that's the case (that it's necessarily
>> slow/complicated/error-prone) any more than it is for Clang - it has
>> untrusted inputs & has to handle all the possible ways people can write
>> incorrect source code. (& LLVM too: yes, it often gets trusted input
>> in-memory, but once the input goes to disk, LTO, for example, verifies it
>> every time - in the same way I would expect a linker to do so for object
>> files off disk)
>>
>>
>>> If we use an analogy of Clang and LLVM, we probably want to have a
>>> separate verifier for object files which you can run on object files from
>>> untrusted source before passing it to the link() function (so, although the
>>> two are in the same format, untrusted ELF files are "external
>>> representation", and verified ELF files are "internal representation").
>>>
>>
>> *nod* but I'm suggesting if it's from disk it's untrusted (at least
>> that's how LTO, LLVM, and Clang work) & since that's the majority case for
>> a linker, that it's likely to be the case we care about for API use and for
>> performance. LLVM's JIT is the sort of case I imagine having "trusted"
>> inputs - generated in memory by a trusted API, any time the generation and
>> consumption disagree on validity it would be considered a programmer error
>> and fixed as a bug in the program as a whole (by fixing producer or
>> consumer). (such a JIT would also have untrusted inputs it would read from
>> the filesystem too, no doubt - predefined libraries to link in, etc)
>>
>
> There may be a way to handle all possible inputs throughout the linker's
> execution, but I think that the discussion went a bit too far. We have a
> number of good patches (or so I hope) that at least stop the linker from
> exiting as long as inputs are not malicious or corrupted. I expect that to
> work at least for the time being, and submitting them doesn't prevent us
> from doing more in future if we need to. Can you give us time to work on
> stuff that's not directly related to this topic?
>

Sure - didn't mean to rush anyone, was just saying "I don't think this is
the entire answer/where we want to be long-term". (The tone of the
conversation/some of the statements seemed to say "this addresses the
issue; we wouldn't need to do anything else for API users, & anything more
would mean hardening LLD". I think LLD will need to be API-usable with
untrusted inputs even for fairly basic uses, and security-level hardening
doesn't have to be the goal as soon as we step into this area.)

Just trying to be clear, so that if, 6 months from now, the topic comes up
again there's not another round of confusion over what's
reasonable/intended/in-scope or out of scope.
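
As a rough illustration of the "verify whatever came from disk, then link"
split discussed in this thread, a minimal sketch might look like the
following. Every name in it (Section, Reloc, verifySection, MaxAlignment) is
hypothetical and is not LLD's API; the alignment cap is an arbitrary value
chosen for the example, and Lookback stands in for however many bytes before
a relocation a linker might inspect (for example when relaxing an
instruction). It only covers two of the cases raised above: absurd section
alignment and relocations placed at invalid offsets.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of one relocation. Not an LLD type.
struct Reloc {
  uint64_t Offset;   // where in the section the relocation applies
  uint64_t Width;    // number of bytes the relocation patches
  uint64_t Lookback; // bytes before Offset the linker would inspect
};

// Hypothetical, simplified view of one input section. Not an LLD type.
struct Section {
  uint64_t Size;
  uint64_t Alignment;
  std::vector<Reloc> Relocs;
};

// Arbitrary cap picked for this sketch, not a value taken from any linker.
constexpr uint64_t MaxAlignment = 1ULL << 20;

// Returns an error message if the section fails a check, or std::nullopt
// if it passes every check in this sketch.
std::optional<std::string> verifySection(const Section &S) {
  // An alignment of 0 means "no constraint"; anything else must be a
  // power of two.
  if (S.Alignment != 0 && (S.Alignment & (S.Alignment - 1)) != 0)
    return std::string("alignment is not a power of two");
  if (S.Alignment > MaxAlignment)
    return std::string("alignment exceeds the cap used by this sketch");

  for (const Reloc &R : S.Relocs) {
    // Reject relocations whose preceding bytes would lie before the
    // start of the section.
    if (R.Lookback > R.Offset)
      return std::string("relocation needs bytes before the section start");
    // Overflow-safe form of "Offset + Width <= Size".
    if (R.Width > S.Size || R.Offset > S.Size - R.Width)
      return std::string("relocation extends past the end of the section");
  }
  return std::nullopt;
}
```

A caller that read an object from disk or the network would run such a check
on every section and only hand the file to the linker if nothing is reported;
a caller that produced the object in memory through an API it trusts could
skip the check.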

>
>
>> - David
>>
>>
>>>
>>>
>>>>
>>>
>>>>> -- Sean Silva
>>>>>
>>>>>
>>>>>>
>>>>>> On Mon, Feb 1, 2016 at 12:11 PM, Rafael Espíndola <
>>>>>> rafael.espindola at gmail.com> wrote:
>>>>>>
>>>>>>> On 1 February 2016 at 15:06, Rui Ueyama <ruiu at google.com> wrote:
>>>>>>> > On Mon, Feb 1, 2016 at 11:57 AM, Rafael Espíndola
>>>>>>> > <rafael.espindola at gmail.com> wrote:
>>>>>>> >>
>>>>>>> >> On 1 February 2016 at 14:46, Sean Silva <chisophugis at gmail.com>
>>>>>>> wrote:
>>>>>>> >> > I think one of the main use cases that has been requested is to
>>>>>>> be able
>>>>>>> >> > to
>>>>>>> >> > programmatically call the linker with "known good" object files
>>>>>>> (i.e.
>>>>>>> >> > produced by the compiler). That simplifies things a lot. Rui's
>>>>>>> recent
>>>>>>> >> > patches that are thread_local'izing existing globals seem like
>>>>>>> a
>>>>>>> >> > satisfactory approach. Or am I missing something?
>>>>>>> >>
>>>>>>> >> Yes, known good files are a lot easier to handle. We just have to
>>>>>>> be
>>>>>>> >> clear what "known good" is.
>>>>>>> >>
>>>>>>> >> > The R_X86_64_REX_GOTPCRELX situation can probably be likened to
>>>>>>> someone
>>>>>>> >> > giving clang a piece of source code with an inline asm that has:
>>>>>>> >> >
>>>>>>> >> > .text
>>>>>>> >> > .byte <some garbage>
>>>>>>> >> >
>>>>>>> >> > in it. We don't guarantee that the output "makes sense" because
>>>>>>> there's
>>>>>>> >> > really no way for us to know what "makes sense" in a precise
>>>>>>> way (i.e.,
>>>>>>> >> > a
>>>>>>> >> > way that we can program).
>>>>>>> >>
>>>>>>> >> Would we still be required to check the offsets so we don't
>>>>>>> crash? An
>>>>>>> >> assembly file can contain
>>>>>>> >>
>>>>>>> >> .reloc 0, R_X86_64_REX_GOTPCRELX, foo
>>>>>>> >> .long 4
>>>>>>> >>
>>>>>>> >> which would put that relocation in an invalid location. In
>>>>>>> general, is
>>>>>>> >> an arbitrary assembly file to be considered "known good"? Is that
>>>>>>> true
>>>>>>> >> even for things like
>>>>>>> >>
>>>>>>> >> .section .eh_frame, ....
>>>>>>> >> garbage
>>>>>>> >>
>>>>>>> >> that the linker has to parse?
>>>>>>> >
>>>>>>> >
>>>>>>> > I think the answer is case-by-case, but I don't think we have to
>>>>>>> guarantee
>>>>>>> > to recover from errors caused by carefully-crafted malicious
>>>>>>> object files.
>>>>>>> > (Is there anyone who disagrees with that?)
>>>>>>>
>>>>>>> It is definitely not a use case *I* have an interest in. I just want
>>>>>>> there to be an agreement on what use case we want to support at the moment.
>>>>>>> Is it "any .o file", "any llvm-mc or gas produced .o", "any clang or
>>>>>>> gcc produced .o not including inline asm"?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Rafael
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>