[PATCH] D16599: ELF: Define another entry point.

Tue Feb 2 12:57:19 PST 2016

On Tue, Feb 2, 2016 at 12:00 PM, Rui Ueyama <ruiu at google.com> wrote:

> On Tue, Feb 2, 2016 at 11:44 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Tue, Feb 2, 2016 at 11:36 AM, Rui Ueyama <ruiu at google.com> wrote:
>>
>>> On Tue, Feb 2, 2016 at 11:07 AM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Feb 2, 2016 at 10:59 AM, Rui Ueyama <ruiu at google.com> wrote:
>>>>
>>>>> On Tue, Feb 2, 2016 at 8:44 AM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 1, 2016 at 11:05 PM, Sean Silva via llvm-commits <
>>>>>> llvm-commits at lists.llvm.org> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 1, 2016 at 12:27 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>>>>
>>>>>>>> Even if a file is technically sane, you can craft a malicious one;
>>>>>>>> for example, you can probably crash the linker by OOM by setting a very
>>>>>>>> large number as an alignment requirement for each section so that the size
>>>>>>>> of output becomes huge. It is easily doable using assembly. So my answer
>>>>>>>> is "any clang or gcc produced .o not including inline asm". (It does not
>>>>>>>> mean that we do not try to recover from errors caused by bad assembly code,
>>>>>>>> but we don't/can't guarantee 100% recovery.)
>>>>>>>>
>>>>>>>
>>>>>>> You can probably find some way to set the alignment using an
>>>>>>> attribute or whatever even from clang (and without inlineasm).
>>>>>>>
>>>>>>> I don't think there is a platonically-ideal answer for this. It's
>>>>>>> more about goals:
>>>>>>> - as a command line tool, we don't want legitimate users to see us
>>>>>>> crashing during normal use (if a user is intentionally trying to kill LLD,
>>>>>>> it is not as embarrassing though, so we don't need to worry much about that
>>>>>>> case).
>>>>>>> - we want to be useful (someday) as a library that can be safely
>>>>>>> used in-process, so we need to provide certain guarantees (but these are
>>>>>>> not hugely constraining, because we can assume that the calling code is
>>>>>>> programmatically generating the file in good faith).
>>>>>>>
>>>>>>
>>>>>> I don't think this is a valid assumption for all programmatic users
>>>>>> (& indeed Clang and LLVM both have ways of accepting untrusted inputs - the
>>>>>> assumption in LLVM is "if it's not already in the in-memory representation,
>>>>>> it's not trusted" (parsing bitcode, reading files, etc) and I think the
>>>>>> same would probably be reasonable in lld - callers with object contents in
>>>>>> memory (or even a higher level representation - the same as the difference
>>>>>> between LLVM IR and LLVM bitcode in a memory buffer) can choose to have lld
>>>>>> assume validity (if they produced it from an API they trust/are willing to
>>>>>> bugfix if it's ever wrong) or ask for verification (if they got the object
>>>>>> over a network connection or other untrusted source (perhaps read it out of
>>>>>> a compressed archive, etc))). An API integration of LLD into the Clang
>>>>>> driver wouldn't be a sound place to make this assumption - some objects may
>>>>>> be passed to Clang (not generated by it) from some other compilation or
>>>>>> source, for example.
>>>>>>
>>>>>
>>>>> The difference is we do not have an in-memory representation of object
>>>>> files, or we are using mmap'ed ELF files as the internal representation.
>>>>> So, if files are not trustworthy, you cannot make any assumptions about
>>>>> the data you are handling throughout the program's execution. That's
>>>>> probably too hostile an environment, and doing error checks along the way
>>>>> would be error-prone, slow, or would complicate the code.
>>>>>
>>>>
>>>> I'm not sure I believe that's the case (that it's necessarily
>>>> slow/complicated/error-prone) any more than it is for Clang - it has untrusted
>>>> inputs & has to handle all the possible ways people can write incorrect
>>>> source code. (& LLVM too, but, yes - it often gets trusted input in-memory,
>>>> but once it goes to disk, LTO for example, verifies it every time - in the
>>>> same way I would expect a linker to do so for object files off disk)
>>>>
>>>>
>>>>> If we use an analogy of Clang and LLVM, we probably want to have a
>>>>> separate verifier for object files which you can run on object files from
>>>>> untrusted source before passing it to the link() function (so, although the
>>>>> two are in the same format, untrusted ELF files are "external
>>>>> representation", and verified ELF files are "internal representation").
>>>>>
>>>>
>>>> *nod* but I'm suggesting if it's from disk it's untrusted (at least
>>>> that's how LTO, LLVM, and Clang work) & since that's the majority case for
>>>> a linker, that it's likely to be the case we care about for API use and for
>>>> performance. LLVM's JIT is the sort of case I imagine having "trusted"
>>>> inputs - generated in memory by a trusted API, any time the generation and
>>>> consumption disagree on validity it would be considered a programmer error
>>>> and fixed as a bug in the program as a whole (by fixing producer or
>>>> consumer). (such a JIT would also have untrusted inputs it would read from
>>>> the filesystem too, no doubt - predefined libraries to link in, etc)
>>>>
>>>
>>> There may be a way to handle all possible inputs all the way throughout
>>> the linker execution time, but I think that the discussion went a bit too
>>> far. We have a number of patches (good ones, I hope) that at least stop
>>> the linker from exiting as long as inputs are not malicious or corrupted.
>>> I expect that should work at least as a stopgap, and submitting them
>>> doesn't prevent us from doing more in the future if we need to. Can you
>>> give us time to work on stuff that's not directly related to this topic?
>>>
>>
>> Sure - didn't mean to rush anyone, was just saying "I don't think this is
>> an entire answer/where we want to be long-term" (the tone of the
>> conversation/some of the statements seemed to sound like "this addresses
>> the issue, we wouldn't need to do anything else for API users & anything
>> else would include hardening LLD" - I think it will be necessary to be
>> API-usable for untrusted inputs even for fairly basic uses and that
>> security level hardening doesn't have to be the goal as soon as we step
>> into this area)
>>
>> Just trying to be clear, so that if, 6 months from now, the topic comes
>> up again there's not another round of confusion over what's
>> reasonable/intended/in-scope or out of scope.
>>
>
> For the record, I didn't agree that we absolutely have to handle files
> read from disk as untrusted. I agree that that's a good thing, and I
> promise I will make a reasonable effort, but that is not a conclusion of
> this thread. (I'm sorry to sound defensive, but I'm afraid that if we
> come back 6 months from now, it will look as if it had been a conclusion
> of this thread.)
>

Fair enough - I just wanted to register a dissenting opinion to Sean's
(which didn't seem like an opinion expressed by the original patches you
sent out, Rui - but something that changed/was implied along the way,
perhaps) to make it clear that this doesn't necessarily meet the needs of
some fairly plausible/basic uses of lld-as-a-library.

I appreciate your perspective and clarity here, Rui, whichever way it goes
for now/later.

- Dave

>
>
>>>
>>>> - David
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>
>>>>>>> -- Sean Silva
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 1, 2016 at 12:11 PM, Rafael Espíndola <
>>>>>>>> rafael.espindola at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> On 1 February 2016 at 15:06, Rui Ueyama <ruiu at google.com> wrote:
>>>>>>>>> > On Mon, Feb 1, 2016 at 11:57 AM, Rafael Espíndola
>>>>>>>>> > <rafael.espindola at gmail.com> wrote:
>>>>>>>>> >>
>>>>>>>>> >> On 1 February 2016 at 14:46, Sean Silva <chisophugis at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> >> > I think one of the main use cases that has been requested is
>>>>>>>>> to be able
>>>>>>>>> >> > to
>>>>>>>>> >> > programmatically call the linker with "known good" object
>>>>>>>>> files (i.e.
>>>>>>>>> >> > produced by the compiler). That simplifies things a lot.
>>>>>>>>> Rui's recent
>>>>>>>>> >> > patches that are thread_local'izing existing globals seem
>>>>>>>>> like a
>>>>>>>>> >> > satisfactory approach. Or am I missing something?
>>>>>>>>> >>
>>>>>>>>> >> Yes, known good files are a lot easier to handle. We just have
>>>>>>>>> to be
>>>>>>>>> >> clear what "known good" is.
>>>>>>>>> >>
>>>>>>>>> >> > The R_X86_64_REX_GOTPCRELX situation can probably be likened
>>>>>>>>> to someone
>>>>>>>>> >> > giving clang a piece of source code with an inline asm that
>>>>>>>>> has:
>>>>>>>>> >> >
>>>>>>>>> >> > .text
>>>>>>>>> >> > .byte <some garbage>
>>>>>>>>> >> >
>>>>>>>>> >> > in it. We don't guarantee that the output "makes sense"
>>>>>>>>> because there's
>>>>>>>>> >> > really no way for us to know what "makes sense" in a precise
>>>>>>>>> way (i.e.,
>>>>>>>>> >> > a
>>>>>>>>> >> > way that we can program).
>>>>>>>>> >>
>>>>>>>>> >> Would we still be required to check the offsets so we don't
>>>>>>>>> crash? An
>>>>>>>>> >> assembly file can contain
>>>>>>>>> >>
>>>>>>>>> >> .reloc 0, R_X86_64_REX_GOTPCRELX, foo
>>>>>>>>> >> .long 4
>>>>>>>>> >>
>>>>>>>>> >> which would put that relocation in an invalid location. In
>>>>>>>>> general, is
>>>>>>>>> >> an arbitrary assembly file to be considered "known good"? Is
>>>>>>>>> that true
>>>>>>>>> >> even for things like
>>>>>>>>> >>
>>>>>>>>> >> .section .eh_frame, ....
>>>>>>>>> >> garbage
>>>>>>>>> >>
>>>>>>>>> >> that the linker has to parse?
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > I think the answer is case-by-case, but I don't think we have to
>>>>>>>>> guarantee
>>>>>>>>> > to recover from errors caused by carefully-crafted malicious
>>>>>>>>> object files.
>>>>>>>>> > (Is there anyone who disagrees with that?)
>>>>>>>>>
>>>>>>>>> It is definitely not a use case *I* have an interest in. I just
>>>>>>>>> want there to be an agreement on what use case we want to support
>>>>>>>>> at the moment.
>>>>>>>>> Is it "any .o file", "any llvm-mc or gas produced .o", "any clang
>>>>>>>>> or
>>>>>>>>> gcc produced .o not including inline asm"?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Rafael
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>