<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Feb 2, 2016 at 12:00 PM, Rui Ueyama <span dir="ltr"><<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="">On Tue, Feb 2, 2016 at 11:44 AM, David Blaikie <span dir="ltr"><<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Tue, Feb 2, 2016 at 11:36 AM, Rui Ueyama <span dir="ltr"><<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Tue, Feb 2, 2016 at 11:07 AM, David Blaikie <span dir="ltr"><<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Tue, Feb 2, 2016 at 10:59 AM, Rui Ueyama <span dir="ltr"><<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Tue, Feb 2, 2016 at 8:44 AM, David Blaikie <span dir="ltr"><<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div 
dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Mon, Feb 1, 2016 at 11:05 PM, Sean Silva via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote"><span>On Mon, Feb 1, 2016 at 12:27 PM, Rui Ueyama <span dir="ltr"><<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Even if a file is technically sane, you can craft a malicious one; for example, you can probably crash the linker by OOM by setting a very large number as an alignment requirement for each section so that the size of output becomes huge. It is easily doable using assembly. So my answer is "any clang or gcc produced .o not including inline asm". (It does not mean that we do not try to recover from errors caused by bad assembly code, but we don't/can't guarantee 100% recovery.)</div></blockquote><div><br></div></span><div>You can probably find some way to set the alignment using an attribute or whatever even from clang (and without inlineasm).</div><div><br></div><div>I don't think there is a platonically-ideal answer for this. 
It's more about goals:</div><div>- as a command line tool, we don't want legitimate users to see us crashing during normal use (if a user is intentionally trying to kill LLD, it is not as embarrassing though, so we don't need to worry much about that case).</div><div>- we want to be useful (someday) as a library that can be safely used in-process, so we need to provide certain guarantees (but these are not hugely constraining, because we can assume that the calling code is programmatically generating the file in good faith).</div></div></div></div></blockquote></span><div><br>I don't think this is a valid assumption for all programmatic users (& indeed Clang and LLVM both have ways of accepting untrusted inputs - the assumption in LLVM is "if it's not already in the in-memory representation, it's not trusted" (parsing bitcode, reading files, etc) and I think the same would probably be reasonable in lld - callers with object contents in memory (or even a higher level representation - the same as the difference between LLVM IR and LLVM bitcode in a memory buffer) can choose to have lld assume validity (if they produced it from an API they trust/are willing to bugfix if it's ever wrong) or ask for verification (if they got the object over a network connection or other untrusted source (perhaps read it out of a compressed archive, etc))). An API integration of LLD into the Clang driver wouldn't be a sound place to make this assumption - some objects may be passed to Clang (not generated by it) from some other compilation or source, for example. </div></div></div></div></blockquote><div><br></div></span><div>The difference is we do not have an in-memory representation of object files, or we are using mmap'ed ELF files as the internal representation. So, if files are not trustworthy, you cannot make any assumptions about the data you are handling throughout the program execution time. 
That's probably too hostile an environment, and doing error checks along the way would be error-prone and slow, or would complicate the code.</div></div></div></div></blockquote><div><br></div></span><div>I'm not sure I believe that's the case (that it's necessarily slow/complicated/error-prone) any more than it is for Clang - it has untrusted inputs & has to handle all the possible ways people can write incorrect source code. (& LLVM too, but, yes - it often gets trusted input in-memory, but once it goes to disk, LTO for example, verifies it every time - in the same way I would expect a linker to do so for object files off disk)</div><span><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> If we use an analogy of Clang and LLVM, we probably want to have a separate verifier for object files which you can run on object files from untrusted sources before passing them to the link() function (so, although the two are in the same format, untrusted ELF files are "external representation", and verified ELF files are "internal representation").</div></div></div></div></blockquote><div><br></div></span><div>*nod* but I'm suggesting if it's from disk it's untrusted (at least that's how LTO, LLVM, and Clang work) & since that's the majority case for a linker, that it's likely to be the case we care about for API use and for performance. LLVM's JIT is the sort of case I imagine having "trusted" inputs - generated in memory by a trusted API, any time the generation and consumption disagree on validity it would be considered a programmer error and fixed as a bug in the program as a whole (by fixing producer or consumer). 
(such a JIT would also have untrusted inputs it would read from the filesystem too, no doubt - predefined libraries to link in, etc)</div></div></div></div></blockquote><div><br></div></span><div>There may be a way to handle all possible inputs throughout the linker's execution, but I think that the discussion went a bit too far. We have a number of patches (good ones, I hope) that at least stop the linker from exiting as long as inputs are not malicious or corrupted, and I expect that should work at least as a stopgap, and submitting them doesn't prevent us from doing more in the future if we need to. Can you give us time to work on stuff that's not directly related to this topic?</div></div></div></div></blockquote><div><br></div></span><div>Sure - didn't mean to rush anyone, was just saying "I don't think this is an entire answer/where we want to be long-term" (the tone of the conversation/some of the statements seemed to sound like "this addresses the issue, we wouldn't need to do anything else for API users & anything else would include hardening LLD" - I think it will be necessary to be API-usable for untrusted inputs even for fairly basic uses and that security level hardening doesn't have to be the goal as soon as we step into this area)<br><br>Just trying to be clear, so that if, 6 months from now, the topic comes up again there's not another round of confusion over what's reasonable/intended/in-scope or out of scope.</div></div></div></div></blockquote><div><br></div></span><div>For the record, I didn't agree that we absolutely have to handle files read from disk as untrusted. I agree that that's a good thing, and I promise I will make a reasonable effort, but that is not a conclusion of this thread. 
(I'm sorry to sound defensive, but I'm afraid that if we come back 6 months from now, it will look as though that was a conclusion of this thread.)</div></div></div></div></blockquote><div><br></div><div>Fair enough - I just wanted to register a dissenting opinion to Sean's (which didn't seem like an opinion expressed by the original patches you sent out, Rui - but something that changed/was implied along the way, perhaps) to make it clear that this doesn't necessarily meet the needs of some fairly plausible/basic uses of lld-as-a-library.<br><br>I appreciate your perspective and clarity here, Rui, whichever way it goes for now/later.<br><br>- Dave</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div class="h5"><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><span><font color="#888888"><br>- David</font></span></div><div><div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc 
solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><br></div><div>-- Sean Silva</div><div><div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Feb 1, 2016 at 12:11 PM, Rafael Espíndola <span dir="ltr"><<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>On 1 February 2016 at 15:06, Rui Ueyama <<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>> wrote:<br>
> On Mon, Feb 1, 2016 at 11:57 AM, Rafael Espíndola<br>
> <<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>> wrote:<br>
>><br>
>> On 1 February 2016 at 14:46, Sean Silva <<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>> wrote:<br>
>> > I think one of the main use cases that has been requested is to be able<br>
>> > to<br>
>> > programmatically call the linker with "known good" object files (i.e.<br>
>> > produced by the compiler). That simplifies things a lot. Rui's recent<br>
>> > patches that are thread_local'izing existing globals seem like a<br>
>> > satisfactory approach. Or am I missing something?<br>
>><br>
>> Yes, known good files are a lot easier to handle. We just have to be<br>
>> clear what "known good" is.<br>
>><br>
>> > The R_X86_64_REX_GOTPCRELX situation can probably be likened to someone<br>
>> > giving clang a piece of source code with an inline asm that has:<br>
>> ><br>
>> > .text<br>
>> > .byte <some garbage><br>
>> ><br>
>> > in it. We don't guarantee that the output "makes sense" because there's<br>
>> > really no way for us to know what "makes sense" in a precise way (i.e.,<br>
>> > a<br>
>> > way that we can program).<br>
>><br>
>> Would we still be required to check the offsets so we don't crash? An<br>
>> assembly file can contain<br>
>><br>
>> .reloc 0, R_X86_64_REX_GOTPCRELX, foo<br>
>> .long 4<br>
>><br>
>> which would put that relocation in an invalid location. In general, is<br>
>> an arbitrary assembly file to be considered "known good"? Is that true<br>
>> even for things like<br>
>><br>
>> .section .eh_frame, ....<br>
>> garbage<br>
>><br>
>> that the linker has to parse?<br>
><br>
><br>
> I think the answer is case-by-case, but I don't think we have to guarantee<br>
> to recover from errors caused by carefully-crafted malicious object files.<br>
> (Is there anyone who disagrees with that?)<br>
<br>
</div></div>It is definitely not a use case *I* have an interest in. I just want<br>
there to be an agreement on what use case we want to support at the moment.<br>
Is it "any .o file", "any llvm-mc or gas produced .o", "any clang or<br>
gcc produced .o not including inline asm"?<br>
<br>
Cheers,<br>
Rafael<br>
</blockquote></div><br></div>
</div></div></blockquote></div></div></div><br></div></div>
<br></div></div></blockquote></div><br></div></div>
</blockquote></div></div></div><br></div></div>
</blockquote></div></div></div><br></div></div>
</blockquote></div></div></div><br></div></div>
</blockquote></div></div></div><br></div></div>
</blockquote></div></div></div><br></div></div>
</blockquote></div><br></div></div>
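<div>For illustration only, the "separate verifier" idea raised in the thread could be sketched roughly as below. This is a minimal sketch under stated assumptions, not lld's code or API: <code>RelocInfo</code>, <code>SectionInfo</code>, <code>verifySection</code>, the <code>bytesBefore</code> field, and the <code>kMaxAlignment</code> cap are all hypothetical names and values introduced here. It covers only two of the failure modes mentioned above: an oversized or malformed section alignment, and a relocation whose offset leaves no room for the bytes the linker wants to read or write (as with the <code>.reloc 0, R_X86_64_REX_GOTPCRELX, foo</code> example).</div>

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of one relocation; not lld's real types.
struct RelocInfo {
  uint64_t offset;      // where the relocated field starts within the section
  uint64_t bytesBefore; // bytes before the field that the linker wants to inspect
  uint64_t width;       // size of the relocated field in bytes
};

// Hypothetical, simplified model of one input section.
struct SectionInfo {
  std::string name;
  uint64_t size;      // section size in bytes
  uint64_t alignment; // requested alignment (0 and 1 mean "no constraint" in ELF)
  std::vector<RelocInfo> relocs;
};

// Assumption: an arbitrary policy cap chosen for this sketch.
constexpr uint64_t kMaxAlignment = 1ULL << 20;

// Returns one diagnostic per problem found; an empty result means the
// section passed these (deliberately incomplete) checks.
std::vector<std::string> verifySection(const SectionInfo &s) {
  std::vector<std::string> errors;

  const uint64_t a = s.alignment;
  if (a > 1 && (a & (a - 1)) != 0)
    errors.push_back(s.name + ": alignment is not a power of two");
  else if (a > kMaxAlignment)
    errors.push_back(s.name + ": alignment exceeds the configured cap");

  for (const RelocInfo &r : s.relocs) {
    // The field must leave room for the bytes inspected before it...
    const bool startsTooEarly = r.offset < r.bytesBefore;
    // ...and must end inside the section (written to avoid overflow).
    const bool runsPastEnd = r.width > s.size || r.offset > s.size - r.width;
    if (startsTooEarly || runsPastEnd)
      errors.push_back(s.name + ": relocation at offset " +
                       std::to_string(r.offset) + " is out of bounds");
  }
  return errors;
}
```

<div>Under this sketch, a caller holding an object from an untrusted source would run <code>verifySection</code> over every section and refuse to link if any diagnostics come back, while a caller that generated the object in memory through an API it trusts could skip the pass. Whether such a pass can be made complete enough to be worth having is exactly what the thread leaves open.</div>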