[llvm-dev] Need help with code generation

Wed Mar 23 10:53:32 PDT 2016

On Wed, Mar 23, 2016 at 5:11 PM, David Blaikie <dblaikie at gmail.com> wrote:

>
>
> On Wed, Mar 23, 2016 at 3:52 AM, Rui Ueyama <ruiu at google.com> wrote:
>
>> On Tue, Mar 22, 2016 at 10:33 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Tue, Mar 22, 2016 at 1:29 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>
>>>> On Tue, Mar 22, 2016 at 9:19 PM, David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Mar 22, 2016 at 1:15 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>>
>>>>>> On Tue, Mar 22, 2016 at 9:00 PM, David Blaikie <dblaikie at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 22, 2016 at 12:36 PM, Rui Ueyama <ruiu at google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Tue, Mar 22, 2016 at 7:36 PM, Rui Ueyama <ruiu at google.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I have a question. If there is an ELF verifier function that walks
>>>>>>>>> every part of an ELF file to verify that the file is sane, and if you can
>>>>>>>>> call that before calling LLD's function, are you guys happy with that?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I'd like to get your opinions on this question.
>>>>>>>>
>>>>>>>
>>>>>>> I'd still find it problematic that lld itself would consider
>>>>>>> crash-on-invalid "not a bug" to the point of not reviewing/approving
>>>>>>> patches to fix such issues. That's what I'm concerned about in this thread.
>>>>>>>
>>>>>>
>>>>>> That's one way to see it. The other view is that it, as a whole, has a
>>>>>> boolean option *IsInputTrustworthy* and works accordingly. What
>>>>>> matters most is what we provide to the users as a guarantee. You have an
>>>>>> opinion that that should be implemented within LLD, but that would now be
>>>>>> an internal design choice. Hypothetically, if we had such a pass to verify
>>>>>> inputs and you sent a patch to "fix" a crash bug in LLD, we probably
>>>>>> wouldn't reject it but would instead argue that it needs to be addressed
>>>>>> in the verifier pass. This is about "how" something should be implemented,
>>>>>> which is a usual design-choice discussion, no?
>>>>>>
>>>>>
>>>>> OK, sorry - some confusion. I assume that you wouldn't run this
>>>>> verifier pass by default in lld-the-command-line-tool, right? (I would
>>>>> guess it wouldn't meet your performance criteria)
>>>>>
>>>>
>>>> Correct.
>>>>
>>>>
>>>>> So from that perspective, lld-the-command-line-tool would still be
>>>>> crashing-by-design on certain inputs in its default/normal/user-facing mode
>>>>> & that would seem problematic to me.
>>>>>
>>>>
>>>> I disagree. In almost all cases you are handling valid object files
>>>> created by compilers, and if omitting some error checks for really
>>>> pathological cases would make the code simpler and improve performance, I
>>>> think having that option is worth it (and I believe that's the case; at
>>>> least those who are actually writing the code seem to take that stance).
>>>> Also, if you give it broken object files, you wouldn't get an output
>>>> anyway. The only difference that a user can observe is whether it dies
>>>> with an error message or not. We could even catch a segfault and run the
>>>> verifier on the input again to print out an error after something goes
>>>> wrong.
>>>>
>>>
>>> Sure, I understand that we disagree here, I was merely answering your
>>> question "I have a question. If there is an ELF verifier function that walks
>>> every part of an ELF file to verify that the file is sane, and if you can
>>> call that before calling LLD's function, are you guys happy with that?" to
>>> help you understand that point of disagreement/my position (& probably the
>>> position of other people on this thread)
>>>
>>
>> Yes, I understand that this is where we disagree. LLD is robust by my
>> definition and we are vigorously trying to make it so. To me, however, it
>> is unfortunate but acceptable if LLD crashes on a malicious, hand-crafted
>> object file which is intended to crash LLD, if the cost of fixing it is too
>> high. I understand a number of people are concerned about the design
>> choices. If the LLVM foundation or whatever wants to say that all LLVM
>> projects must meet certain minimum standards, and those standards include
>> this design choice, we'd do that. But that does not seem to be the case
>> right now.
>>
>> I'd be happy if we could handle all possible errors elegantly with very
>> low overhead, but there seems to be no easy way to do that. We may just not
>> have found it yet. If you come up with a solution and send us a patch, we
>> can discuss that, but what is happening here is that a number of people who
>> think that our design choice is unreasonable are not contributing to
>> the project, so we are repeating the same discussion.
>>
>> The reason why I asked that question is that it guarantees that we can
>> provide protection for those who want 100% crash-freeness (although I
>> doubt how effective it is from the user's point of view compared to
>> other components which have crash bugs, as discussed in the thread). People
>> in this thread seem to believe that the design choice is irreversible, or
>> will reach a point where it is irreversible because we will have written
>> too much code with the design, so we won't be able to "fix" any crash
>> "bugs" in future. That is not the case. We can at least provide a new pass
>> at any time to rigorously check any user input. Whether it needs to be a
>> separate pass or an integrated one is a detailed design choice that needs
>> first-hand knowledge of the code base.
>>
>
> Except we all seem to agree that this pass would probably make lld rather
> slow - possibly slower than gold/binutils ld, I assume? (perhaps that's an
> incorrect assumption) so at that point I imagine people would just go back
> to using those tools which, as Paul pointed out, do treat UB on invalid
> input as bugs today.
>

That would naturally make LLD slower, but there is no reason to believe
that it would make LLD slower than bfd or gold. As a person who is actually
hacking on the stuff, I could however make a wild guess. I don't think the
verifier would be costlier than the linker, so let's say it would add 0.5x
overhead. Then, as long as our linker is 33% faster than gold (that is, it
takes two-thirds of gold's time), we are no slower than gold even in the
error checking mode. That is not a high bar.
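
As a rough check of the arithmetic in that guess, here is a small sketch.
It is not a measurement: the function names are made up for illustration,
and the 0.5x overhead is only the wild guess stated above. It computes how
short LLD's link time must be, relative to gold's, for LLD plus the
verifier to be no slower than gold.

```python
def total_time(link_time: float, verifier_overhead: float) -> float:
    """Time for a link plus a verifier pass costing `verifier_overhead` x the link."""
    return link_time * (1.0 + verifier_overhead)


def break_even_link_time(verifier_overhead: float) -> float:
    """Largest LLD link time, as a fraction of gold's time (gold = 1.0),
    at which LLD plus the verifier still finishes no later than gold."""
    return 1.0 / (1.0 + verifier_overhead)


# Assumption: the verifier adds 0.5x of the link time (the guess above).
overhead = 0.5
fraction = break_even_link_time(overhead)

print(f"LLD may take up to {fraction:.3f} of gold's time")  # 0.667
print(f"i.e. it must be at least {1 - fraction:.1%} faster")  # 33.3%
```

With a different overhead guess the bar moves accordingly; for example, a
verifier as expensive as the link itself (1.0x) would require LLD to take
half of gold's time.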

>> I also strongly disagree that it will become irreversible as we write more
>> code. It is plainly wrong. We are doing large incremental refactoring
>> pretty often (I'm the person who is doing it most often), and if you come
>> up with a way to handle every possible error elegantly with very low
>> overhead, we could do that as well.
>>
>
> Indeed, we do do pretty large refactorings across the LLVM project
> regularly - the position/argument is that building such things in from the
> beginning may be relatively cheap compared to the refactoring cost of
> adding things later. This is the difficult balance we all make on so many
> design decisions across the LLVM project.
>
> There's also a difficult balance between placing the cost when/where it is
> needed (the person who wants a library perhaps should be the one paying the
> engineering cost of making a library (assuming it was no more expensive to
> do it when-needed than to build it in earlier, which is unclear)) or to lay
> some groundwork to enable the possibility of new scenarios with relatively
> low cost (Clang being a great example of this - building Clang as a library
> was/is a core design goal that enables many features the original Clang
> developers never could've foreseen - if we expected the first person to
> write a syntax highlighter (or even the larger Clang Tooling/AST Matcher
> infrastructure) to library-ify Clang, those efforts wouldn't've happened,
> or would've happened by building a new compiler (one of the major reasons
> Clang was built was that those tools were hard to build with GCC))
>

I understand that you believe it is easier to alter this design decision
sooner rather than later, and that seems to be why you are pushing so hard to
convince us to adopt your strategy right now. In that view, we have failed
to understand a modification cost which you do understand, although we are
the people who have first-hand knowledge of all the code. My view is
different. We are each arguing from our own hypothesis, but when we are
talking about actual modification cost, I'd think your argument is less
convincing because it lacks knowledge of the code and design of the software
we are talking about.

I again strongly disagree that the choice would become irreversible in terms
of modification cost. We made a number of modifications that affected multiple
places in the linker, such as making it not die when it sees errors in
valid object files. That was not hard, and there is no reason to believe
that it would have been harder if we had done it later. I also do not see
a reason to believe that adding error checking later in some form will be
much costlier than doing it now, assuming you come up with an error checking
method which is elegant and cheap in terms of run-time cost. If there is a
specific reason to believe that the change you are thinking of would be
painfully hard in future but can be done relatively cheaply now, please show
it to me.

>> Given that, I think you are worrying too much about one design choice that
>> I made. If you need a linker which never crashes (whatever that means), then
>> I'm sorry but LLD may not be your choice, at least at this moment. It is
>> however probably not suitable for your purpose anyway because it's too
>> early to use -- we are energetically working on implementing missing
>> features. If you have a suggestion to expand its scope without
>> sacrificing usability in other fields, we'd be happy to discuss that.
>>
>>> (UB is more than just segfaulting, though. So there are more possible
>>> failures (including silently producing output & exiting with 0) than
>>> segfault.)
>>>
>>> - David
>>>
>>>
>>>>
>>>>> If you're proposing having this verifier run by default - sure, then I
>>>>> can't construct an input that crashes the linker, it'll fail with an error
>>>>> message and exit. Yes, that would be fine by me - the way the feature is
>>>>> implemented is not something I mean to imply constraints on.
>>>>>
>>>>> - David
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>>
>>>>>>>>> On Tue, Mar 22, 2016 at 6:39 PM, Hal Finkel via llvm-dev <
>>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------
>>>>>>>>>>
>>>>>>>>>> From: "David Blaikie via llvm-dev" <llvm-dev at lists.llvm.org>
>>>>>>>>>> To: "Rafael Espíndola" <rafael.espindola at gmail.com>
>>>>>>>>>> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Bruce Hoult" <
>>>>>>>>>> bruce at hoult.org>
>>>>>>>>>> Sent: Tuesday, March 22, 2016 10:18:03 AM
>>>>>>>>>> Subject: Re: [llvm-dev] Need help with code generation
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 22, 2016 at 4:27 AM, Rafael Espíndola <
>>>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> > Maybe not, but it's not impossible either - browsers manage to
>>>>>>>>>>> > harden themselves against malicious input and they operate in a far
>>>>>>>>>>> > more hostile environment with many more input formats than we do.
>>>>>>>>>>>
>>>>>>>>>>> It is important to note how different they are. Both Firefox and
>>>>>>>>>>> Chromium have people working just to try to make them more secure.
>>>>>>>>>>> Compare that with LLVM: One week ago I pointed out that your patch
>>>>>>>>>>> (r263521) introduces a crash. It still hasn't been reverted or even
>>>>>>>>>>> acknowledged yet.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> > I'm not trying to shift your personal goal, or to direct the
>>>>>>>>>>> > features that you choose to put your time into, but I am interested in
>>>>>>>>>>> > project policy.
>>>>>>>>>>>
>>>>>>>>>>> Why do you care about policy that is not followed? A policy saying
>>>>>>>>>>> llvm should not crash on any input is as relevant as one that says
>>>>>>>>>>> that clang should keep bootstrapping in under one second.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It's pretty different when you say, essentially, that patches to
>>>>>>>>>> address these things are unlikely to be accepted. It doesn't seem
>>>>>>>>>> surprising that people wouldn't try to provide those patches and would
>>>>>>>>>> choose not to use the project if that's the expressed policy of the
>>>>>>>>>> developers on the project and doesn't line up with the needs of other
>>>>>>>>>> people.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>>  -Hal
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> So, if we stick to reality, what we have is that lld (ELF and COFF)
>>>>>>>>>>> are already the most reliable parts of the toolchain. If not for Rui
>>>>>>>>>>> and I being upfront about it most people would not even know that you
>>>>>>>>>>> could crash it. So please, just let us keep working on the most
>>>>>>>>>>> reliable part of the toolchain.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Rafael
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Hal Finkel
>>>>>>>>>> Assistant Computational Scientist
>>>>>>>>>> Leadership Computing Facility
>>>>>>>>>> Argonne National Laboratory
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>