[llvm-dev] Need help with code generation

Mon Mar 21 17:15:11 PDT 2016

On Tue, Mar 22, 2016 at 1:09 AM, Lang Hames <lhames at gmail.com> wrote:

> Hi Rui,
>
> LLVM's pass can crash if the previous pass is buggy.
>
>
> That's a bug that should be fixed in the previous pass.
>
> What we can do is set a boundary and make best effort to guarantee that as
>> long as you are within the boundary, we handle any input in some reasonable
>> way.
>
>
> That boundary is usually user input. We assume that the program's memory
> hasn't been compromised, but anything the user puts in should be treated
> with suspicion. Would you use a browser that didn't check for buffer
> overruns?
>
> Part of the problem is that we assume the linker is being used in a
> context where the input can be trusted. A lot of the time that's true, but
> assuming it limits the contexts in which LLD could be used. For example,
> you couldn't use LLD as the linker in a build-farm if it crashed on
> malformed input - what's to stop someone uploading a malformed ELF file and
> tricking the linker into sniffing other projects being built on the same
> server?
>

In such a hostile environment, would you really run any linker unprotected?

Cheers,
> Lang.
>
> On Mon, Mar 21, 2016 at 4:56 PM, Rui Ueyama via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Tue, Mar 22, 2016 at 12:46 AM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Mon, Mar 21, 2016 at 4:42 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>
>>>> On Tue, Mar 22, 2016 at 12:32 AM, David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Mar 21, 2016 at 4:21 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>>
>>>>>> From the user's point of view, I think it's still the same. As long
>>>>>> as LLVM is not guaranteed to be free of undefined behavior (including any
>>>>>> unknown bugs), users are not protected from getting undefined outputs. (And
>>>>>> please keep in mind that we are talking about rare cases, such as ELF files
>>>>>> you created yourself by hand or with a buggy tool.)
>>>>>>
>>>>>
>>>>> The same is true of any software (all software has bugs) - including
>>>>> the software outside that would be forking a subprocess to run lld, no?
>>>>>
>>>>> With LLVM we consider these bugs and fix them (or at least accept
>>>>> patches to fix them pretty much without question). It seems like the
>>>>> bar for getting such a patch into LLD is being set much higher - this seems
>>>>> problematic to me at least.
>>>>>
>>>>> A library doesn't have to be guaranteed to be free of bugs to be a
>>>>> library - that seems like an unrealistic standard (& one not present in any
>>>>> other project that I know of)
>>>>>
>>>>> In any case, I'm talking about just LLD itself when I'm expressing
>>>>> concern about "not a bug UB". It seems very different for the user of lld
>>>>> at the command line between "this program will give a short error and
>>>>> exit(1)" and "this program has known/intended undefined behavior". Even on
>>>>> uncommon inputs.
>>>>>
>>>>
>>>> As long as you can't prove that a program has no UB bug, you cannot say
>>>> that "this program has no undefined behavior." From the user's point of
>>>> view, it is still UB even if it is known to developers and fixed in a later
>>>> version.
>>>>
>>>
>>> Then pretty much all software and all libraries do not meet the bar you
>>> are describing - so why do so many try to fix these bugs? And if such a
>>> program or library is willing to say "we'll fix bugs if we find them" and
>>> wants to use lld - wouldn't it be reasonable to support them? Since that's
>>> pretty much the bar to which most software is developed.
>>>
>>
>> A kernel is allowed to (and chooses to) crash with panic() if a device
>> behaves weirdly. LLVM's pass can crash if the previous pass is buggy. Many
>> regexp engines go into virtually infinite loops if you give them a malicious
>> regexp. And any program can do anything weird if there is a bug. What we
>> can do is set a boundary and make best effort to guarantee that as long as
>> you are within the boundary, we handle any input in some reasonable way.
>> This is what we do, and what other programs do. And where the boundary
>> should be set depends on the program.
>>
>>
>>>
>>>
>>>>
>>>> Again, I'd like to emphasize that we are talking about ill-formed ELF
>>>> headers or something similar. If you are intentionally trying to break the
>>>> linker, I'd say "don't do that." As long as your input is not corrupted in
>>>> terms of file formatting, LLD's behavior is well defined (as far as we can
>>>> guarantee).
>>>>
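To make "reject ill-formed input with an error instead of undefined behavior" concrete, here is a minimal C++ sketch of checking an untrusted ELF64 header before trusting any offset stored in it. `checkElf64Header` is a hypothetical helper, not LLD's code; as simplifying assumptions it handles only little-endian ELF64 and checks only the section header table bounds.

```cpp
// Hypothetical sketch: validate the fixed-size ELF64 header in an untrusted
// buffer before using any offset it contains. Field offsets follow the
// ELF64 layout (e_shoff at 0x28, e_shentsize at 0x3A, e_shnum at 0x3C).
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

namespace {
// Read a little-endian integer of sizeof(T) bytes at `off`.
// Caller guarantees off + sizeof(T) <= buf.size().
template <typename T>
T readLE(const std::vector<uint8_t>& buf, size_t off) {
  T v = 0;
  for (size_t i = 0; i < sizeof(T); ++i)
    v |= static_cast<T>(buf[off + i]) << (8 * i);
  return v;
}
}  // namespace

// Returns an error message for malformed input, or std::nullopt if the
// checked fields are consistent with the buffer size.
std::optional<std::string> checkElf64Header(const std::vector<uint8_t>& buf) {
  constexpr size_t kEhdrSize = 64;  // size of an ELF64 file header
  if (buf.size() < kEhdrSize) return "file too small for an ELF64 header";
  if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
    return "bad ELF magic";
  if (buf[4] != 2) return "not an ELF64 file";              // EI_CLASS
  if (buf[5] != 1) return "only little-endian is handled";  // EI_DATA

  uint64_t shoff = readLE<uint64_t>(buf, 0x28);
  uint16_t shentsize = readLE<uint16_t>(buf, 0x3A);
  uint16_t shnum = readLE<uint16_t>(buf, 0x3C);
  // e_shnum == 0: no section headers, or extended numbering (not handled).
  if (shnum == 0) return std::nullopt;
  if (shentsize != 64) return "unexpected section header entry size";

  // Overflow-safe check that [shoff, shoff + shnum * shentsize) is in bounds:
  // tableSize is at most 65535 * 64, and we never compute shoff + tableSize.
  uint64_t tableSize = uint64_t(shnum) * shentsize;
  if (shoff > buf.size() || tableSize > buf.size() - shoff)
    return "section header table extends past end of file";
  return std::nullopt;
}
```

A driver would print the returned message (e.g. `error: bad ELF magic`) and exit(1). The design point is that every offset read from the file is compared against the buffer size before it is used, with the comparison written so it cannot itself overflow.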
>>>
>>> Right - all we're asking for is the same guarantee (not a very strong
>>> guarantee - you haven't proven it's defined for all valid inputs, I'm sure
>>> (formal proofs are really expensive, and buggy, and even fuzzing just helps,
>>> it doesn't guarantee)) for other inputs - in both cases we know it's not an
>>> ironclad guarantee/proven truth.
>>>
>>>
>>>>
>>>>
>>>>> - David
>>>>>
>>>>>
>>>>>>
>>>>>> On Tue, Mar 22, 2016 at 12:02 AM, David Blaikie <dblaikie at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 21, 2016 at 2:54 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>>>>
>>>>>>>> On Mon, Mar 21, 2016 at 10:49 PM, David Blaikie via llvm-dev <
>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Mar 21, 2016 at 2:46 PM, Rafael Espíndola <
>>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>
>>>>>>>>>> On 21 March 2016 at 17:34, Tim Northover via llvm-dev
>>>>>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>> >> My understanding is that clang and llvm themselves are designed this way
>>>>>>>>>> >> (crash when the unexpected happens).
>>>>>>>>>> >
>>>>>>>>>> > I don't think so. I'd view any Clang crash as a bug (probably to be
>>>>>>>>>> > prioritised below silent CodeGen and many others, but not "working as
>>>>>>>>>> > designed").
>>>>>>>>>> >
>>>>>>>>>> >> For example the fact that clang forks itself to be able to report diagnostics
>>>>>>>>>> >
>>>>>>>>>> > That seems like just trying to make our own job easier to me. I think
>>>>>>>>>> > the entire point of the fork is to get a backtrace we can fix, and
>>>>>>>>>> > point out where the user should send it.
>>>>>>>>>> >
>>>>>>>>>> >> llvm is full of report_fatal_error() (or worse, assertions that can
>>>>>>>>>> >> fire on unexpected user input).
>>>>>>>>>> >
>>>>>>>>>> > A bit of a grey area since LLVM isn't itself a user-facing tool, but I
>>>>>>>>>> > think I'd still say that a report_fatal_error that's not actionable by
>>>>>>>>>> > the user is actually an LLVM bug. And a segfault definitely so.
>>>>>>>>>>
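The distinction drawn here between an assertion on user input and an actionable diagnostic can be sketched as follows. All names (`symbolValueAssert`, `symbolValueChecked`, `fatal`) are hypothetical and are not LLVM's `report_fatal_error` API; the sketch only shows why an assert guarding user-controlled data is weaker than a real check.

```cpp
// Hypothetical illustration: assert vs. explicit check on an index that
// comes from an input file.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

// Fragile: with NDEBUG defined (a typical release build) the assert is
// compiled out, and a bad index from the input becomes an out-of-bounds
// read, i.e. undefined behavior.
uint32_t symbolValueAssert(const std::vector<uint32_t>& table, size_t idx) {
  assert(idx < table.size() && "symbol index out of range");
  return table[idx];
}

// Robust: the same condition is checked in every build mode and turned
// into a message the user can act on.
bool symbolValueChecked(const std::vector<uint32_t>& table, size_t idx,
                        uint32_t& out, std::string& err) {
  if (idx >= table.size()) {
    err = "invalid symbol index " + std::to_string(idx) + " (table has " +
          std::to_string(table.size()) + " entries)";
    return false;
  }
  out = table[idx];
  return true;
}

// A driver can print the message and exit(1) instead of crashing.
[[noreturn]] void fatal(const std::string& msg) {
  std::fprintf(stderr, "error: %s\n", msg.c_str());
  std::exit(1);
}
```

In a build with `-DNDEBUG`, `symbolValueAssert(table, 99)` on a short table reads out of bounds, while `symbolValueChecked` returns false with a message that `fatal` can print before exiting with status 1.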
>>>>>>>>>> It is completely trivial to crash llvm. A case I wrote today in
>>>>>>>>>> another thread while waiting for tests to run:
>>>>>>>>>>
>>>>>>>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>>>>>>> @".data" = global i32 42
>>>>>>>>>>
>>>>>>>>>> That will crash "llc -filetype=obj". The fact that it is considered a
>>>>>>>>>> bug doesn't mean much if there is no coordinated effort to fix them.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think it does, actually - that patches will be accepted to fix
>>>>>>>>> pretty much any crash in LLVM. (llc isn't a user facing tool, so that's a
>>>>>>>>> particularly low priority - but as a general library (I assume your example
>>>>>>>>> also crashes Clang, which would be where this would surface in a more
>>>>>>>>> important way) it's pretty well accepted that crashes are bugs, I think)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Right now lld is already harder to crash than llvm. We are just being
>>>>>>>>>> honest about the fact that it is possible to craft a .o file that will
>>>>>>>>>> crash it.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> But the difference seems to be you know about these cases and
>>>>>>>>> don't consider them to be bugs/anything to fix. In LLVM if they're known,
>>>>>>>>> they're at least considered bugs and often/usually considered by someone to
>>>>>>>>> be worth fixing at some point.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think this is the same from the user's point of view. If LLVM is
>>>>>>>> not crash-bug-free in the version you are using, you need some precaution
>>>>>>>> such as forking in order to protect your program from crashing if you need
>>>>>>>> a 100% guarantee.
>>>>>>>>
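The "forking" precaution mentioned here can be sketched in a few lines of POSIX C++, assuming `fork`/`waitpid` are available (so not on Windows). `runIsolated` and `Outcome` are hypothetical names; this only shows how a parent process can survive a crash in a child, not how Clang or LLD actually spawn subprocesses.

```cpp
// Hypothetical sketch: run a step that may crash on untrusted input in a
// child process (POSIX only) so the parent reports the crash instead of
// dying with it.
#include <cerrno>
#include <csignal>  // signal constants (SIGSEGV, SIGKILL, ...) for callers
#include <cstdio>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class Outcome { Ok, Failed, Crashed };

// `work` runs in the child; its return value becomes the child's exit status.
Outcome runIsolated(int (*work)()) {
  std::fflush(nullptr);  // flush stdio so buffered output isn't duplicated
  pid_t pid = fork();
  if (pid < 0) return Outcome::Failed;  // fork itself failed
  if (pid == 0) _exit(work());          // child: _exit skips atexit handlers
  int status = 0;
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR) return Outcome::Failed;  // retry only on EINTR
  }
  if (WIFSIGNALED(status)) return Outcome::Crashed;  // e.g. SIGSEGV, SIGABRT
  if (WIFEXITED(status) && WEXITSTATUS(status) == 0) return Outcome::Ok;
  return Outcome::Failed;  // child ran but exited with a nonzero status
}
```

For instance, `runIsolated([] { return 0; })` yields `Outcome::Ok`, and a child killed by a signal (e.g. via `std::raise(SIGKILL)`) yields `Outcome::Crashed`. Isolation contains a crash, but it cannot detect a child that silently produces wrong output and exits 0.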
>>>>>>>
>>>>>>> Crashes seem very different from a user's point of view - does the
>>>>>>> program execute undefined behavior (potentially silently producing output
>>>>>>> and exiting 0) or does it have well defined behavior (even if that behavior
>>>>>>> is "print an error and exit(1)").
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> - Dave
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Rafael
>>>>>>>>>> _______________________________________________
>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>