[llvm-dev] Need help with code generation

Mon Mar 21 17:09:02 PDT 2016

Hi Rui,

> LLVM's pass can crash if the previous pass is buggy.

That's a bug that should be fixed in the previous pass.

> What we can do is set a boundary and make best effort to guarantee that as
> long as you are within the boundary, we handle any input in some reasonable
> way.

That boundary is usually user input. We assume that the program's memory
hasn't been compromised, but anything the user puts in should be treated
with suspicion. Would you use a browser that didn't check for buffer
overruns?
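Concretely, treating an object file as untrusted means checking every offset read from it against the real file size before using it, and answering bad input with a short error rather than an out-of-bounds read. The following is a minimal Python sketch of that idea for the ELF header only. It assumes little-endian ELF64 and ignores extended section numbering. `check_elf64_header`, `MalformedInput` and `run` are hypothetical names chosen for illustration, not LLD's actual API.

```python
import struct

# Layouts from the ELF64 format (assumption: little-endian ELF64 only).
ELF_MAGIC = b"\x7fELF"
ELFCLASS64 = 2
ELFDATA2LSB = 1
EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR64_SIZE = 56  # sizeof(Elf64_Phdr)
SHDR64_SIZE = 64  # sizeof(Elf64_Shdr)


class MalformedInput(Exception):
    """Input the tool refuses to process (hypothetical name)."""


def _check_table(data, off, num, entsize, expected, what):
    # An empty table needs no checks; otherwise the entry size must match
    # and the whole table must lie inside the file.
    if num == 0:
        return
    if entsize != expected:
        raise MalformedInput(f"unexpected {what} entry size {entsize}")
    # Python ints cannot overflow; a C/C++ version must also guard
    # against off + num * entsize wrapping around.
    if off + num * entsize > len(data):
        raise MalformedInput(f"{what} table extends past end of file")


def check_elf64_header(data: bytes) -> dict:
    """Validate an ELF64 header before trusting any offset in it.

    Extended section numbering (real count stored in section 0) is
    deliberately not handled in this sketch.
    """
    if len(data) < EHDR64.size:
        raise MalformedInput("file too small for an ELF64 header")
    (ident, e_type, e_machine, _version, _entry, e_phoff, e_shoff, _flags,
     _ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = EHDR64.unpack_from(data, 0)
    if ident[:4] != ELF_MAGIC:
        raise MalformedInput("bad ELF magic")
    if ident[4] != ELFCLASS64 or ident[5] != ELFDATA2LSB:
        raise MalformedInput("only little-endian ELF64 is supported")
    _check_table(data, e_phoff, e_phnum, e_phentsize, PHDR64_SIZE, "program header")
    _check_table(data, e_shoff, e_shnum, e_shentsize, SHDR64_SIZE, "section header")
    if e_shnum and e_shstrndx >= e_shnum:
        raise MalformedInput("section name string table index out of range")
    return {"type": e_type, "machine": e_machine, "shnum": e_shnum}


def run(data: bytes) -> int:
    """Report a short error and return 1 instead of crashing."""
    try:
        check_elf64_header(data)
    except MalformedInput as err:
        print(f"error: {err}")
        return 1
    return 0
```

A full reader would apply the same check to each section's own `sh_offset`/`sh_size`. The design choice is simply that nothing read from the file is trusted until it has been compared with `len(data)`.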

Part of the problem is that we assume the linker is being used in a context
where the input can be trusted. A lot of the time that's true, but assuming
it limits the contexts in which LLD could be used. For example, you
couldn't use LLD as the linker in a build-farm if it crashed on malformed
input - what's to stop someone uploading a malformed ELF file and tricking
the linker into sniffing other projects being built on the same server?
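Running the linker in a child process, as suggested further down the thread, at least keeps a crash from taking the whole build driver down. It also lets the caller tell a clean "print an error and exit(1)" apart from death by signal. Below is a minimal Python sketch assuming Python 3.7+ and a POSIX host for the signal case. `run_isolated` is a hypothetical helper, not part of LLD. It only contains the failure: as argued above, a crash on crafted input may already have done damage before the process dies, so isolation does not replace input validation.

```python
import signal
import subprocess
import sys


def run_isolated(argv, timeout=60.0):
    """Run a tool in a child process and classify how it ended.

    Returns ("ok", 0), ("error", exit_code) for a clean non-zero exit,
    ("crash", signal_name) if the child died from a signal (POSIX only),
    or ("timeout", None) if it ran longer than `timeout` seconds.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising.
        return ("timeout", None)
    rc = proc.returncode
    if rc != 0 and proc.stderr:
        # Forward the tool's own diagnostics (e.g. "error: ...") to the caller.
        sys.stderr.write(proc.stderr.decode(errors="replace"))
    if rc == 0:
        return ("ok", 0)
    if rc < 0:
        # On POSIX, a negative return code -N means "killed by signal N".
        return ("crash", signal.Signals(-rc).name)
    return ("error", rc)
```

For example, `run_isolated(["ld.lld", "foo.o", "-o", "foo"])` would typically come back as `("error", 1)` if lld rejected a malformed `foo.o` with a diagnostic, and as `("crash", "SIGSEGV")` if it segfaulted instead.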

Cheers,
Lang.

On Mon, Mar 21, 2016 at 4:56 PM, Rui Ueyama via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On Tue, Mar 22, 2016 at 12:46 AM, David Blaikie <dblaikie at gmail.com>
> wrote:
>
>>
>>
>> On Mon, Mar 21, 2016 at 4:42 PM, Rui Ueyama <ruiu at google.com> wrote:
>>
>>> On Tue, Mar 22, 2016 at 12:32 AM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Mar 21, 2016 at 4:21 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>
>>>>> From the user's point of view, I think it's still the same. As long as
>>>>> LLVM is not guaranteed to be free of undefined behavior (including any
>>>>> unknown bugs), users are not protected from getting undefined outputs.
>>>>> (And please keep in mind that we are talking about rare cases, such as
>>>>> ELF files you created by hand or with a buggy tool.)
>>>>>
>>>>
>>>> The same is true of any software (all software has bugs) - including
>>>> the software outside that would be forking a subprocess to run lld, no?
>>>>
>>>> With LLVM we consider these bugs and fix them (or at least, pretty much
>>>> without question, accept patches to fix them). It seems like the
>>>> bar for getting such a patch into LLD is being set much higher - this seems
>>>> problematic to me at least.
>>>>
>>>> A library doesn't have to be guaranteed to be free of bugs to be a
>>>> library - that seems like an unrealistic standard (& one not present in any
>>>> other project that I know of)
>>>>
>>>> In any case, I'm talking about just LLD itself when I'm expressing
>>>> concern about "not a bug UB". It seems very different for the user of lld
>>>> at the command line between "this program will give a short error and
>>>> exit(1)" and "this program has known/intended undefined behavior". Even on
>>>> uncommon inputs.
>>>>
>>>
>>> As long as you can't prove that a program has no UB bug, you cannot say
>>> that "this program has no undefined behavior." From the user's point of
>>> view, it is still UB even if it is known to developers and fixed in a
>>> later version.
>>>
>>
>> Then pretty much all software and all libraries do not meet the bar you
>> are describing - so why do so many try to fix these bugs? And if such a
>> program or library is willing to say "we'll fix bugs if we find them" and
>> wants to use lld - wouldn't it be reasonable to support them? Since that's
>> pretty much the bar to which most software is developed.
>>
>
> A kernel is allowed (and chooses) to crash with panic() if a device behaves
> weirdly. LLVM's pass can crash if the previous pass is buggy. Many regexp
> engines go into virtually infinite loops if you give them a malicious
> regexp. And any program can do anything weird if there is a bug. What we
> can do is set a boundary and make best effort to guarantee that as long as
> you are within the boundary, we handle any input in some reasonable way.
> This is what we do -- and other programs do. And where the boundary should
> be set depends on the program.
>
>
>>
>>
>>>
>>> Again, I'd like to emphasize that we are talking about ill-formed ELF
>>> headers or something similar. If you are intentionally trying to break
>>> the linker, I'd say "don't do that." As long as your input is not
>>> corrupted in terms of file formatting, LLD's behavior is well-defined (as
>>> far as we can guarantee).
>>>
>>
>> Right - all we're asking for is the same guarantee (not a very strong
>> guarantee - you haven't proven it's defined for all valid inputs, I'm sure
>> (formal proofs are really expensive, and buggy, and even fuzzing just
>> helps, it doesn't guarantee)) for other inputs - in both cases we know it's
>> not an ironclad guarantee/proven truth.
>>
>>
>>>
>>>
>>>> - David
>>>>
>>>>
>>>>>
>>>>> On Tue, Mar 22, 2016 at 12:02 AM, David Blaikie <dblaikie at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 21, 2016 at 2:54 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>>>
>>>>>>> On Mon, Mar 21, 2016 at 10:49 PM, David Blaikie via llvm-dev <
>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 21, 2016 at 2:46 PM, Rafael Espíndola <
>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>
>>>>>>>>> On 21 March 2016 at 17:34, Tim Northover via llvm-dev
>>>>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>>>>> >> My understanding is that clang and llvm themselves are designed
>>>>>>>>> >> this way (crash when the unexpected happens).
>>>>>>>>> >
>>>>>>>>> > I don't think so. I'd view any Clang crash as a bug (probably to be
>>>>>>>>> > prioritised below silent CodeGen and many others, but not "working
>>>>>>>>> > as designed").
>>>>>>>>> >
>>>>>>>>> >> For example the fact that clang forks itself to be able to report
>>>>>>>>> >> diagnostics
>>>>>>>>> >
>>>>>>>>> > That seems like just trying to make our own job easier to me. I
>>>>>>>>> > think the entire point of the fork is to get a backtrace we can
>>>>>>>>> > fix, and point out where the user should send it.
>>>>>>>>> >
>>>>>>>>> >> llvm is full of report_fatal_error() (or worse, assertions that
>>>>>>>>> >> can fire on unexpected user input).
>>>>>>>>> >
>>>>>>>>> > A bit of a grey area since LLVM isn't itself a user-facing tool,
>>>>>>>>> > but I think I'd still say that a report_fatal_error that's not
>>>>>>>>> > actionable by the user is actually an LLVM bug. And a segfault
>>>>>>>>> > definitely so.
>>>>>>>>>
>>>>>>>>> It is completely trivial to crash llvm. A case I wrote today in
>>>>>>>>> another thread while waiting for tests to run:
>>>>>>>>>
>>>>>>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>>>>>> @".data" = global i32 42
>>>>>>>>>
>>>>>>>>> That will crash "llc -filetype=obj". The fact that it is considered
>>>>>>>>> a bug doesn't mean much if there is no coordinated effort to fix
>>>>>>>>> them.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think it does, actually - that patches will be accepted to fix
>>>>>>>> pretty much any crash in LLVM. (llc isn't a user facing tool, so that's a
>>>>>>>> particularly low priority - but as a general library (I assume your example
>>>>>>>> also crashes Clang, which would be where this would surface in a more
>>>>>>>> important way) it's pretty well accepted that crashes are bugs, I think)
>>>>>>>>
>>>>>>>>
>>>>>>>>> Right now lld is already harder to crash than llvm. We are just
>>>>>>>>> being honest about the fact that it is possible to craft a .o file
>>>>>>>>> that will crash it.
>>>>>>>>>
>>>>>>>>
>>>>>>>> But the difference seems to be you know about these cases and don't
>>>>>>>> consider them to be bugs/anything to fix. In LLVM if they're known, they're
>>>>>>>> at least considered bugs and often/usually considered by someone to be
>>>>>>>> worth fixing at some point.
>>>>>>>>
>>>>>>>
>>>>>>> I think this is the same from the user's point of view. If LLVM is
>>>>>>> not crash-bug-free in the version you are using, you need some precaution
>>>>>>> such as forking in order to protect your program from crashing if you need
>>>>>>> a 100% guarantee.
>>>>>>>
>>>>>>
>>>>>> Crashes seem very different from a user's point of view - does the
>>>>>> program execute undefined behavior (potentially silently producing output
>>>>>> and exiting 0) or does it have well defined behavior (even if that behavior
>>>>>> is "print an error and exit(1)").
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> - Dave
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Rafael
>>>>>>>>> _______________________________________________
>>>>>>>>> LLVM Developers mailing list
>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
>