[LLVMdev] ud2 and lack of warning messages

Wed Dec 4 10:22:55 PST 2013

On Tue, Dec 3, 2013 at 11:05 PM, Mikael Lyngvig <mikael at lyngvig.org> wrote:

> [This is a rather long mail because I feel the topic is extremely
> important.  I've never before experienced a C or C++ compiler that silently
> outputs crash-burn-and-die instructions and I think I've used more than 10
> of the sort.]
>
> Hmm, I am mostly thinking of this in terms of an LLVM IR generator who
> does not have the benefit of an expertly written front-end that can add
> run-time checks.
>

LLVM IR is not a place to put language-level diagnostics, and the users of
a compiler frontend are only interested in language-level diagnostics.

>  I just tried this command:
>
>    clang -c -S -O2 -fsanitize=undefined test.ll
>    a.out
>
> And it didn't change anything.  The ud2 instructions are still there and
> there are no checks.  And on Windows, this seems to yield nothing but a
> mouse cursor that blinks once and then the program exits as if nothing had
> happened.  This is possibly caused by the fact that I always operate with
> Windows Error Reporting disabled.
>

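UBSan works at the C/C++ language level: Clang inserts its checks while
generating IR from C/C++ source. It doesn't work at the IR level, so a .ll
input gives it nothing to instrument.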

>
> I am in the process of reading Regehr's and Lattner's articles on the
> undefined behavior of C, C++, and Objective-C.  I just don't understand why
> a tool has to sort of work against you when it can easily work for you.
>

To a compiler writer, UB means "I don't have to handle that case". It's not
"I have to detect this case and emit a ud2". In many cases (most?), the
problem of deciding whether the program actually has undefined behavior
would require solving the halting problem. That is why UBSan is a runtime
check.
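
To make the "runtime check" point concrete, here is a minimal sketch in C of
the kind of test that has to run while the program executes, because the
operand values are not known at compile time. It is hand-written for
illustration and is not what UBSan actually emits; the name checked_add is
made up.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true and stores a + b in *out when the sum
   fits in an int; returns false and leaves *out untouched when the
   addition would overflow (which is undefined behavior in C). */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false; /* would overflow: report instead of computing */
    }
    *out = a + b;
    return true;
}
```

Whether the overflow branch is ever taken depends entirely on the values
that show up at run time, which is why the check cannot be replaced by a
compile-time diagnostic in general.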

>
> Is there any way at all that I can be informed of the appearance of "ud2"
> (and similar on other platforms) code in my program by the compiler or an
> associated tool?  Okay, I could do two compilations, one with -S and then
> grep for "ud2", and then a second without -S, but that seems both
> non-portable and slow.  I tried the above command with -Wall and got no
> diagnostic whatsoever even though my hand-crafted program appears to be
> mostly useless junk, which can be seen by the fact that two "ud2"
> instructions are output, in different places, so that the program literally
> has no chance of running to completion no matter what input it gets (the
> first "ud2" appears right after the frame pointer has been set up in
> main()).
>

The program could set up a SIGILL handler in a static initializer, then
catch SIGILL upon executing the ud2, and then patch its code to replace
that instruction with a nop and continue.

Yes, that's crazy, but notice that in order to reason that "the program
literally has no chance of running to completion", you had to *assume* that
the scenario I just described would not happen. That sort of assumption
making is basically the same thing that the optimizer does w.r.t. UB.
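
A rough sketch of that scenario, assuming a POSIX system. It cheats in
three ways to stay portable: raise(SIGILL) stands in for actually executing
a ud2, the handler recovers with siglongjmp instead of patching code, and
the handler is installed inside the function rather than before main(). The
names run_past_trap, on_sigill and recover_point are made up for
illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf recover_point;

/* Handler: jump back to the recovery point instead of dying. */
static void on_sigill(int signo) {
    (void)signo;
    siglongjmp(recover_point, 1);
}

/* Returns 1 if execution continued after the trap, 0 if the trap was
   never delivered, -1 if the handler could not be installed. */
int run_past_trap(void) {
    struct sigaction sa;
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGILL, &sa, NULL) != 0) {
        return -1;
    }
    if (sigsetjmp(recover_point, 1) == 0) {
        raise(SIGILL); /* stand-in for reaching a ud2 */
        return 0;      /* only reached if no signal was delivered */
    }
    return 1; /* we got here via siglongjmp from the handler */
}
```

The point is only that "reaching the trap" and "the program cannot
continue" are not the same statement unless you assume things about the
rest of the program.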

> A silently emitted and later executed "ud2" would make any program go
> astray and chances are that the programmer incorrectly believes that
> everything is almost okay, if for no other reason than that the compiler
> has not told him otherwise.  And, yes, I do know about module tests and so
> on, but they are not perfect either.  The "ud2" instruction sort of reduces
> the user of LLVM IR to a user of an interpreted language - the program must
> be checked as if it was written in Python or PHP, because you never know
> when an "ud2" instruction might be emitted, not as if it was statically
> checked by an advanced compiler.  Not exactly what he or she had in mind
> when adopting LLVM in the first place, I suspect.
>

It's not LLVM's job to ensure that your program has "correct semantics".
That is the language frontend's job; it's not the job of "an advanced
compiler": it's the most basic responsibility of a language frontend. All
LLVM knows is the correct semantics for its own IR, not your language.

>
> I am perfectly aware that *I* am the person to ultimately blame for "ud2"
> instructions in my code, but as a non-omniscient entity, I'd like to be
> told when my tool discovers something that I have missed.  That's the main
> reason I prefer statically checked languages - the assurance that I have
> been told about the little errors and that I can focus my attention on the
> big things when testing.  As far as I can tell, the "ud2" invalidates all
> sensible assumptions regarding the static checks of the compiler.  You
> might get a few "ud2"s or not.  Sort of like throwing a die and hoping for
> the best.  The above is also part of why I am personally not very fond of C
> and C++, this to such an extent that I stay away from coding in these
> languages if I can at all do so.  But is LLVM IR C/C++ specific?
>

The optimizer is only obligated to preserve a set of defined behaviors. If
you stay within those defined behaviors, your program will function
correctly; if not, all bets are off. It is a bug in your frontend (not
your users' code) if your language claims to be "safe" (i.e. has no
undefined behaviors) and yet the program runs into LLVM IR level UB.
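
As an illustration of what that responsibility looks like, consider signed
division. LLVM's sdiv has undefined behavior for a zero divisor and for the
overflowing case INT_MIN / -1, so a frontend for a "safe" language has to
guard both before the division is reached. The sketch below uses C as a
stand-in for the code such a frontend would emit; the function name
safe_div and the results it picks for the two special cases are
assumptions for an imaginary language, not anything LLVM prescribes.

```c
#include <limits.h>

/* Hypothetical lowering of "num / den" for an imaginary safe language:
   every input pair has a defined result, so the raw division below is
   only reached when it is well defined. */
int safe_div(int num, int den) {
    if (den == 0) {
        return 0;        /* assumed language rule: x / 0 yields 0 */
    }
    if (num == INT_MIN && den == -1) {
        return INT_MIN;  /* assumed language rule: wrap on overflow */
    }
    return num / den;    /* defined for all remaining inputs */
}
```

A real language might raise an exception or abort instead of returning a
value; what matters is that the guards run before the undefined operation,
and that they cost something at run time.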

>
> At the very least, I think that an early stage of the bitcode processor
> ought to issue a warning if any undefined behavior, whatsoever, is
> detected, so that people can learn to code differently and rest assured
> that all is not as bad as it could be.
>

The point you're missing is that detecting undefined behavior is basically
asking "will the program do X", and in general that is as hard as the
halting problem, meaning that no static analysis can answer the question
for every program. Thus, you have to add runtime checks if you want to
detect it. The fact that it reduces to the halting problem means that
detecting it statically is not an issue of "detect or don't detect it", but
rather one of "how many heuristics can you tack on for detecting certain
limited cases of it".

>  It may be my years with Ada, but I definitely think you want to tell
> people about any and all undefined things they do in their source code.
>  The sooner, the better!
>

The only way to get rid of undefined behavior is to define the result of
all constructs in a language. For the reasons I mentioned above, that will
entail adding runtime checks in a Turing-complete language (i.e. any
interesting one). LLVM needs to be able to efficiently compile languages
like C/C++ without runtime checks, so it has to support the notion of
undefined behavior, which is basically an escape hatch for simplifying
certain static reasoning about the runtime behavior of a program, and
without which those languages cannot be competitively optimized.

-- Sean Silva

>
>
> -- Mikael
>
>
> 2013/12/4 Sean Silva <chisophugis at gmail.com>
>
>> Doing this would make clang's diagnostic output dependent on the
>> optimization level, which is absolutely verboten.
>>
>> Also, a ud2 doesn't mean your program has a bug, and I doubt an asm-level
>> diagnostic would be useful to anyone.
>>
>> A ud2 just means "if control flow ever reaches this point, the program
>> has undefined behavior"; in that sense, they don't even have to be emitted.
>>
>> btw, compiling with -fsanitize=undefined should turn most/all of those
>> ud2's into runtime checks that will give you a nice diagnostic if they are
>> violated.
>>
>> -- Sean Silva
>>
>>
>> On Tue, Dec 3, 2013 at 8:58 PM, Mikael Lyngvig <mikael at lyngvig.org> wrote:
>>
>>> Is it just me or would it be nifty if Clang emitted a warning message
>>> when it generates an "ud2" (UnDefined2) instruction.  I know this is
>>> x86-specific, but it would be sort of nice to know up front.  After all,
>>> the compiler knows perfectly well that it is outputting an "ud2"
>>> instruction and I'm pretty sure almost every programmer out there would
>>> like to share the unhappy news.
>>>
>>>
>>> -- Mikael
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>>
>>
>