[LLVMdev] ud2 and lack of warning messages

Mikael Lyngvig mikael at lyngvig.org
Tue Dec 3 20:05:10 PST 2013


[This is a rather long mail because I feel the topic is extremely
important.  I've never before experienced a C or C++ compiler that silently
outputs crash-burn-and-die instructions and I think I've used more than 10
of the sort.]

Hmm, I am mostly thinking of this in terms of an LLVM IR generator that
does not have the benefit of an expertly written front-end that can add
run-time checks.  I just tried this command:

   clang -c -S -O2 -fsanitize=undefined test.ll
   a.out

And it didn't change anything.  The ud2 instructions are still there and
there are no checks.  And on Windows, this seems to yield nothing but a
mouse cursor that blinks once and then the program exits as if nothing had
happened.  This is possibly caused by the fact that I always operate with
Windows Error Reporting disabled.

I am in the process of reading Regehr's and Lattner's articles on the
undefined behavior of C, C++, and Objective-C.  I just don't understand why
a tool has to sort of work against you when it can easily work for you.

Is there any way at all that the compiler or an associated tool can inform
me when "ud2" (or its equivalent on other platforms) appears in my program?
Okay, I could do two compilations, one with -S followed by a grep for
"ud2", and then a second without -S, but that seems both non-portable and
slow.  I tried the above command with -Wall and got no diagnostic
whatsoever, even though my hand-crafted program appears to be mostly
useless junk: two "ud2" instructions are output, in different places, so
the program literally has no chance of running to completion no matter what
input it gets (the first "ud2" appears right after the frame pointer has
been set up in main()).
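
The "compile with -S, then grep" workaround described above can be sketched
as a small script.  This is only an illustration, not an existing LLVM or
Clang tool: the function name find_trap_lines and its mnemonics parameter
are made up for this example, only the x86 "ud2" mnemonic is matched by
default, and the comment stripping assumes x86 AT&T-syntax output where '#'
starts a comment.

```python
def find_trap_lines(asm_text, mnemonics=("ud2",)):
    """Return (line_number, instruction) pairs whose mnemonic is in `mnemonics`.

    Hypothetical helper: it looks only at the first token of each line, so a
    "ud2" that merely appears inside a comment or a label name is ignored.
    """
    hits = []
    for number, line in enumerate(asm_text.splitlines(), start=1):
        # Assumption: '#' starts a comment (x86 AT&T syntax); other targets differ.
        code = line.split("#", 1)[0].strip()
        # Skip blank lines, pure label lines, and assembler directives.
        if not code or code.endswith(":") or code.startswith("."):
            continue
        if code.split()[0].lower() in mnemonics:
            hits.append((number, code))
    return hits


# Tiny hand-written sample standing in for real compiler output.
sample = "main:\n\tpushq\t%rbp\n\tmovq\t%rsp, %rbp\n\tud2\n"
for number, code in find_trap_lines(sample):
    print(f"line {number}: {code}")
```

One way to use it would be to feed it the text produced by something like
"clang -S -O2 test.ll -o -" and fail the build when the returned list is
non-empty.  It is still a second pass over the output, so it shares the
portability and speed drawbacks of the grep approach.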

A silently emitted and later executed "ud2" would make any program go
astray, and chances are that the programmer incorrectly believes that
everything is almost okay, if for no other reason than that the compiler
has not told him otherwise.  And, yes, I do know about module tests and so
on, but they are not perfect either.  The "ud2" instruction sort of reduces
the user of LLVM IR to a user of an interpreted language: the program must
be tested as if it were written in Python or PHP, not as if it had been
statically checked by an advanced compiler, because you never know when a
"ud2" instruction might be emitted.  Not exactly what he or she had in mind
when adopting LLVM in the first place, I suspect.

I am perfectly aware that *I* am the person to ultimately blame for "ud2"
instructions in my code, but as a non-omniscient entity, I'd like to be
told when my tool discovers something that I have missed.  That's the main
reason I prefer statically checked languages - the assurance that I have
been told about the little errors and that I can focus my attention on the
big things when testing.  As far as I can tell, the "ud2" invalidates all
sensible assumptions regarding the static checks of the compiler.  You
might get a few "ud2"s or not.  Sort of like rolling a die and hoping for
the best.  The above is also part of why I am personally not very fond of C
and C++, to such an extent that I stay away from coding in these languages
if I can at all do so.  But is LLVM IR C/C++ specific?

At the very least, I think that an early stage of the bitcode processor
ought to issue a warning if any undefined behavior, whatsoever, is
detected, so that people can learn to code differently and rest assured
that all is not as bad as it could be.  It may be my years with Ada, but I
definitely think you want to tell people about any and all undefined things
they do in their source code.  The sooner, the better!


-- Mikael


2013/12/4 Sean Silva <chisophugis at gmail.com>

> Doing this would make clang's diagnostic output dependent on the
> optimization level, which is absolutely verboten.
>
> Also, a ud2 doesn't mean your program has a bug, and I doubt an asm-level
> diagnostic would be useful to anyone.
>
> A ud2 just means "if control flow ever reaches this point, the program has
> undefined behavior"; in that sense, they don't even have to be emitted.
>
> btw, compiling with -fsanitize=undefined should turn most/all of those
> ud2's into runtime checks that will give you a nice diagnostic if they are
> violated.
>
> -- Sean Silva
>
>
> On Tue, Dec 3, 2013 at 8:58 PM, Mikael Lyngvig <mikael at lyngvig.org> wrote:
>
>> Is it just me or would it be nifty if Clang emitted a warning message
>> when it generates an "ud2" (UnDefined2) instruction.  I know this is
>> x86-specific, but it would be sort of nice to know up front.  After all,
>> the compiler knows perfectly well that it is outputting an "ud2"
>> instruction and I'm pretty sure almost every programmer out there would
>> like to share the unhappy news.
>>
>>
>> -- Mikael