[LLVMdev] C as used/implemented in practice: analysis of responses

Sean Silva chisophugis at gmail.com
Mon Jun 29 18:51:19 PDT 2015


On Sun, Jun 28, 2015 at 1:28 AM, Peter Sewell <Peter.Sewell at cl.cam.ac.uk> wrote:

> On 27 June 2015 at 17:01, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> >
> >> On 2015 Jun 26, at 17:02, Peter Sewell <Peter.Sewell at cl.cam.ac.uk> wrote:
> >>
> >> On 26 June 2015 at 22:53, Sean Silva <chisophugis at gmail.com> wrote:
> >>> All of these seem to fall into the pattern of "The compiler is
> >>> required to do what you expect, as long as it can't prove X about
> >>> your program". That is, the only reasonable compilation, in the
> >>> absence of inferring some extra piece of information about your
> >>> program, is the one you expect. For example, the only reasonable
> >>> codegen for a comparison between two random pointers has the
> >>> meaning you expect (on common computer architectures); but if the
> >>> compiler can figure something out that tells it that comparing
> >>> those two pointers is undefined by the language standard, then,
> >>> well, technically it can do whatever it wants.
> >>>
> >>> Many people interpret this as the compiler being somewhat
> >>> malevolent, but there's another interpretation in some cases.
> >>>
> >>> I have not looked in depth at the history of all the undefined
> >>> behaviors mentioned in the survey, but some of the undefined
> >>> behaviors are there because at some point in time the underlying
> >>> system diversity made it difficult or impossible to assign a
> >>> meaning. So long as the diversity that led to the desire to leave
> >>> something undefined still exists, programs that use those
> >>> constructs with certain expectations *will* fail to behave as
> >>> "expected" on those targets (on a system where pointers are
> >>> represented differently, your program *may* actually format your
> >>> hard disk if you do so-and-so!).
> >>>
> >>> To put it another way, what is "expected" is actually dependent
> >>> on the C programmer's knowledge of the underlying system (computer
> >>> architecture, system architecture, etc.), and there will always be
> >>> tension so long as the programmer is not thinking about what the C
> >>> language guarantees, but rather (roughly speaking) how *they*
> >>> would translate their code to assembly language for the system or
> >>> systems that they happen to know they're targeting. An x86
> >>> programmer doesn't expect unaligned loads to invoke nasal demons,
> >>> but a SPARC programmer does.
> >>>
> >>> So if you unravel the thread of logic back through the undefined
> >>> behaviors made undefined for this reason, many of these cases of
> >>> exploiting undefined behavior are really an extension, on the
> >>> compiler's part, of the logic "there are some systems for which
> >>> your code would invoke nasal demons, so I might as well assume
> >>> that it will invoke nasal demons on this system (since the
> >>> language standard doesn't say anything about specific systems)".
> >>> Or to put it another way, the compiler is effectively assuming
> >>> that your code is written to target all the systems taken into
> >>> account by the C standard, and if it would invoke nasal demons on
> >>> any one of them then the compiler is allowed to invoke nasal
> >>> demons on all of them.
> >>
> >> Sure.  However, we think we have to take seriously the fact that a
> >> large body of critical code out there is *not* written to target what
> >> the C standard is now, and it is very unlikely to be rewritten to do
> >> so.
> >
> > In case you're not aware of it, here's a fairly relevant blog series on
> > the topic of undefined behaviour in C:
> >
> > http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
> > http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
> > http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html
>
> We're aware of those, thanks.
>
> >>
> >> At the end of the day, code is not written purely by "thinking about
> >> what the C language guarantees", but rather by test-and-debug cycles
> >> that test the code against the behaviour of particular C
> >> implementations.  The ISO C standard is a very loose specification,
> >> and we do not have good tools for testing code against all the
> >> behaviour it permits,
> >
> > *cough* -fsanitize=undefined *cough*
>
> That (and other such tools) is surely a *lot* better than what we had
> before, no doubt about that.  And its developers and those who use it
> heavily should be in a good position to comment on our survey
> questions, as they are up against the same basic problem, of
> reconciling what existing C code actually does vs what compilers
> assume about it, to detect errors without too many false positives.
> We had quite a few survey responses saying something like "sanitisers
> have to allow XYZ, despite the ISO standard, because code really does
> it"; in a sense, what we're doing is trying to clearly and precisely
> characterise all those cases.   If you or others can help with that,
> please do!
>
> But such tools, useful and impressive though they are, aren't
> really testing code against all the behaviour ISO permits - as I
> understand it, they are essentially checking properties of single
> (instrumented) executions, while ISO is a very loose spec, e.g. when
> it comes to evaluation order choices and implementation-defined
> quantities, permitting many (potentially quite different) executions
> for the same source and inputs.  Running with -fsanitize=undefined
> will detect problems just on the executions that the current compiler
> implementation happens to generate.  Of course, checking against all
> allowed executions of a very loose spec quickly becomes
> combinatorially infeasible,


Actually, I would be very, very happy to have an O(2^N) (or worse)
algorithm for checking all allowed executions ;)

(the problem is actually undecidable; not just "infeasible")


> so this isn't unreasonable, but at least
> we'd like to have that gold standard precisely defined, and to be able
> to pseudorandomly check against it.
>

It sounds like what you want is basically to make a list of undefined
behaviors in the standard and find out which ones, in practice, should
be demoted to unspecified or implementation-defined?

-- Sean Silva


>
> thanks,
> Peter
>
>
>
>
> > http://clang.llvm.org/docs/UsersManual.html#controlling-code-generation
> >
> >> so that basic development technique does not -
> >> almost, cannot - result in code that is robust against compilers that
> >> sometimes exploit a wide range of that behaviour.
> >>
> >> It's also the case that some of the looseness of ISO C relates to
> >> platforms that are no longer relevant, or at least no longer
> >> prevalent.  We can at least identify C dialects that provide stronger
> >> guarantees for the rest.
> >>
> >> thanks,
> >> Peter
> >>
> >>
> >>> This is obviously sort of a twisted logic, and I think that a
> >>> lot of the "malevolence" attributed to compilers is due to this.
> >>> It certainly removes many target-dependent checks from the
> >>> mid-level optimizer though.
> >
>


More information about the llvm-dev mailing list