[LLVMdev] C as used/implemented in practice: analysis of responses
Peter Sewell
Peter.Sewell at cl.cam.ac.uk
Wed Jul 1 08:06:45 PDT 2015
On 1 July 2015 at 13:29, Renato Golin <renato.golin at linaro.org> wrote:
> On 1 July 2015 at 13:10, Peter Sewell <Peter.Sewell at cl.cam.ac.uk> wrote:
>> while attractive from the compiler-writer point of view, is just not
>> realistic, given the enormous body of C code out there which does
>> depend on some particular properties which are not guaranteed by the
>> ISO standard.
>
> There is also an enormous body of code that is just wrong. Do we have
> to worry about getting that right, too? Trying to "understand" the
> authors' intentions and do that instead of what they asked?
>
> Where do we draw the line? What do we consider "a reasonable
> deviation" from just "plain wrong"?
It varies from case to case, and one has to be pragmatic. But from
what we see, in our survey results and in Table 1 of
http://www.cl.cam.ac.uk/~dc552/papers/asplos15-memory-safe-c.pdf,
there are a number of non-ISO idioms that really are used pervasively
and for good reasons in systems code.
- some are actually supported by mainstream compilers but not
documented as such, e.g. where the ISO standard forbade things for
now-obsolete h/w reasons. For those, we can identify a
stronger-than-ISO mainstream semantics. For example, our Q12, making
a null pointer by casting from an expression that isn't a constant but
that evaluates to 0, might be in this category.
- some are used more rarely but in important use-cases, and we could
have options to turn off the relevant optimisations, or perhaps
additional annotations in the source types, that guarantee they work.
For example, our Q3 "Can one use pointer arithmetic between separately
allocated C objects" may be like this.
- for some, OS developers are already routinely turning off
optimisations for the sake of more predictable semantics, e.g. with
-fno-strict-aliasing.
- for a few (e.g. our Q1 and Q2, and maybe also Q9 and Q10), there are
real conflicts, and it's not clear how to reconcile the compiler and
systems-programmer views; there we're trying to understand what's
possible. That might involve restricting some optimisations (and one
should try to understand the cost thereof), or additional options, or
documenting what compilers already do more clearly.
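To make two of the idioms above concrete, here is a minimal C sketch.
It is not taken from the survey; the function names are hypothetical,
and the comments assume a mainstream implementation with a 32-bit
IEEE 754 float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Q12-style idiom: the zero is a run-time value, not an integer
 * constant expression. ISO C only guarantees a null pointer for the
 * constant case; here the result is implementation-defined, although
 * mainstream compilers do produce a null pointer. */
void *pointer_from_runtime_zero(uintptr_t zero_at_runtime) {
    return (void *)zero_at_runtime;
}

/* Aliasing idiom: reads a float object through a uint32_t lvalue.
 * ISO C leaves this undefined; code that does it is typically built
 * with -fno-strict-aliasing on GCC and Clang. */
uint32_t float_bits_by_cast(const float *f) {
    return *(const uint32_t *)f;
}

/* ISO-conforming alternative: copy the object representation. */
uint32_t float_bits_by_memcpy(const float *f) {
    uint32_t bits;
    assert(sizeof bits == sizeof *f); /* assumption: 32-bit float */
    memcpy(&bits, f, sizeof bits);
    return bits;
}
```

The memcpy version needs no special options, and compilers commonly
optimise the copy away, so the cast version is mainly of interest for
existing code that already relies on it.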
> There is a large portion of non-standard documented behaviours in all
> compilers, and GCC and Clang are particularly important here. Most
> builtin functions, attributes, and extensions are supported by both
> compilers in a similar way, and people can somewhat rely on it.
> But the only true reliable sources are the standards.
Sadly the ISO standards are neither completely unambiguous nor a good
guide to what can be or is assumed about implementations. (I say
this having contributed to the C/C++11 standards.)
> However, the very definition of undefined behaviour is "here be
> dragons", and that's something that was purposely done to aid
> compilers at optimising code. You may try to unite the open source
> compilers in many ways (as I tried last year), but trying to regulate
> undefined behaviour is not one of them.
>
>
>> That code is not necessarily all gospel, of course, far from it - but
>> its existence does have to be taken seriously.
>
> And we do! Though, in a completely different direction than you would expect. :)
>
> You advocate for better consistent support, which is ok and I, for
> one, have gone down that path multiple times. But in this specific
> case, the way we take it seriously is by warning the users of the
> potential peril AND abuse of it for performance reasons. This is a
> sweet spot because novice users will learn the language and advanced
> users will like the performance.
What we see in discussions with at least some communities of advanced
users is not completely consistent with that, I'm afraid...
thanks,
Peter
> cheers,
> --renato