[cfe-dev] [llvm-dev] RFC: Enforcing pointer type alignment in Clang

Thu Jan 21 04:55:33 PST 2016

On 19 Jan 2016, at 17:51, John McCall <rjmccall at apple.com> wrote:
> 
>  uint32_t *pb = &packet->b;
> 
> Under my model, this code would still have undefined behavior and might trap on an alignment-enforcing system:
>  uint32_t b = *pb;

We already have a warning for almost this cast, but it’s flawed in two respects:

- It’s noisy, so people normally turn it off.
- It’s silenced by an explicit cast.

The latter is problematic again in your model, because programmers now get no warning when they do the dangerous thing.

> This code would still have undefined behavior, because the formal type of the access is still uint32_t here:
>  uint32_t b;
>  memcpy(&b, pb, sizeof(b));
> 
> This code is fine:
>  uint32_t b;
>  memcpy(&b, (const char*) pb, sizeof(b));

This makes me nervous, because presumably void* has an alignment of 1 and now we have different behaviour depending on whether we perform an implicit or explicit cast.

I’m also somewhat uncomfortable with the idea that assigning a uint32_t* temporary to a uint32_t* variable increases its alignment.  I’d be tempted to propose modelling the alignment more explicitly in the C type system, so that &pb is not a uint32_t*, it’s a __alignment__(1) uint32_t* and can’t be implicitly cast to a uint32_t*.  That would mean that we could explicitly warn on casts that increased the alignment and provide a type for b that would both preserve the (lack of) alignment.  For example:

typedef __attribute__((aligned(1))) uint32_t unaligned_uint32_t;

struct foo
{
    char a;
    uint32_t b;
}
__attribute__((packed));

struct foo packet;
...
uint32_t *pb = &packet.b;
unaligned_uint32_t *upb = &packet.b;

Currently, this code is accepted by clang.  I would propose that:

// This should not be allowed (or, if it is, with a warning that can be turned into an error)
uint32_t *pb = &packet.b;
// This should be permitted, but the static analyser should complain
uint32_t *pb = (uint32_t*)&packet.b;
// This should be permitted to silently work
unaligned_uint32_t *upb = &packet.b;

I believe that you would get most of this from clang by implicitly providing the alignment information to members of packed structs.  This would mean that the type of packet.b would implicitly be unaligned_uint32_t, not uint32_t.

> As is this code:
>  __attribute__((aligned(1))) typedef uint32_t unaligned_uint32_t;
>  uint32_t b = *(unaligned_uint32_t*) pb;
> 
> Note that, under the language standards, both of the last two examples have undefined behavior: there’s no concept of a valid unaligned object at all, and if you shoe-horned one in, it would be probably be undefined behavior to take its address.  Clang would be allowed to say “okay, you took the address of this, and we can assume it was actually properly aligned despite being the address of a less-aligned object” and then propagate that alignment assumption to the later accesses to promote the alignment assumption.  The goal of my model — and perhaps I’ve mis-formalized it, but I think the goal is quite clear — is just to forswear this capability in the compiler.

This is, as you say, a language extension, but it’s one that’s been supported by GCC since at least the 2.x days and existing code relies on it.  

David