[llvm-dev] [cfe-dev] RFC: Enforcing pointer type alignment in Clang

James Knight via llvm-dev llvm-dev at lists.llvm.org
Wed Jan 20 12:47:23 PST 2016


> On Jan 19, 2016, at 12:51 PM, John McCall <rjmccall at apple.com> wrote:
> 
>> On Jan 18, 2016, at 3:18 AM, David Chisnall <David.Chisnall at cl.cam.ac.uk> wrote:
>> Hi John,
>> 
>> On 15 Jan 2016, at 08:14, John McCall via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> The question at hand is whether we should require the user to write this:
>>> misaligned_A_B *p = &a.b;
>>> instead of, say:
>>> A::B *p = &a.b;
>>> int x = *(misaligned_int*) &p->n;
>>> because we want to reserve the right to invoke undefined behavior and propagate our “knowledge" that p is 4-byte-aligned to “improve” the 1-byte-aligned access on the next line.
>>> 
>>> My contention is that this is a clean and elegantly simple formal model that is disastrously bad for actual users because it is no longer possible to locally work around a mis-alignment bug without tracking the entire history of the pointer.  It is the sort of compiler policy that gets people to roll their eyes and ask for new options called things like -fno-strict-type-alignment which gradually get adopted by 90% of projects.
>> 
>> I’ve had the misfortune to look at a lot of code that does unaligned access over the last few years.  By far the most common reason for it that I’ve seen is networking code that uses packed structures to represent packets.  For example:
>> 
>> __attribute__((packed))
>> struct somePacket
>> {
>> 	uint8_t a;
>> 	uint32_t b;
>> 	// ...
>> };
>> 
>> In your model, what happens when:
>> 
>> - I use field b directly?
> 
> The compiler recognizes that this access is to a valid but underaligned uint32_t object and generates code assuming a lower alignment.  This doesn’t change, except inasmuch as we gain a formal model that accepts the existence of valid-but-underaligned objects.
> 
>> - I take the address of field b and store it in an int* variable?
> 
> It’s not undefined behavior to form that pointer.  It is, however, still undefined behavior to access the object through that int*, because that type assumes a higher alignment. (The undefined behavior buys us a lot here: otherwise, LLVM would have to assume that all pointers are unaligned unless it could prove that they point to aligned memory. That’s prohibitive.)  However, if you don’t access the object as an int*, and instead access it in a less-aligned way, there’s no undefined behavior and the code is guaranteed to work.
> 
> For example, given this:
>  uint32_t *pb = &packet->b;
> 
> Under my model, this code would still have undefined behavior and might trap on an alignment-enforcing system:
>  uint32_t b = *pb;
> 
> This code would still have undefined behavior, because the formal type of the access is still uint32_t here:
>  uint32_t b;
>  memcpy(&b, pb, sizeof(b));
> 
> This code is fine:
>  uint32_t b;
>  memcpy(&b, (const char*) pb, sizeof(b));
> 
> As is this code:
>  __attribute__((aligned(1))) typedef uint32_t unaligned_uint32_t;
>  uint32_t b = *(unaligned_uint32_t*) pb;
> 
> Note that, under the language standards, both of the last two examples have undefined behavior: there’s no concept of a valid unaligned object at all, and if you shoe-horned one in, it would be probably be undefined behavior to take its address.  Clang would be allowed to say “okay, you took the address of this, and we can assume it was actually properly aligned despite being the address of a less-aligned object” and then propagate that alignment assumption to the later accesses to promote the alignment assumption. The goal of my model — and perhaps I’ve mis-formalized it, but I think the goal is quite clear — is just to forswear this capability in the compiler.


I think everything you said above is uncontroversial. I believe everyone here is in agreement that reinterpret_casts back and forth to arbitrary pointer types are currently allowed, and must continue to be, allowed by clang. Richard suggested a spec-diff to formalize that. As an implication of that being valid, the cast itself does not and must not in the future impart any knowledge of alignment (even though doing so would be theoretically okay per the current language spec).

That is, for clarity, clang does now and should always continue to allow the following, despite the spec saying it doesn't need to:

  char *x = malloc(10);
  int *y = (int*)&x[1]; // assign a misaligned int*.
  char c = *(char*)y; // but accessed only as char*, so no problem.

However, one part of your suggested rules that both I and Richard questioned was the requirement that the expression "&p->n" be valid, even if "p" is misaligned for its type. I still don't think that it's necessary or even particularly useful to start allowing that. And, note, that would be an actual change in behavior, not a clarification/formalization of existing behavior.

That is:
1) Is it valid to do "p->n", when p is not a valid object which is properly aligned for its type?
2) Assuming that's not valid, does adding a & cause it to then be valid, via some special case? E.g. the rule in C that states that "&a[n]" translates to "(a + n)", and "&*a" translates to "a", regardless of the value or validity of "a". (Without that rule, &*a would be invalid, too, if "a" was null or misaligned.)

Just to reiterate, I think the issue here is **not** about whether clang can make alignment assumptions in later code, it's about whether the member access expression *itself* is valid. (If it's not valid, then what happens in later code is irrelevant.)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160120/df563fa5/attachment.html>


More information about the llvm-dev mailing list