[PATCH] D67122: [UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour

Richard Smith - zygoloid via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 26 14:08:14 PDT 2019


rsmith added inline comments.


================
Comment at: clang/docs/ReleaseNotes.rst:63
 
+* As per C++ and C Standards (C++: ``[expr.add]``; C17: 6.5.6p8), applying
+  non-zero offset to ``nullptr`` (or making non-``nullptr`` a ``nullptr``,
----------------
aaron.ballman wrote:
> rsmith wrote:
> > In C, even adding 0 to a null pointer is undefined. Did this change in C17?
> I don't see what words in the C standard make this UB. If it's UB-by-omission, I think it's unintentional. I mean, this language allows `&*some_null_pointer` without UB, so I would be really surprised if `some_null_ptr + 0` was UB. Do you have a source in the C standard for the UB?
C11 6.5.6/8: "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."

Pointer arithmetic in C11 is only defined if the pointer points to an array object (where per /7, a pointer to a non-array object is treated as a pointer to an array of length 1). So arithmetic on null pointers is undefined, even if you add zero. This rule was explicitly changed in C++ to permit adding zero to a null pointer. Similar analysis applies to taking the difference of two pointers, where C++ explicitly says `null_pointer - null_pointer` is zero and C11 says both pointers need to point to the same array object or you get undefined behavior (C11 6.5.6/9: "When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object").
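
As a minimal sketch of the C++ side of that distinction (the helper names below are hypothetical and not part of this patch), both operations are well-defined when compiled as C++, whereas by the C11 wording quoted above the equivalent C code would be undefined:

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; they are not part of D67122.

// Adds zero to a null pointer. C++ [expr.add] explicitly permits this,
// and the result is again a null pointer.
inline int *null_plus_zero() {
  int *p = nullptr;
  return p + 0;
}

// Subtracts one null pointer from another. C++ [expr.add] explicitly
// defines the result as 0.
inline std::ptrdiff_t null_minus_null() {
  int *p = nullptr;
  int *q = nullptr;
  return p - q;
}
```

Adding any non-zero offset to `p` here would be undefined in both languages, which is the case the new check is meant to catch.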

In contrast, C explicitly calls out the `&*null_pointer` case (C11 6.5.3.2/3: "If the operand is the result of a unary `*` operator, neither that operator nor the `&` operator is evaluated and the result is as if both were omitted"). The corresponding case is undefined in C++, but that's likely to be addressed by a defect report resolution (see http://wg21.link/cwg232).

I don't think these two things are all that closely related, though. The C model, as I understand it, is that the null pointer is not a pointer to any object, whereas the C++ model is as if there is an object of type `T[0]` at the null address for every object type `T`, and a null pointer to object type points to (== past the end of) that object. And the `&*` thing is just a very specific, weird C special case, and only applies to certain very specific syntactic forms (but that's OK in C, because there are only a very small number of syntactic forms for lvalues).


================
Comment at: clang/test/CodeGen/catch-nullptr-and-nonzero-offset-in-offsetof-idiom.c:14
+uintptr_t get_offset_of_y() {
+  // CHECK:      define i64 @{{.*}}() {{.*}} {
+  // CHECK-NEXT: [[ENTRY:.*]]:
----------------
lebedev.ri wrote:
> vsk wrote:
> > nit: CHECK-LABEL: define i64 @get_offset_of_y(
> No, because this runs both in C mode and in C++ mode, so the name may be mangled.
`CHECK-LABEL: define i64 @{{.*}}get_offset_of_y{{.*}}(` ?

(Even if you don't add the function names, please do add the `-LABEL`.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67122/new/

https://reviews.llvm.org/D67122




