[PATCH] D67122: [UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour

Tue Sep 3 16:47:08 PDT 2019

vsk added inline comments.

================
Comment at: clang/docs/UndefinedBehaviorSanitizer.rst:177
      ``-fsanitize=undefined``.
+  -  ``-fsanitize=pointer-offsetting``: Enables ``nullptr-and-nonzero-offset``
+     and ``pointer-overflow``.
----------------
lebedev.ri wrote:
> vsk wrote:
> > Why does this need to be a separate sanitizer, as opposed to being a part of -fsanitize=pointer-overflow? Is there a need for a new group?
> That ties in with the question i'm asking myself in last update:
> https://reviews.llvm.org/D67122#1656418
> I.e., would it be okay to produce *all* these checks for **every** `gep inbounds` we codegen, indiscriminantly,
> as opposed to only always producing this `nullptr-and-nonzero-offset`?
It's worth considering whether or not to emit a checked GEP on a case-by-case basis. For example, there are some GEPs emitted in the ABI lowering logic that are not likely to catch bugs in user code. I'm not sure what the point of instrumenting these would be.

Separately, the proposed 'nullptr-and-nonzero-offset' check is interesting only/exactly when the existing 'pointer-overflow' check is interesting, and vice versa. So I don't see the need to make them distinct.

================
Comment at: compiler-rt/lib/ubsan/ubsan_handlers.cpp:715
+         "applying non-zero offset to non-null pointer %0 producing null "
+         "pointer is undefined behaviour")
+        << (void *)Base;
----------------
lebedev.ri wrote:
> vsk wrote:
> > I don't think these diagnostics generally need to repeat the fact that there is UB: that seems implicit.
> > 
> > How about: 'non-zero offset %0 applied to null pointer', and 'non-zero offset %0 applied to non-null pointer %1 produced null'?
> > I don't think these diagnostics generally need to repeat the fact that there is UB: that seems implicit.
> 
> Sure, ok.
> 
> > 'non-zero offset %0 applied to non-null pointer %1 produced null'?
> 
> I don't want to spell any more info that it already does, it isn't really useful //here//.
> Also, we only receive the base pointer and the computed pointer.
> We don't know the actual offset, we may have done `ptr - intptr_t(ptr)`,
> or maybe it was `ptr + (0-intptr_t(ptr))`.
> (Sure, we could receive more info, but i'm not sure worth it here.)
Works for me.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67122/new/

https://reviews.llvm.org/D67122