[PATCH] D67122: [UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 4 14:36:51 PDT 2019


lebedev.ri added inline comments.


================
Comment at: clang/lib/CodeGen/CGExprScalar.cpp:4538
+                 llvm::Value * /*OffsetOverflows*/>
+EmitGEPOffsetInBytes(Value *BasePtr, Value *GEPVal,
+                     llvm::LLVMContext &VMContext, CodeGenModule &CGM,
----------------
vsk wrote:
> Please document this helper. Also, consider having it return a struct, so that the field names don't need to be specified as comments. It'd be great to move the documentation for the TotalOffset and OffsetOverflows fields in EmitCheckedInBoundsGEP into the struct definition.
Uh, comment got lost when it became `static`.


================
Comment at: clang/lib/CodeGen/CGExprScalar.cpp:4670
+          OffsetOverflows == Builder.getFalse()) &&
+         "If the offset got constant-folded, we expect that there was no "
+         "overflows.");
----------------
vsk wrote:
> nit: don't expect
I'm not sure what you mean here. Changed, not sure if this is what you meant.


================
Comment at: clang/lib/CodeGen/CGExprScalar.cpp:4703-4720
+    // 2) The sign of the difference between the computed address and the base
+    // pointer matches the sign of the total offset.
+    llvm::Value *ValidGEP;
+    auto *NoOffsetOverflow = Builder.CreateNot(OffsetOverflows);
+    if (SignedIndices) {
+      auto *PosOrZeroValid = Builder.CreateICmpUGE(ComputedGEP, IntPtr);
+      auto *PosOrZeroOffset = Builder.CreateICmpSGE(TotalOffset, Zero);
----------------
lebedev.ri wrote:
> This makes me ick every time i look at it.
> I wonder if this can be sanely rewritten via `.with.overflow` intrinsic..
Hm, 
```
Name: add unsigned
  %computed = add i8 %base, %offset
  %PosOrZeroValid = icmp uge i8 %computed, %base
=>
  %agg = uadd_overflow i8 %base, %offset
  %computed = extractvalue i8 %agg, 0
  %ov = extractvalue i1 %agg, 1
  %PosOrZeroValid = xor i1 %ov, -1

Name: sub unsigned
  %computed = add i8 %base, %offset
  %PosOrZeroValid = icmp ule i8 %computed, %base
=>
  %negated_offset = sub i8 0, %offset
  %agg = usub_overflow i8 %base, %negated_offset
  %computed = extractvalue i8 %agg, 0
  %ov = extractvalue i1 %agg, 1
  %PosOrZeroValid = xor i1 %ov, -1

Name: add signed
  %computed = add i8 %base, %offset
  %PosOrZeroValid = icmp uge i8 %computed, %base
  %PosOrZeroOffset = icmp sge i8 %offset, 0
  %NegValid = icmp ult i8 %computed, %base
  %OKay = select i1 %PosOrZeroOffset, i1 %PosOrZeroValid, i1 %NegValid
=>
  %uadd.agg = uadd_overflow i8 %base, %offset
  %uadd.ov = extractvalue i1 %uadd.agg, 1
  %negated_offset = sub i8 0, %offset
  %usub.agg = usub_overflow i8 %base, %negated_offset
  %usub.ov = extractvalue i1 %usub.agg, 1
  %computed = add i8 %base, %offset
  %PosOrZeroOffset = icmp sge i8 %offset, 0
  %NotOKay = select i1 %PosOrZeroOffset, i1 %uadd.ov, i1 %usub.ov
  %OKay = xor i1 %NotOKay, -1
```

That's not great, but i wonder what is more friendly to middle-end.
https://godbolt.org/z/ORSU_L


================
Comment at: clang/test/CodeGen/catch-nullptr-and-nonzero-offset-blacklist.c:1
+// RUN: %clang_cc1 -fsanitize=pointer-overflow -fsanitize-recover=pointer-overflow -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s -implicit-check-not="call void @__ubsan_handle_pointer_overflow" --check-prefixes=CHECK,CHECK-NULL-IS-INVALID-PTR
+// RUN: %clang_cc1 -fno-delete-null-pointer-checks -fsanitize=pointer-overflow -fsanitize-recover=pointer-overflow -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s -implicit-check-not="call void @__ubsan_handle_pointer_overflow" --check-prefixes=CHECK,CHECK-NULL-IS-VALID-PTR
----------------
vsk wrote:
> Should the test filename be "ignore-nullptr-and-nonzero-..."?
`-blacklist.c` is the filename i have i used for all similar testfiles in all previous sanitizers.
"black list" means here "list of cases where no sanitization will be emitted".


================
Comment at: clang/test/CodeGen/catch-nullptr-and-nonzero-offset-blacklist.c:5
+// CHECK-LABEL: @baseline
+char *baseline(char *base, unsigned long offset) {
+  // CHECK: call void @__ubsan_handle_pointer_overflow(
----------------
vsk wrote:
> The baseline case is tested elsewhere, why include it here?
So the rest won't have false-positive just because `-fsanitize=pointer-overflow` is omitted.


================
Comment at: clang/test/CodeGen/catch-nullptr-and-nonzero-offset-in-offsetof-idiom.c:1
+// RUN: %clang_cc1 -x c   -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s
+// RUN: %clang_cc1 -x c   -fsanitize=pointer-overflow -fno-sanitize-recover=pointer-overflow -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s
----------------
vsk wrote:
> Should the test filename be "ignore-nullptr-and-nonzero-..."?
Honestly, i'm using the consistent file naming scheme of `catch-<option-name-or-so>`, so that they are all checked by
`$ ninja clang && ./bin/llvm-lit  -v -v -vv /repositories/llvm-project/clang/test/CodeGen/catch-*`


================
Comment at: clang/test/CodeGen/catch-nullptr-and-nonzero-offset-in-offsetof-idiom.c:14
+uintptr_t get_offset_of_y() {
+  // CHECK:      define i64 @{{.*}}() {{.*}} {
+  // CHECK-NEXT: [[ENTRY:.*]]:
----------------
vsk wrote:
> nit: CHECK-LABEL: define i64 @get_offset_of_y(
No, because this runs both in C mode and in C++ mode, so the name may be mangled.


================
Comment at: clang/test/CodeGen/catch-nullptr-and-nonzero-offset.c:28
+
+// We just don't know, can be bad, can be ok.
+char *var_var(char *base, unsigned long offset) {
----------------
vsk wrote:
> I don't quite follow the comments. Perhaps it would be simpler to list the expected checks in the comment, e.g. "The base and offset are unknown. Check for pointer wrap and null status change."
> 
These comments are rather pointless now that the checks are merged, i'll just drop them.


================
Comment at: clang/test/CodeGen/catch-nullptr-and-nonzero-offset.c:66
+  // CHECK: define i8* @var_zero_OK(i8* %[[BASE:.*]])
+  // CHECK-NEXT: [[ENTRY:.*]]:
+  // CHECK-NEXT: %[[BASE_ADDR:.*]] = alloca i8*, align 8
----------------
vsk wrote:
> Does this simplify to:
> 
> ```
> CHECK-LABEL: define i8* @var_zero_ok(
> CHECK-NOT: !nosanitize
> ```
I find it easier to reason about readable check lines.


================
Comment at: clang/test/CodeGen/ubsan-pointer-overflow.m:48
 void pointer_array_unsigned_indices(int **arr, unsigned k) {
   // CHECK: icmp uge
   // CHECK-NOT: select
----------------
vsk wrote:
> I'm curious about what happens here with the new null-status-change check. If array indices are unsigned, there is no need for a separate null-status-change check: if the result of the GEP is indeed null, that will be detected by the pointer wraparound check. We just need to check that the base pointer itself isn't null.
> 
> That generalizes to addition of unsigned offsets to base pointers, I think. So e.g. I wouldn't expect the 'preinc' function to contain a null-status-change check either.
Ugh, i still hate those codegen-time optimizations :)
Not only do they result in a maze that is 'hard' to comprehend and argue about,
the middle-end isn't "forced" to know these folds so the other code stays unoptimized,
worse yet, unlike miscompile that will be detected, if the logic error is made here,
it will likely "only" result in false-negatives :(

> We just need to check that the base pointer itself isn't null.

I don't believe that is sound for `unsigned add`: https://rise4fun.com/Alive/2J5 - will consider that `base=0 offset=0` is invalid.
Likewise, does not seem valid for `unsigned sub`: https://rise4fun.com/Alive/VEr - will not detect when pointer becomes null.
Likewise, does not seem valid for `signed add/sub`: https://rise4fun.com/Alive/97B - will not detect when pointer becomes null.

So, i'm not going to touch this. Unless i'm getting the gist of your comment wrong, or wrote those checks wrong?


================
Comment at: clang/test/CodeGenCXX/catch-nullptr-and-nonzero-offset-in-offsetof-idiom.cpp:1
+// RUN: %clang_cc1 -x c++ -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s
+// RUN: %clang_cc1 -x c++ -fsanitize=pointer-overflow -fno-sanitize-recover=pointer-overflow -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s
----------------
vsk wrote:
> How is this different from the other offsetof test file?
It uses `nullptr` specifically, unlike `0`.


================
Comment at: compiler-rt/test/ubsan/TestCases/Pointer/nullptr-and-nonzero-offset-constants.cpp:22
+  result = base - 1;
+  // CHECK: {{.*}}.cpp:[[@LINE-1]]:17: runtime error: applying non-zero offset to non-null pointer 0x{{.*}} producing null pointer
+
----------------
vsk wrote:
> Should be 'produced null pointer' or 'gave null result'
Oh, i was looking for better spelling, but didn't think of `produced`. Thanks!


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67122/new/

https://reviews.llvm.org/D67122





More information about the llvm-commits mailing list