[PATCH] D67122: [UBSan][clang][compiler-rt] Applying non-zero offset to nullptr is undefined behaviour

Vedant Kumar via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 5 13:03:26 PDT 2019


vsk added a comment.

Still think this looks good. Have you tried running this on the llvm test suite, or some other interesting corpus? Would be curious to see any pre/post patch numbers.



================
Comment at: clang/lib/CodeGen/CGExprScalar.cpp:4703-4720
+    // 2) The sign of the difference between the computed address and the base
+    // pointer matches the sign of the total offset.
+    llvm::Value *ValidGEP;
+    auto *NoOffsetOverflow = Builder.CreateNot(OffsetOverflows);
+    if (SignedIndices) {
+      auto *PosOrZeroValid = Builder.CreateICmpUGE(ComputedGEP, IntPtr);
+      auto *PosOrZeroOffset = Builder.CreateICmpSGE(TotalOffset, Zero);
----------------
lebedev.ri wrote:
> lebedev.ri wrote:
> > This makes me ick every time i look at it.
> > I wonder if this can be sanely rewritten via `.with.overflow` intrinsic..
> Hm, 
> ```
> Name: add unsigned
>   %computed = add i8 %base, %offset
>   %PosOrZeroValid = icmp uge i8 %computed, %base
> =>
>   %agg = uadd_overflow i8 %base, %offset
>   %computed = extractvalue i8 %agg, 0
>   %ov = extractvalue i1 %agg, 1
>   %PosOrZeroValid = xor i1 %ov, -1
> 
> Name: sub unsigned
>   %computed = add i8 %base, %offset
>   %PosOrZeroValid = icmp ule i8 %computed, %base
> =>
>   %negated_offset = sub i8 0, %offset
>   %agg = usub_overflow i8 %base, %negated_offset
>   %computed = extractvalue i8 %agg, 0
>   %ov = extractvalue i1 %agg, 1
>   %PosOrZeroValid = xor i1 %ov, -1
> 
> Name: add signed
>   %computed = add i8 %base, %offset
>   %PosOrZeroValid = icmp uge i8 %computed, %base
>   %PosOrZeroOffset = icmp sge i8 %offset, 0
>   %NegValid = icmp ult i8 %computed, %base
>   %OKay = select i1 %PosOrZeroOffset, i1 %PosOrZeroValid, i1 %NegValid
> =>
>   %uadd.agg = uadd_overflow i8 %base, %offset
>   %uadd.ov = extractvalue i1 %uadd.agg, 1
>   %negated_offset = sub i8 0, %offset
>   %usub.agg = usub_overflow i8 %base, %negated_offset
>   %usub.ov = extractvalue i1 %usub.agg, 1
>   %computed = add i8 %base, %offset
>   %PosOrZeroOffset = icmp sge i8 %offset, 0
>   %NotOKay = select i1 %PosOrZeroOffset, i1 %uadd.ov, i1 %usub.ov
>   %OKay = xor i1 %NotOKay, -1
> ```
> 
> That's not great, but i wonder what is more friendly to middle-end.
> https://godbolt.org/z/ORSU_L
I suggest splitting out changes to the existing pointer wraparound check in a follow-up patch. This may well be an improvement, but there are enough moving parts to this patch as-is.


================
Comment at: clang/test/CodeGen/catch-nullptr-and-nonzero-offset-blacklist.c:1
+// RUN: %clang_cc1 -fsanitize=pointer-overflow -fsanitize-recover=pointer-overflow -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s -implicit-check-not="call void @__ubsan_handle_pointer_overflow" --check-prefixes=CHECK,CHECK-NULL-IS-INVALID-PTR
+// RUN: %clang_cc1 -fno-delete-null-pointer-checks -fsanitize=pointer-overflow -fsanitize-recover=pointer-overflow -emit-llvm %s -o - -triple x86_64-linux-gnu | FileCheck %s -implicit-check-not="call void @__ubsan_handle_pointer_overflow" --check-prefixes=CHECK,CHECK-NULL-IS-VALID-PTR
----------------
lebedev.ri wrote:
> vsk wrote:
> > Should the test filename be "ignore-nullptr-and-nonzero-..."?
> `-blacklist.c` is the filename i have i used for all similar testfiles in all previous sanitizers.
> "black list" means here "list of cases where no sanitization will be emitted".
This is minor. I only raised the nit as I felt that this test was about ignoring "foo", not about catching "foo", and that "blacklist" already means something else in the context of sanitizers. Happy to defer to your preference here.


================
Comment at: clang/test/CodeGen/catch-nullptr-and-nonzero-offset.c:66
+  // CHECK: define i8* @var_zero_OK(i8* %[[BASE:.*]])
+  // CHECK-NEXT: [[ENTRY:.*]]:
+  // CHECK-NEXT: %[[BASE_ADDR:.*]] = alloca i8*, align 8
----------------
lebedev.ri wrote:
> vsk wrote:
> > Does this simplify to:
> > 
> > ```
> > CHECK-LABEL: define i8* @var_zero_ok(
> > CHECK-NOT: !nosanitize
> > ```
> I find it easier to reason about readable check lines.
Ah, sure, having the IR written out explicitly can help. The downside is that it's more to read/maintain, and arguably the key idea here is that "no sanitizer stuff happens" since presumably IRGen for generic pointer arithmetic is tested elsewhere, but again this is minor & I'm happy to defer on this.


================
Comment at: clang/test/CodeGen/ubsan-pointer-overflow.m:48
 void pointer_array_unsigned_indices(int **arr, unsigned k) {
   // CHECK: icmp uge
   // CHECK-NOT: select
----------------
lebedev.ri wrote:
> vsk wrote:
> > I'm curious about what happens here with the new null-status-change check. If array indices are unsigned, there is no need for a separate null-status-change check: if the result of the GEP is indeed null, that will be detected by the pointer wraparound check. We just need to check that the base pointer itself isn't null.
> > 
> > That generalizes to addition of unsigned offsets to base pointers, I think. So e.g. I wouldn't expect the 'preinc' function to contain a null-status-change check either.
> Ugh, i still hate those codegen-time optimizations :)
> Not only do they result in a maze that is 'hard' to comprehend and argue about,
> the middle-end isn't "forced" to know these folds so the other code stays unoptimized,
> worse yet, unlike miscompile that will be detected, if the logic error is made here,
> it will likely "only" result in false-negatives :(
> 
> > We just need to check that the base pointer itself isn't null.
> 
> I don't believe that is sound for `unsigned add`: https://rise4fun.com/Alive/2J5 - will consider that `base=0 offset=0` is invalid.
> Likewise, does not seem valid for `unsigned sub`: https://rise4fun.com/Alive/VEr - will not detect when pointer becomes null.
> Likewise, does not seem valid for `signed add/sub`: https://rise4fun.com/Alive/97B - will not detect when pointer becomes null.
> 
> So, i'm not going to touch this. Unless i'm getting the gist of your comment wrong, or wrote those checks wrong?
Totally see your point and understand the frustration :), we all want a tidy/simple frontend. And thanks for sharing these counter-examples. The context for my comment here is that -O0 -g and -Os -g compile-time and run-time performance (yes! -O0 run-time performance, really) matters a fair amount for Xcode users, and tend to regress with sanitizer changes. If there are simple (and correct :) ways to avoid regressing these, it's a plus.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67122/new/

https://reviews.llvm.org/D67122





More information about the cfe-commits mailing list