[cfe-dev] Detecting undefined pointer arithmetic
Johannes Doerfert via cfe-dev
cfe-dev at lists.llvm.org
Thu Jan 14 16:52:12 PST 2021
We would/should not exploit UB in such a case, at least not in the shown
example. The pointer computation might yield `poison` but that is it.
If you'd use the pointer afterwards, the situation is different though.
~ Johannes
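
For what it's worth, one way to sidestep the question entirely is to do the bounds check on the integer offset before the pointer is ever formed. Below is a minimal sketch in C, assuming the caller knows the size of the allocation. `checked_offset` is a hypothetical helper made up for illustration; it is not part of LLVM or any sanitizer runtime.

```c
#include <stddef.h>

/* Hypothetical helper (not from LLVM or the sanitizers): return base + offset
 * only when the result stays within [base, base + size], i.e. inside the
 * object or exactly one past its end. Otherwise return NULL without ever
 * computing the out-of-bounds pointer. */
char *checked_offset(char *base, size_t size, size_t offset) {
    if (base == NULL || offset > size)
        return NULL;        /* out of range: the pointer is never formed */
    return base + offset;   /* in bounds, or one past the end */
}
```

With the 1-byte buffer from the example below, `checked_offset(buf, 1, 3)` returns NULL instead of computing `buf + 3`, while offsets 0 and 1 (one past the end) come back as ordinary pointers. A one-past-the-end result still must not be dereferenced.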
On 1/14/21 10:03 AM, John Criswell via cfe-dev wrote:
> Dear All,
>
> Actually, I think this is technically undefined behavior, as (IIRC) C allows a pointer to point one element past the end of the referent memory object but no further than that.
>
> That said, I wouldn’t be surprised if the sanitizers do not catch this case because:
>
> 1. Since the code doesn’t do anything useful, dead code elimination may remove the offending code before the sanitizers instrument the program.
> 2. The out-of-bounds pointer is not used to read or write memory, so it can’t corrupt or leak memory. About the most danger it poses is that it may enable optimizations that the programmer is not expecting because the code is undefined.
>
> I’ve lost track of all the different sanitizers in LLVM and how they work, but the original AddressSanitizer just checks for out-of-bounds loads and stores; it doesn’t place any checks on pointer arithmetic operations (like the LLVM getelementptr instruction). That makes it faster, at the cost of not catching all pointer arithmetic errors.
>
> Regards,
>
> John Criswell
>
> --
> John Criswell
> Associate Professor
> University of Rochester
> jtcriswel at gmail.com
>
>> On Jan 14, 2021, at 9:39 AM, Marshall Clow via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>
>>> On Jan 10, 2021, at 11:39 PM, Demi M. Obenour via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>>
>>> I noticed that none of the sanitizers seems to support checking for
>>> out-of-bounds pointer arithmetic, even though my understanding of
>>> the C standard is that this is undefined behavior. In particular, I
>>> believe the following trivial program has undefined behavior (assuming
>>> malloc() succeeds), but none of the sanitizers flag any warnings:
>>>
>>> #include <stdlib.h>
>>> int main(void) {
>>>     char *buf = malloc(1);
>>>     if (buf) {
>>>         char *this_is_ub = buf + 3;
>>>         free(buf);
>>>     }
>>> }
>>>
>>> Of course, I suspect this just has not been implemented yet, but
>>> it still leaves me at a loss for how to track this form of UB down.
>>> Is there a better solution than manual code review?
>>
>> This program does not have UB.
>> There’s nothing wrong with forming an “out-of-bounds” pointer.
>> If you use it for anything, then that is UB, and AddressSanitizer will find such usages for you.
>>
>> Like this:
>>
>> #include <stdlib.h>
>> int main(void) {
>>     int ret = 0;
>>     char *buf = (char *) malloc(1);
>>     if (buf) {
>>         char *this_is_ub = buf + 3;
>>         ret = *this_is_ub;
>>         free(buf);
>>     }
>>     return ret;
>> }
>>
>> ——
>> % clang++ -fsanitize=address bug.cpp && ./a.out
>> =================================================================
>> ==25602==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000d3 at pc 0x00010531ff18 bp 0x7ffeea8e0970 sp 0x7ffeea8e0968
>> READ of size 1 at 0x6020000000d3 thread T0
>> #0 0x10531ff17 in main (a.out:x86_64+0x100000f17)
>> #1 0x7fff6edd5cc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>
>> 0x6020000000d3 is located 2 bytes to the right of 1-byte region [0x6020000000d0,0x6020000000d1)
>> allocated by thread T0 here:
>> #0 0x105e08abd in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x45abd)
>> #1 0x10531feaf in main (a.out:x86_64+0x100000eaf)
>> #2 0x7fff6edd5cc8 in start (libdyld.dylib:x86_64+0x1acc8)
>>
>> SUMMARY: AddressSanitizer: heap-buffer-overflow (a.out:x86_64+0x100000f17) in main
>> Shadow bytes around the buggy address:
>> 0x1c03ffffffc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x1c03ffffffd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x1c03ffffffe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x1c03fffffff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> 0x1c0400000000: fa fa fd fd fa fa 00 00 fa fa 00 00 fa fa 00 04
>> =>0x1c0400000010: fa fa 00 00 fa fa 00 06 fa fa[01]fa fa fa fa fa
>> 0x1c0400000020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>> 0x1c0400000030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>> 0x1c0400000040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>> 0x1c0400000050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>> 0x1c0400000060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
>> Shadow byte legend (one shadow byte represents 8 application bytes):
>> Addressable: 00
>> Partially addressable: 01 02 03 04 05 06 07
>> Heap left redzone: fa
>> Freed heap region: fd
>> Stack left redzone: f1
>> Stack mid redzone: f2
>> Stack right redzone: f3
>> Stack after return: f5
>> Stack use after scope: f8
>> Global redzone: f9
>> Global init order: f6
>> Poisoned by user: f7
>> Container overflow: fc
>> Array cookie: ac
>> Intra object redzone: bb
>> ASan internal: fe
>> Left alloca redzone: ca
>> Right alloca redzone: cb
>> Shadow gap: cc
>> ==25602==ABORTING
>> zsh: abort ./a.out
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev