[libc-commits] [PATCH] D78612: [libc] Add sanitizer instrumentation to the SIMD strlen implementation.

Wed Apr 22 12:30:50 PDT 2020

hctim added a comment.

>> My view is that this just defeats the purpose of sanitizing. Also, we will be setting an example which says, "it is OK to keep fast paths unsanitized."
> 
> I agree, sanitizers are useful for detecting problems in programs using libc and detecting problems with the libc itself. It's really the "fast path" where we want the second kind, because it's more complicated and harder to see if it's safe. @sivachandra is right, and thats not an example I would want to set here.

>From a libc-user's perspective - I want to know if my call to libc was buggy. How that function is implemented is not really important (i.e. whether I get one implementation or another) - and this is how the interceptors work currently with glibc. I don't expect that a sanitizer-friendly implementation would be more than 2x slower than a "fast variant".

>From a libc-developer's perspective - I completely understand that having a sanitizer-friendly implementation and a sanitizer-unfriendly implementation is undesirable. The tradeoff here is an enormous maintenance burden. By emulating ASan's behaviour here - it's creating the expectation to emulate all of MSan, TSan, HWASan, UBSan, MTE and DFSan's behaviours as well. This turns strlen() into hundreds of lines of sanitizer-specific behaviour and tricks.

>> There is an alternate approach we can take here: We can keep the `safe_word_check` function unsanitized without any instrumentation. Since the main function is sanitized, it will catch bad memory reads as it does the reading again anyway. This is probably a nice compromise, but I would like to hear about the spirit of such an approach: we are not sanitizing one function at all.
> 
> This makes a lot of sense, I am in strong favor of this approach. We keep both the performance of D77949 <https://reviews.llvm.org/D77949> and we achieve the goal of easy sanitization. Theres also a lot of merit in pulling out the unsafe bits of a function and explicitly marking them as such, it documents we understand it is UB and is a good starting point if there are ever any bugs.

This approach works well here. Bearing in mind that it doesn't work trivially for anything that doesn't "double back" on the sanitizer-friendly version. Think SIMD strcpy() + MSan.

================
Comment at: libc/test/src/string/strlen_test.cpp:70
+  const char hello[8] = {'h', 'e', 'l', 'l', 'o', '\0', '0', '0'};
+#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
+  __asan_poison_memory_region(hello + 6, 2);
----------------
sivachandra wrote:
> hctim wrote:
> > Wouldn't this be simpler just as `const char *hello = "hello"`? We definitely shouldn't be manually doing asan-ish things in the test.
> Other tests do the equivalent. The literal string ends up in the data section which is properly aligned. So, will not trigger an ASAN error as the word read is still within the data section?
ASan doesn't care about alignment. It stores a description of the real size of the global - which is caught by partially addressable dwords.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78612/new/

https://reviews.llvm.org/D78612