[libc-commits] [PATCH] D78612: [libc] Add sanitizer instrumentation to the SIMD strlen implementation.

Wed Apr 22 10:52:56 PDT 2020

sivachandra marked an inline comment as done.
sivachandra added a comment.

In D78612#1997305 <https://reviews.llvm.org/D78612#1997305>, @hctim wrote:

> I personally think this is too clever. The whole templating thing seems like a good overall QOL improvement - but the whole unsanitized `safe_check_word` that we have to manually sanitize seems challenging to maintain. You would need to do sanitizer instrumentation manually for ASan, MSan, UBSan, HWASan, MTE, DFSan, and any other fun toys we come up with. Each of these has a different interface, and different quirks. Why not just let the compiler do these quirks?

Sorry, the templating thing is probably a distraction at this point.

With respect to the sanitizer parts: In this patch, `strlen` is an example. With the present sanitizer API, wouldn't hand-crafting instrumentation mean that we have to manually deal with multiple sanitizers in general anyway?

> Instead, can't we just use the slow path always under sanitizers? i.e.:
> 
>   #if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
>     static constexpr bool word_optimized = false;
>   #else // ASan
>     static constexpr bool word_optimized = Mask<sizeof(uintptr_t)>::opt;
>   #endif
> 
> 
> Then, the whole dancing-around-managing sanitizers is solved as the compiler just does normal instrumentation of the slow path.

My view is that this just defeats the purpose of sanitizing. Also, we will be setting an example which says, "it is OK to keep fast paths unsanitized."

There is an alternate approach we can take here: We can keep the `safe_word_check` function unsanitized without any instrumentation. Since the main function is sanitized, it will catch bad memory reads as it does the reading again anyway. This is probably a nice compromise, but I would like to hear about the spirit of such an approach: we are not sanitizing one function at all.

================
Comment at: libc/test/src/string/strlen_test.cpp:70
+  const char hello[8] = {'h', 'e', 'l', 'l', 'o', '\0', '0', '0'};
+#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
+  __asan_poison_memory_region(hello + 6, 2);
----------------
hctim wrote:
> Wouldn't this be simpler just as `const char *hello = "hello"`? We definitely shouldn't be manually doing asan-ish things in the test.
Other tests do the equivalent. The literal string ends up in the data section which is properly aligned. So, will not trigger an ASAN error as the word read is still within the data section?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78612/new/

https://reviews.llvm.org/D78612