[libc-commits] [PATCH] D78612: [libc] Add sanitizer instrumentation to the SIMD strlen implementation.

Wed Apr 22 11:26:15 PDT 2020

abrachet added a comment.

In D78612#1997364 <https://reviews.llvm.org/D78612#1997364>, @sivachandra wrote:

> In D78612#1997305 <https://reviews.llvm.org/D78612#1997305>, @hctim wrote:
>
> > I personally think this is too clever. The whole templating thing seems like a good overall QOL improvement - but the whole unsanitized `safe_check_word` that we have to manually sanitize seems challenging to maintain. You would need to do sanitizer instrumentation manually for ASan, MSan, UBSan, HWASan, MTE, DFSan, and any other fun toys we come up with. Each of these has a different interface, and different quirks. Why not just let the compiler do these quirks?
>
>
> Sorry, the templating thing is probably a distraction at this point.
>
> With respect to the sanitizer parts: In this patch, `strlen` is an example. With the present sanitizer API, wouldn't hand-crafting instrumentation mean that we have to manually deal with multiple sanitizers in general anyway?
>
> > Instead, can't we just use the slow path always under sanitizers? i.e.:
> > 
> >   #if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
> >     static constexpr bool word_optimized = false;
> >   #else // ASan
> >     static constexpr bool word_optimized = Mask<sizeof(uintptr_t)>::opt;
> >   #endif
> > 
> > 
> > Then, the whole dancing-around-managing sanitizers is solved as the compiler just does normal instrumentation of the slow path.
>
> My view is that this just defeats the purpose of sanitizing. Also, we will be setting an example which says, "it is OK to keep fast paths unsanitized."

I agree, sanitizers are useful for detecting problems in programs using libc and detecting problems with the libc itself. It's really the "fast path" where we want the second kind, because it's more complicated and harder to see if it's safe. @sivachandra is right, and thats not an example I would want to set here.

> There is an alternate approach we can take here: We can keep the `safe_word_check` function unsanitized without any instrumentation. Since the main function is sanitized, it will catch bad memory reads as it does the reading again anyway. This is probably a nice compromise, but I would like to hear about the spirit of such an approach: we are not sanitizing one function at all.

This makes a lot of sense, I am in strong favor of this approach. We keep both the performance of D77949 <https://reviews.llvm.org/D77949> and we achieve the goal of easy sanitization. Theres also a lot of merit in pulling out the unsafe bits of a function and explicitly marking them as such, it documents we understand it is UB and is a good starting point if there are ever any bugs.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78612/new/

https://reviews.llvm.org/D78612