[PATCH] D30384: [asan] Add an interceptor for strtok

Yury Gribov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 15 14:56:22 PDT 2017


ygribov added inline comments.


================
Comment at: lib/sanitizer_common/sanitizer_common_interceptors.inc:494
+    uptr del_length = REAL(strlen)(delimiters);
+    COMMON_INTERCEPTOR_READ_STRING_OF_LEN(ctx, delimiters, del_length, 1);
+    return result;
----------------
alekseyshl wrote:
> mrigger wrote:
> > alekseyshl wrote:
> > > mrigger wrote:
> > > > alekseyshl wrote:
> > > > > First, it should be ..., del_length, del_length + 1); and second, it does not make sense to use COMMON_INTERCEPTOR_READ_STRING_OF_LEN here, 
> > > > > 
> > > > >    COMMON_INTERCEPTOR_READ_RANGE(ctx, delimiters, REAL(strlen)(delimiters) + 1);
> > > > > 
> > > > > would be enough.
> > > > > 
> > > > I implemented it like you suggested in my first revision. The current implementation is based on @ygribov's feedback and his explanations about `strict_strings`. Did I understand them wrong?
> > > You're explicitly calling strlen on that string, it will look for the \0 anyway. Check what COMMON_INTERCEPTOR_READ_STRING_OF_LEN is doing, substitute your arguments.
> > > 
> > > Your last argument being just 1 should definitely be fixed, now you're checking just the first char, right?
> > Note, that we use the not-intercepted `strlen`. Following, an out-of-bounds access might not result in an error here.
> > 
> > Yes, we only check the first delimiter (for `strict_string_checks=0`), as @ygribov suggested, if I understood him right. @ygribov, can you please comment to clear things up?
> > 
> > Here is a demonstration of the behavior on an example:
> > ```
> > #include <stdio.h>
> > #include <string.h>
> > 
> > int main() {
> >     char buf[] = "abcb";
> >     char del[] = {'b'}; // missing NULL terminator
> >     char* result = strtok(buf, del);
> >     printf("%s\n", result);
> > }
> > ```
> > Executing the program with `ASAN_OPTIONS=strict_string_checks=1 ./a.out` lets ASan output an error. However, executing with `ASAN_OPTIONS=strict_string_checks=0 ./a.out` does not result in an error.
> > 
> > Here is also the documentation for `strict_string_checks`:
> > ```
> > 	strict_string_checks
> > 		- If set check that string arguments are properly null-terminated
> > ```
> > 
> > I'm happy to change it, but I want to be sure that we implement the expected behavior.
> > 
> > 
> The code I proposed uses non-intercepted strlen as well:
> 
>   COMMON_INTERCEPTOR_READ_RANGE(ctx, delimiters, REAL(strlen)(delimiters) + 1);
> 
> The problem I'm trying to point out is not about the first delimiter (that's fine, but it's about the other COMMON_INTERCEPTOR_READ_STRING_OF_LEN use, the one above), it's that you pass n=1 to your second macro, this one:
> 
>   COMMON_INTERCEPTOR_READ_STRING_OF_LEN(ctx, delimiters, del_length, 1);
> 
> which means "1 char". I believe, it was not your intention.
Sorry guys, I was out. Frankly I agree with @alekseyshl in that we shouldn't unconditionally call `strlen` on `str` and `delimeters`. They are not necessarily null-terminated so by calling `REAL(strlen)` when strict_strings is off, you risk triggering a runtime error.

See my suggestions below for how to change the code to avoid strlens.


================
Comment at: lib/sanitizer_common/sanitizer_common_interceptors.inc:500
+INTERCEPTOR(char*, strtok, char *str, const char *delimiters) {
+  void *ctx;
+  COMMON_INTERCEPTOR_ENTER(ctx, strtok, str, delimiters);
----------------
Bind star to `ctx`?


================
Comment at: lib/sanitizer_common/sanitizer_common_interceptors.inc:502
+  COMMON_INTERCEPTOR_ENTER(ctx, strtok, str, delimiters);
+  if (common_flags()->intercept_strtok) {
+    char* result;
----------------
I'd suggest to handle `!intercept_strtok` case here as this will allow to reduce function nesting. It's minor though.


================
Comment at: lib/sanitizer_common/sanitizer_common_interceptors.inc:508
+      // static buffer for subsequent invocations.
+      result = REAL(strtok)(str, delimiters);
+    } else {
----------------
`Strtok` will return NULL or zero-terminated string so you should be able to do plain `COMMON_INTERCEPTOR_READ_RANGE` on `str` (probably only when strict_strings is off as otherwise `str` has been completely checked on the first call).


================
Comment at: lib/sanitizer_common/sanitizer_common_interceptors.inc:510
+    } else {
+      uptr str_length = REAL(strlen)(str);
+      result = REAL(strtok)(str, delimiters);
----------------
I agree with Alex - we shouldn't call `strlen` here, because if `str` isn't zero-terminated we risk to trigger runtime error. The best we can do is

    COMMON_INTERCEPTOR_READ_STRING(ctx, str, 1);

Note that 1 is suboptimal here - we could have used strspn to find the first occurence of delimeters and use that to better estimate the length.


================
Comment at: lib/sanitizer_common/sanitizer_common_interceptors.inc:515
+        COMMON_INTERCEPTOR_READ_STRING_OF_LEN(ctx, str, str_length,
+                                              result_length + 1);
+      }
----------------
Same to line 508

    COMMON_INTERCEPTOR_READ_RANGE(ctx, result, REAL(strlen)(result) + 1);

?


================
Comment at: lib/sanitizer_common/sanitizer_common_interceptors.inc:516
+                                              result_length + 1);
+      }
+    }
----------------
I think else-branch shouldn't be neglected - you can safely do

    COMMON_INTERCEPTOR_READ_RANGE(ctx, str, REAL(strlen) + 1)

(we know that none of delimeters has been found so `str` must have been scanned to the terminating 0).


================
Comment at: lib/sanitizer_common/sanitizer_common_interceptors.inc:519
+    uptr del_length = REAL(strlen)(delimiters);
+    COMMON_INTERCEPTOR_READ_STRING_OF_LEN(ctx, delimiters, del_length, 1);
+    return result;
----------------
Same issue with `strlen` here. Just do

    COMMON_INTERCEPTOR_READ_STRING(ctx, delimeters, 1);

(1 is again suboptimal).


https://reviews.llvm.org/D30384





More information about the llvm-commits mailing list