[libc-commits] [PATCH] D140178: [libc] add fuzz target for strtointeger functions

Mon Dec 19 15:21:54 PST 2022

michaelrj added inline comments.

================
Comment at: libc/fuzzing/stdlib/strtointeger_differential_fuzz.cpp:48
+    container[i] = data[i];
+    cleaner[i] = valid_chars[data[i] % sizeof(valid_chars)];
+  }
----------------
sivachandra wrote:
> michaelrj wrote:
> > sivachandra wrote:
> > > Why should we make the input clean? It defeats the purpose of fuzzing in a way. Since this is a differential fuzz setup, I am not really worried about garbage. Also, valid ASCII characters make half of the `uint8_t` range anyway. So, we don't have to worry about most of the `data` inputs being just invalid numbers. Also, the above list of valid chars does not have the `+` and `-` signs.
> > The reason I want to clean the input is because otherwise almost all of the inputs won't actually exercise the function at all. If the first character is a question mark (for example), then the function won't actually parse anything and it will return 0. Testing that isn't bad, but given that more than half of the ascii range falls into that category there will be a lot of cycles spent on those obvious cases instead of actually testing the branches.
> Your argument is fair. My point is that it might not help in exercising all code paths. Take decimal numbers for example. The set of valid characters is only 10 out of the 62 there are in `VALID_CHARS` now. But, there are more valid numbers as you progress to higher bases. For base 36, I think there are no invalid inputs at all? You can choose to implement two modes - one for human testing and another for continuous testing. For human testing, clean the input like you do in this patch currently. For continuous testing, don't clean the input. You will have to add two targets with a distinguishing pre-processor macro.
> 
> You should add `-` and `+` to the set of valid chars. Or, you can prepend a `-` and `+` to each clean input in the human mode and test again.
this sounds like a good compromise to me.

================
Comment at: libc/src/stdlib/atoi.cpp:16
 LLVM_LIBC_FUNCTION(int, atoi, (const char *str)) {
-  auto result = internal::strtointeger<int>(str, 10);
+  auto result = internal::strtointeger<long>(str, 10);
   if (result.has_error())
----------------
sivachandra wrote:
> I think this change is trying to match the standard prescribed behavior. Can you please do this separately and add unit tests with inputs which after affected by this behavior if any?
I've uploaded a separate patch here: https://reviews.llvm.org/D140350

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140178/new/

https://reviews.llvm.org/D140178