[PATCH] D50839: [llvm] Optimize YAML::isNumeric

Thu Aug 16 06:47:43 PDT 2018

ilya-biryukov added a comment.

Just drive-by comments, the maintainers of the code are in a much better position to give feedback, of course.
Nevertheless, my few cents:

- Getting rid of a regex in favor of the explicit loop is definitely a good thing. It's incredibly how much time is spent there when serializing big chunks of YAML (our case in clangd).
- Other changes are definitely less of a win performance-wise and I personally find the new code a bit harder to read than before, so don't feel as confident about those.
- Actual correctness fixes are a good thing, but could go in as a separate patch.

I would suggest splitting the patch into two: (1) rewriting the regex, (2) other changes. (1) is such a clear win, that it would be a pitty to have it delayed by reviewing other parts of the patch :-)

================
Comment at: llvm/include/llvm/Support/YAMLTraits.h:466
+  static const char HexChars[] = "0123456789abcdefABCDEF";
+  static const char OctalChars[] = "01234567";
+  bool ParseHex = S.startswith("0x");
----------------
I would argue the previous version of parsing hex and octal chars was more readable.
And I'm not sure the new heavily optimized version is more performant too.

================
Comment at: llvm/include/llvm/Support/YAMLTraits.h:494
+  bool FoundExponent = false;
+  for (size_t I = 0; I < Tail.size(); ++I) {
+    char Symbol = Tail[I];
----------------
A more structured way to parse a floating numbers would be to:

1. skip digits until we find `.` or exponent (`e`)
2. if we find `.`, skip digits until we find an exponent.
3. if we find an exponent, skip the next symbol if it's `'+'` or `'-'`, then skip digits until the end of the string.

having a code structure that mirrors that would make the code more readable.

https://reviews.llvm.org/D50839