[PATCH] D112105: [ADT] Simplifying hex string parsing so it runs faster in debug modes.

Thu Oct 21 11:29:33 PDT 2021

dblaikie added inline comments.

================
Comment at: llvm/include/llvm/ADT/StringExtras.h:72-87
+    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+     0,  1,  2,  3,  4,  5,  6,  7,  8,  9, -1, -1, -1, -1, -1, -1,  // '0'..'9'
+    -1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,  // 'A'..'F'
+    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+    -1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,  // 'a'..'f'
----------------
benvanik wrote:
> dblaikie wrote:
> > Does this need to be written out explicitly? How much does this/why does this change the performance (of what situation) significantly?
> > 
> > I guess it's enabling some optimization opportunities in some way by making the constant values more obvious? Perhaps we could write the code in some other way that'd satisfy that?
> > 
> > I wonder what it'd be like if 'LUT' were a constexpr std::array returned from a constexpr function that built it?
> Yes, unfortunately it does. The constexpr version is not guaranteed to expand statically and it in fact does not in MSVC (and may not in other compilers either) in both release and debug. This meant that every single hex nibble lookup was reinitializing the entire lookup table. You can find a deeper discussion about this on discord: https://discord.com/channels/636084430946959380/642426447167881246/900079712930496522
> TLDR: https://cdn.discordapp.com/attachments/642426447167881246/900079709486997624/unknown.png
> 
> Compiler explorer: https://godbolt.org/z/Pb48Mz34K
> Note that even clang in debug mode includes a memcpy of the lookup table to the stack for every single character lookup so it's also able to benefit here (that could be fixed in the old code by making it `constexpr static`, however that doesn't help MSVC and doesn't solve the assert performance of the loop that calls this).
> 
> For background, see https://stackoverflow.com/questions/14248235/when-does-a-constexpr-function-get-evaluated-at-compile-time - basically, constexpr is not guaranteed to be evaluated at compile time, and especially not when used as part of a non-constexpr function (like this). You only get the guarantee when trying to use a constexpr expression within something that must be compile-time evaluated (like another constexpr).
> 
> As for std::array, some STL implementations assert on element lookup and would also have suboptimal performance characteristics here: https://github.com/microsoft/STL/blob/d8f03cf399d730780b6ca0e5321a9ff4fc76bb0f/stl/inc/array#L568, https://github.com/gcc-mirror/gcc/blob/2606dfea12dbe9692c9d576155e63bf3acebd1bf/libstdc%2B%2B-v3/include/std/array#L217 (libc++ doesn't, but I consider that a bug not a feature)
I think I'd favor addressing the clang perf issue with `constexpr static` and letting MSVC be what it is. LLVM's performance is generally tuned for LLVM built with LLVM, and with clang-cl that's feasible even on an MSVC platform. Even if this is on the hot path for MLIR in a particularly extreme way, it isn't for an LLVM bootstrap, so not tuning for MSVC perf wouldn't be a huge impediment to bootstrapping with MSVC and then using clang-cl?
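
For illustration, a minimal sketch of the `constexpr static` shape discussed above, assuming C++14. The names `HexTableSketch`, `buildHexTableSketch`, and `hexDigitValueSketch` are hypothetical and are not the code in this patch. Initializing a `constexpr` variable requires a constant expression, so the builder has to be evaluated at compile time, and `static` gives the table static storage duration rather than making it a per-call local. Per the thread, this shape is not expected to help MSVC.

```cpp
#include <cstdint>

// Hypothetical wrapper type: a struct is used so the table can be returned
// from a constexpr function without relying on std::array.
struct HexTableSketch {
  std::int8_t Values[256];
};

// Builds the table; -1 marks a character that is not a hex digit.
constexpr HexTableSketch buildHexTableSketch() {
  HexTableSketch T{};
  for (int I = 0; I < 256; ++I)
    T.Values[I] = -1;
  for (int I = 0; I < 10; ++I)
    T.Values['0' + I] = static_cast<std::int8_t>(I);
  for (int I = 0; I < 6; ++I) {
    T.Values['A' + I] = static_cast<std::int8_t>(10 + I);
    T.Values['a' + I] = static_cast<std::int8_t>(10 + I);
  }
  return T;
}

// Returns 0..15 for a hex digit and -1 for anything else.
inline int hexDigitValueSketch(char C) {
  // The constexpr variable forces compile-time evaluation of the builder;
  // `static` means the table is not a per-call automatic variable.
  static constexpr HexTableSketch Table = buildHexTableSketch();
  return Table.Values[static_cast<unsigned char>(C)];
}
```

Whether a given compiler then emits a plain indexed load in debug builds would still need to be checked, e.g. on Compiler Explorer.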

================
Comment at: llvm/include/llvm/ADT/StringExtras.h:218-243
+  // If the input string is not properly aligned on 2 nibbles we pad out the
+  // front with a 0 prefix; e.g. `ABC` -> `0ABC`.
+  Output.resize((Input.size() + 1) / 2);
+  char *OutputPtr = const_cast<char *>(Output.data());
   if (Input.size() % 2 == 1) {
     uint8_t Hex = 0;
     if (!tryGetHexFromNibbles('0', Input.front(), Hex))
----------------
Might be worth a comment explaining why this uses lower-level operations/raw string manipulation, so this optimization doesn't get deoptimized/cleaned up by someone else in the future.

================
Comment at: llvm/include/llvm/ADT/StringExtras.h:221
+  Output.resize((Input.size() + 1) / 2);
+  char *OutputPtr = const_cast<char *>(Output.data());
   if (Input.size() % 2 == 1) {
----------------
I'd avoid `const_cast` and use `&s[0]` or similar - `const_cast` tends to require a second look to validate that it's safe/correct (which, I agree, it is in this case: it works around the missing non-const `std::string::data()`, which was only added in C++17, and we aren't using C++17 in LLVM yet).
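
To make the suggestion concrete, here is a rough sketch of the same decode loop written with `&Output[0]` instead of `const_cast`. It is not the patch's code: `nibbleValueSketch` and `tryFromHexSketch` are hypothetical names, and the nibble lookup uses plain comparisons to keep the example short.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns 0..15 for a hex digit, -1 otherwise.
inline int nibbleValueSketch(char C) {
  if (C >= '0' && C <= '9') return C - '0';
  if (C >= 'a' && C <= 'f') return C - 'a' + 10;
  if (C >= 'A' && C <= 'F') return C - 'A' + 10;
  return -1;
}

// Decodes a hex string into raw bytes. Odd-length input is treated as if it
// had a leading '0' (e.g. "ABC" is decoded like "0ABC"). Returns false on the
// first invalid character; Output may hold partial data in that case.
inline bool tryFromHexSketch(const std::string &Input, std::string &Output) {
  Output.resize((Input.size() + 1) / 2);
  if (Input.empty())
    return true;

  // Raw pointer writes are deliberate: they avoid per-character append
  // overhead in unoptimized builds. &Output[0] yields a mutable pointer
  // without const_cast and without the C++17 non-const data().
  char *Out = &Output[0];
  std::size_t I = 0;

  if (Input.size() % 2 == 1) {
    int Lo = nibbleValueSketch(Input[0]);
    if (Lo < 0)
      return false;
    *Out++ = static_cast<char>(Lo);
    I = 1;
  }

  for (; I < Input.size(); I += 2) {
    int Hi = nibbleValueSketch(Input[I]);
    int Lo = nibbleValueSketch(Input[I + 1]);
    if (Hi < 0 || Lo < 0)
      return false;
    *Out++ = static_cast<char>((Hi << 4) | Lo);
  }
  return true;
}
```

The comment above the pointer is the kind of note that would flag the raw writes as intentional for future readers.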

================
Comment at: llvm/unittests/ADT/StringExtrasTest.cpp:93

-  std::string InvalidStr = "A5ZX";
+  std::string InvalidStr = "A50\xFF";
   std::string IgnoredOutput;
----------------
benvanik wrote:
> benvanik wrote:
> > dblaikie wrote:
> > > If this is purely a performance improvement - what's motivating this test change?
> > I don't like undefined behavior and wanted to verify my new code handled this case correctly. Prior to this the behavior was undefined if you had any character with value 255 (0xFF) in the input file, which I think can be agreed is not good regardless of whether most normal inputs include it or not.
> > 
> > (I'm curious why the question: is updating tests to improve coverage for fixed bugs something that needs extrinsic motivation? I did call out in the description that I fixed this bug as part of this refactoring)
> (to clarify: this does not reduce test coverage as this was just testing that invalid characters were correctly identified as invalid - I just changed it so the invalid character it was testing was 0xFF)
Sorry, I missed the mention of the bug fix in the commit - agreed that bug fixes should have tests, and agreed that it is a bug/fix.
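
For readers following along, a small sketch of why a 0xFF input byte is a useful edge case for a table lookup. This is an assumption about the general pitfall, not a description of the old code, whose exact failure mode isn't spelled out in this thread. The names `TableSizeSketch` and `tableIndexSketch` are hypothetical.

```cpp
#include <climits>
#include <cstddef>

// A table indexed by any possible byte needs one slot per unsigned char
// value: UCHAR_MAX + 1 entries (256 with 8-bit bytes), not UCHAR_MAX.
constexpr std::size_t TableSizeSketch =
    static_cast<std::size_t>(UCHAR_MAX) + 1;

// Hypothetical helper: converts a possibly-signed char to a table index.
// Without the cast, '\xFF' can be negative where char is signed.
constexpr std::size_t tableIndexSketch(char C) {
  return static_cast<unsigned char>(C);
}

// The largest byte value must still land inside the table.
static_assert(tableIndexSketch('\xFF') == UCHAR_MAX,
              "0xFF maps to the last index");
static_assert(tableIndexSketch('\xFF') < TableSizeSketch,
              "the last index must be in bounds");
```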

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112105/new/

https://reviews.llvm.org/D112105