[PATCH] D112105: [ADT] Simplifying hex string parsing so it runs faster in debug modes.

Thu Oct 21 09:04:04 PDT 2021

benvanik added a comment.

In D112105#3076917 <https://reviews.llvm.org/D112105#3076917>, @dblaikie wrote:

> We don't usually optimize for debug performance - is there a particular case where this comes up that's worth prioritizing/changing code for (& is there a good heuristic we can use about which debug performance is going to matter and which isn't?)

This matters when parsing nearly any non-unit-test MLIR file as constant values over a small size are encoded as hex strings - and in ML models the constants get quite large. The numbers included show that this dramatically improves release performance as well. But what made me look into this is that parsing these files in debug mode is a normal activity as usually one is using a crash reproducer to debug some intermediate stage of compilation where lots of data is encoded in the file - waiting up to several minutes for a hex to parse (note the above tests **mini**lm, there are much larger examples) on a 4ghz machine in 2021 before being able to even step into the first line of code is not good :)

Agreed that minor adjustments for debug only performance improvements aren't usually worth it but in this case the code was near unusable in its previous form - orders of magnitude matter. For something as foundational as parsing hex strings - where this particular function is invoked in the critical path potentially billions of times - there's little worth in being needlessly slow. One reason to be slow would be to gain additional protection or verification, however there was no meaningful additional verification being performed: the hidden asserts that are avoided here were already verified higher up by way of the input string length checks and asserts outside the hot inner loop. Sometimes a conceptual for loop indexing into an array should really just be a for loop indexing into an array :)

================
Comment at: llvm/include/llvm/ADT/StringExtras.h:71
+  /* clang-format off */
+  static const int16_t LUT[256] = {
+    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
----------------
dblaikie wrote:
> Can we drop the array dimension here and rely on the array contents to be the right size?
For lookup tables where the size of the array is meaningful it's useful to have the size specified - here we must have exactly the range of a byte of elements and being able to communicate that to readers and the compiler (to get compile-time errors if there is an element missing) is important. Having the size specified also made it easy for me to spot the bug here (that before it did not have enough elements). Omitting array dimensions and allowing it to be inferred from the contents is useful when working with dynamic lists that may be modified frequently - if the size of the list must be a precise value it's usually harmful to elide it.

================
Comment at: llvm/include/llvm/ADT/StringExtras.h:72-87
+    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+     0,  1,  2,  3,  4,  5,  6,  7,  8,  9, -1, -1, -1, -1, -1, -1,  // '0'..'9'
+    -1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,  // 'A'..'F'
+    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+    -1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,  // 'a'..'f'
----------------
dblaikie wrote:
> Does this need to be written out explicitly? How much does this/why does this change the performance (of what situation) significantly?
> 
> I guess it's enabling some optimization opportunities in some way by making the constant values more obvious? Perhaps we could write the code in some other way that'd satisfy that?
> 
> I wonder what it'd be like if 'LUT' were a constexpr std::array returned from a constexpr function that built it?
Yes, unfortunately it does. The constexpr version is not guaranteed to expand statically and it in fact does not in MSVC (and may not in other compilers either) in both release and debug. This meant that every single hex nibble lookup was reinitializing the entire lookup table. You can find a deeper discussion about this on discord: https://discord.com/channels/636084430946959380/642426447167881246/900079712930496522
TLDR: https://cdn.discordapp.com/attachments/642426447167881246/900079709486997624/unknown.png

Compiler explorer: https://godbolt.org/z/Pb48Mz34K
Note that even clang in debug mode includes a memcpy of the lookup table to the stack for every single character lookup so it's also able to benefit here (that could be fixed in the old code by making it `constexpr static`, however that doesn't help MSVC and doesn't solve the assert performance of the loop that calls this).

For background, see https://stackoverflow.com/questions/14248235/when-does-a-constexpr-function-get-evaluated-at-compile-time - basically, constexpr is not guaranteed to be evaluated at compile time, and especially not when used as part of a non-constexpr function (like this). You only get the guarantee when trying to use a constexpr expression within something that must be compile-time evaluated (like another constexpr).

As for std::array, some STL implementations assert on element lookup and would also have suboptimal performance characteristics here: https://github.com/microsoft/STL/blob/d8f03cf399d730780b6ca0e5321a9ff4fc76bb0f/stl/inc/array#L568, https://github.com/gcc-mirror/gcc/blob/2606dfea12dbe9692c9d576155e63bf3acebd1bf/libstdc%2B%2B-v3/include/std/array#L217 (libc++ doesn't, but I consider that a bug not a feature)

================
Comment at: llvm/unittests/ADT/StringExtrasTest.cpp:93

-  std::string InvalidStr = "A5ZX";
+  std::string InvalidStr = "A50\xFF";
   std::string IgnoredOutput;
----------------
dblaikie wrote:
> If this is purely a performance improvement - what's motivating this test change?
I don't like undefined behavior and wanted to verify my new code handled this case correctly. Prior to this the behavior was undefined if you had any ascii character with value 255 in the input file, which I think can be agreed is not good regardless of whether most normal inputs include it or not.

(I'm curious why the question: is updating tests to improve coverage for fixed bugs something that needs extrinsic motivation? I did call out in the description that I fixed this bug as part of this refactoring)

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112105/new/

https://reviews.llvm.org/D112105