[PATCH] D112105: [ADT] Simplifying hex string parsing so it runs faster in debug modes.

Thu Oct 21 18:30:12 PDT 2021

benvanik added inline comments.

================
Comment at: llvm/include/llvm/ADT/StringExtras.h:72-87
+    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+     0,  1,  2,  3,  4,  5,  6,  7,  8,  9, -1, -1, -1, -1, -1, -1,  // '0'..'9'
+    -1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,  // 'A'..'F'
+    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+    -1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,  // 'a'..'f'
----------------
dblaikie wrote:
> benvanik wrote:
> > dblaikie wrote:
> > > Does this need to be written out explicitly? How much does this/why does this change the performance (of what situation) significantly?
> > > 
> > > I guess it's enabling some optimization opportunities in some way by making the constant values more obvious? Perhaps we could write the code in some other way that'd satisfy that?
> > > 
> > > I wonder what it'd be like if 'LUT' were a constexpr std::array returned from a constexpr function that built it?
> > Yes, unfortunately it does. The constexpr version is not guaranteed to expand statically and it in fact does not in MSVC (and may not in other compilers either) in both release and debug. This meant that every single hex nibble lookup was reinitializing the entire lookup table. You can find a deeper discussion about this on discord: https://discord.com/channels/636084430946959380/642426447167881246/900079712930496522
> > TLDR: https://cdn.discordapp.com/attachments/642426447167881246/900079709486997624/unknown.png
> > 
> > Compiler explorer: https://godbolt.org/z/Pb48Mz34K
> > Note that even clang in debug mode includes a memcpy of the lookup table to the stack for every single character lookup so it's also able to benefit here (that could be fixed in the old code by making it `constexpr static`, however that doesn't help MSVC and doesn't solve the assert performance of the loop that calls this).
> > 
> > For background, see https://stackoverflow.com/questions/14248235/when-does-a-constexpr-function-get-evaluated-at-compile-time - basically, constexpr is not guaranteed to be evaluated at compile time, and especially not when used as part of a non-constexpr function (like this). You only get the guarantee when trying to use a constexpr expression within something that must be compile-time evaluated (like another constexpr).
> > 
> > As for std::array, some STL implementations assert on element lookup and would also have suboptimal performance characteristics here: https://github.com/microsoft/STL/blob/d8f03cf399d730780b6ca0e5321a9ff4fc76bb0f/stl/inc/array#L568, https://github.com/gcc-mirror/gcc/blob/2606dfea12dbe9692c9d576155e63bf3acebd1bf/libstdc%2B%2B-v3/include/std/array#L217 (libc++ doesn't, but I consider that a bug not a feature)
> I think I'd be in favor of addressing the clang perf issue with `constexpr static` and letting MSVC be what it is - performance of LLVM is generally tuned for LLVM on LLVM, with clang-cl that's feasible even on an MSVC platform. Seems like if this is in the hot-path for MLIR in a particularly extreme way, it isn't for an LLVM bootstrap, so not tuning for MSVC perf wouldn't be a huge impediment to bootstrapping then using clang-cl?
I agree that addressing the clang perf issue would be good, but as a user of MSVC who is severely affected by this I have to disagree that letting it be is a good idea - even if only as a point of pride in engineering of what we //can// control. It's important to keep in mind that we're talking about an interview question-level for loop and a 256 element lookup table and that it's slow in any compiler of any era from any vendor is not something //anyone// should feel good about including the owner of the code.

Though MLIR and LLVM share the ADT they are not the same thing: LLVM-bias may make sense for the core tip-of-tree LLVM compiler tools themselves (and to be clear I'm a big fan of clang-cl), but MLIR is deployed independently of LLVM and is used with many other toolchains (gcc/msvc/etc). But even if a user is on a Clang/LLVM toolchain there are many production pipelines that may not pick that up for years - and the asserts still remain and will always be slow anywhere NDEBUG is undefined. Forcing users to change their toolchain is not possible or practical in all cases (most python wheel building infra runs on gcc on old centos distros, for example), nor is it appealing telling them that the reason they'd need to change is for non-technical politics. A bit of empathy for the people running this code would be appreciated :)

================
Comment at: llvm/include/llvm/ADT/StringExtras.h:221
+  Output.resize((Input.size() + 1) / 2);
+  char *OutputPtr = const_cast<char *>(Output.data());
   if (Input.size() % 2 == 1) {
----------------
dblaikie wrote:
> I'd avoid `const_cast` and use `&s[0]` or similar - `const_cast` tends to require a second look to validate that it's safe/correct (which, I agree, it is in this case & working around missing std::string::data which is added in C++17 which we aren't using in LLVM yet).
good call - will do!

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112105/new/

https://reviews.llvm.org/D112105