[clang] Provide an SSE4.2 implementation of identifier token lexer (PR #68962)

Timm Baeder via cfe-commits cfe-commits at lists.llvm.org
Fri Oct 13 02:09:00 PDT 2023


================
@@ -1847,19 +1851,46 @@ bool Lexer::LexUnicodeIdentifierStart(Token &Result, uint32_t C,
   return true;
 }
 
+static const char *fastParseASCIIIdentifier(const char *CurPtr, const char* BufferEnd) {
+#ifdef __SSE4_2__
+  static constexpr char AsciiIdentifierRange[16] = {
+      '_', '_', 'A', 'Z', 'a', 'z', '0', '9',
+  };
+  constexpr ssize_t BytesPerRegister = 16;
+
+  while (LLVM_LIKELY(BufferEnd - CurPtr >= BytesPerRegister)) {
+    __m128i AsciiIdentifierRangeV = _mm_loadu_si128((const __m128i *)AsciiIdentifierRange);
----------------
tbaederr wrote:

Can't you pull this load out of the loop? `AsciiIdentifierRangeV` doesn't change between iterations.

And can't you do an aligned load here (if you add the right `alignas()` to the declaration)?
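A minimal sketch of what the function might look like with both suggestions applied. It assumes the range table is declared `alignas(16)` so that `_mm_load_si128` is legal on it, and that the load moves above the loop. Only the constant table gets the aligned load; the input chunk at `CurPtr` has no alignment guarantee, so it keeps `_mm_loadu_si128`. The `_mm_cmpistri` flags shown are one plausible choice for "index of the first byte outside the ranges" and are not taken from the patch. `isAsciiIdentContinueSketch` is a hypothetical stand-in for Clang's own classifier. Bounding the scalar tail by `BufferEnd` is also an assumption: the real lexer can rely on a null-terminated buffer instead.

```cpp
#include <cassert>
#include <cstddef>
#ifdef __SSE4_2__
#include <nmmintrin.h> // _mm_cmpistri, _SIDD_* flags, SSE2 loads
#endif

// Hypothetical scalar helper (not Clang's API): accepts [A-Za-z0-9_].
static bool isAsciiIdentContinueSketch(unsigned char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') ||
         (C >= '0' && C <= '9') || C == '_';
}

// Returns a pointer to the first byte in [CurPtr, BufferEnd) that cannot
// continue an ASCII identifier, or BufferEnd if every byte can.
static const char *fastParseASCIIIdentifier(const char *CurPtr,
                                            const char *BufferEnd) {
  assert(CurPtr <= BufferEnd && "CurPtr is past the end of the buffer");
#ifdef __SSE4_2__
  // alignas(16) makes the aligned _mm_load_si128 below legal.
  // Four (lo, hi) byte ranges; the trailing zero bytes end the operand.
  alignas(16) static constexpr char AsciiIdentifierRange[16] = {
      '_', '_', 'A', 'Z', 'a', 'z', '0', '9',
  };
  constexpr std::ptrdiff_t BytesPerRegister = 16;

  // Loop-invariant: load the range table once, outside the loop.
  const __m128i AsciiIdentifierRangeV = _mm_load_si128(
      reinterpret_cast<const __m128i *>(AsciiIdentifierRange));

  while (BufferEnd - CurPtr >= BytesPerRegister) {
    // Input is not guaranteed to be 16-byte aligned, so keep loadu here.
    const __m128i Chunk =
        _mm_loadu_si128(reinterpret_cast<const __m128i *>(CurPtr));
    // Index of the first byte not in any range (16 if all bytes match).
    // An embedded NUL also stops the scan, since it matches no range.
    const int Consumed = _mm_cmpistri(
        AsciiIdentifierRangeV, Chunk,
        _SIDD_UBYTE_OPS | _SIDD_CMP_RANGES | _SIDD_NEGATIVE_POLARITY |
            _SIDD_LEAST_SIGNIFICANT);
    CurPtr += Consumed;
    if (Consumed != BytesPerRegister)
      return CurPtr;
  }
#endif
  // Scalar tail (or the whole scan when SSE4.2 is not enabled).
  while (CurPtr != BufferEnd &&
         isAsciiIdentContinueSketch(static_cast<unsigned char>(*CurPtr)))
    ++CurPtr;
  return CurPtr;
}
```

Hoisting is mostly about readability, because a compiler will often hoist a loop-invariant load on its own. The `alignas` change, by contrast, is what makes the aligned load instruction correct to emit.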

https://github.com/llvm/llvm-project/pull/68962

