[clang] [Clang][Lexer][Performance] Optimize Lexer whitespace skipping logic (PR #180819)
Thibault Monnier via cfe-commits
cfe-commits at lists.llvm.org
Tue Feb 10 11:50:02 PST 2026
https://github.com/Thibault-Monnier created https://github.com/llvm/llvm-project/pull/180819
... by extracting the check for the space character and marking it as `LLVM_LIKELY`. This improves performance because the space is by far the most common horizontal whitespace character, so in most cases this change replaces a lookup-table check with a simple comparison, reducing latency and cache pressure.
This does not reduce the instruction count, since a table lookup and a comparison are each a single instruction. However, it _does_ reduce cycles consistently, by around `0.2`-`0.3`%: [benchmark](https://llvm-compile-time-tracker.com/compare.php?from=3192fe2c7b08912cc72c86471a593165b615dc28&to=faa899a6ce518c1176f2bf59f199eb42e59d840e&stat=cycles). I tested this locally and can confirm that this is not noise (at least not entirely; it is odd that this affects `O3` more than `O0`...), as I measured almost `2`% faster PP speed in my tests.
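To illustrate the idea outside of Clang, here is a minimal standalone sketch of the fast path described above. It is not the Lexer code: `SKETCH_LIKELY`, `HorizontalWsTable`, `isHorizontalWhitespaceSketch` and `skipHorizontalWhitespace` are hypothetical names made up for this example, and the table is a simplified assumption (it treats space, tab, form feed and vertical tab as horizontal whitespace). The sketch also assumes the buffer is NUL-terminated so the loop always stops.

```cpp
// Hypothetical stand-in for LLVM_LIKELY, so the sketch builds without LLVM.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define SKETCH_LIKELY(x) (x)
#endif

// Simplified 256-entry classification table (assumption: the real table
// used by Clang encodes more character classes than this one).
struct HorizontalWsTable {
  bool Data[256];
  HorizontalWsTable() {
    for (int I = 0; I < 256; ++I)
      Data[I] = false;
    Data[static_cast<unsigned char>(' ')] = true;
    Data[static_cast<unsigned char>('\t')] = true;
    Data[static_cast<unsigned char>('\f')] = true;
    Data[static_cast<unsigned char>('\v')] = true;
  }
};

// Table-based check: one memory load per character.
inline bool isHorizontalWhitespaceSketch(unsigned char C) {
  static const HorizontalWsTable Table;
  return Table.Data[C];
}

// Skips horizontal whitespace starting at Ptr and returns a pointer to the
// first character that is not horizontal whitespace. The plain comparison
// against ' ' runs first, so the table is only consulted for other characters.
inline const char *skipHorizontalWhitespace(const char *Ptr) {
  while (SKETCH_LIKELY(*Ptr == ' ') ||
         isHorizontalWhitespaceSketch(static_cast<unsigned char>(*Ptr)))
    ++Ptr;
  return Ptr;
}
```

For example, calling `skipHorizontalWhitespace` on the string `"  \t x"` returns a pointer to the `x`. Because `||` short-circuits, runs of plain spaces never touch the table, while tabs and the other characters still go through it, so the set of skipped characters is unchanged.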
From faa899a6ce518c1176f2bf59f199eb42e59d840e Mon Sep 17 00:00:00 2001
From: Thibault-Monnier <thibaultmonni at gmail.com>
Date: Tue, 10 Feb 2026 19:41:47 +0100
Subject: [PATCH] Try prioritizing skipping space
---
clang/lib/Lex/Lexer.cpp | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index 1498657047bd6..483cca32e08a2 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -2533,8 +2533,8 @@ bool Lexer::SkipWhitespace(Token &Result, const char *CurPtr,

   // Skip consecutive spaces efficiently.
   while (true) {
-    // Skip horizontal whitespace very aggressively.
-    while (isHorizontalWhitespace(Char))
+    // Skip horizontal whitespace, especially space, very aggressively.
+    while (LLVM_LIKELY(Char == ' ') || isHorizontalWhitespace(Char))
       Char = *++CurPtr;

     // Otherwise if we have something other than whitespace, we're done.
@@ -3756,10 +3756,10 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
   const char *CurPtr = BufferPtr;

   // Small amounts of horizontal whitespace is very common between tokens.
-  if (isHorizontalWhitespace(*CurPtr)) {
+  if (LLVM_LIKELY(*CurPtr == ' ') || isHorizontalWhitespace(*CurPtr)) {
     do {
       ++CurPtr;
-    } while (isHorizontalWhitespace(*CurPtr));
+    } while (LLVM_LIKELY(*CurPtr == ' ') || isHorizontalWhitespace(*CurPtr));

     // If we are keeping whitespace and other tokens, just return what we just
     // skipped. The next lexer invocation will return the token after the