[clang] [Clang][Lexer][Performance] Optimize Lexer whitespace skipping logic (PR #180819)
Oliver Hunt via cfe-commits
cfe-commits at lists.llvm.org
Tue Feb 17 12:41:39 PST 2026
ojhunt wrote:
> I think we are splitting hairs here.
>
> It does make sense to optimize for SPACE as it is the most frequent space separators. tabs are not unusual but much less frequent on average. Anything else is not worth even considering. if you use vertical tabulations, you deserve a 2% performance penalty. I liked that the original approach was simple enough that it was justify for a small performance improvement, even if that it is certainly architecture dependent. I don't think it would ever be worse than status quo.
Yeah, I've come to the conclusion that the trade offs warrant this, though I do believe that the table lookup should be replaced with a single condition.
I would like to know if there's any impact from moving the branch into `isHorizontalWhitespace` or whether it's only beneficial in those locations. I'd also like to know if the LLVM_LIKELY is strictly necessary as I don't really like seeing them without a solid justification.
Regardless of what happens there, it might still be worth investigating having isHorizontalWhitespace adopt the condition path rather than falling back on a table lookup, but I would be worried about clang coalescing the branches, e.g.
```cpp
if (Ch==' ') return true;
return C=='\t' || ...;
```
becoming
```cpp
return Ch==' ' || C=='\t' || ...;
```
which a `LIKELY(Ch==' ')` might resolve.
That said, it would be funny if we need to maintain the slow table lookup to retain speed :D
https://github.com/llvm/llvm-project/pull/180819
More information about the cfe-commits
mailing list