[clang] [Clang][Lexer][Performance] Optimize Lexer whitespace skipping logic (PR #180819)

Oliver Hunt via cfe-commits cfe-commits at lists.llvm.org
Sun Feb 15 15:23:30 PST 2026


ojhunt wrote:

@Thibault-Monnier here's what I did:

I generated 500k of random data consisting of alternating runs of whitespace and a-z characters. The whitespace runs are x% tab characters and (100-x)% space characters, biased toward a repeated whitespace character at the start of each run. The length of each whitespace run is biased to be longer at the beginning of every (randomly sized) N-character "line".
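For illustration, here is a rough C++ sketch of this kind of generator. It is not the code used for these measurements. The function name `generateWhitespaceInput`, the line lengths (40-120), the word lengths and the gap sizes are all assumptions. It only models the tab percentage and the longer leading runs, not the repeated-character bias.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Hypothetical generator (not the one used in the thread): alternating runs of
// horizontal whitespace and a-z letters, split into random-length "lines".
// tabPercent is the "x%" of whitespace characters that are tabs.
std::string generateWhitespaceInput(std::size_t totalBytes, int tabPercent,
                                    unsigned seed) {
  assert(tabPercent >= 0 && tabPercent <= 100);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> percent(0, 99);
  std::uniform_int_distribution<int> letter('a', 'z');
  std::uniform_int_distribution<std::size_t> lineLen(40, 120);  // assumed N
  std::uniform_int_distribution<std::size_t> wordLen(1, 12);    // assumed
  std::uniform_int_distribution<std::size_t> innerGap(1, 2);    // mid-line runs
  std::uniform_int_distribution<std::size_t> leadingGap(0, 16); // indentation

  std::string out;
  out.reserve(totalBytes);
  while (out.size() < totalBytes) {
    std::size_t lineEnd = out.size() + lineLen(rng);
    bool atLineStart = true;
    while (out.size() < lineEnd && out.size() < totalBytes) {
      // Whitespace run: longer at the start of a line.
      std::size_t gap = atLineStart ? leadingGap(rng) : innerGap(rng);
      for (std::size_t i = 0; i < gap && out.size() < totalBytes; ++i)
        out.push_back(percent(rng) < tabPercent ? '\t' : ' ');
      atLineStart = false;
      // Letter run.
      std::size_t word = wordLen(rng);
      for (std::size_t i = 0; i < word && out.size() < totalBytes; ++i)
        out.push_back(static_cast<char>(letter(rng)));
    }
    if (out.size() < totalBytes)
      out.push_back('\n');
  }
  return out;
}
```

Because the random draws do not depend on `tabPercent`, calling it with the same seed and different percentages should give inputs that differ only in which whitespace characters are tabs.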

This entire change is predicated on successful branch prediction, which in turn depends on how prevalent spaces are relative to tabs.

So even though my test data attempts a reasonable bias in whitespace run length, it does not specifically model "more spaces at the beginning of a line".

But that does not actually matter: we are in the same loop regardless of where we are in a line, so a run of tabs that only occurs at the beginning of a line still affects the same branch. So my numbers do measure the impact of x% tabs (I didn't consider including \f or \v, sorry :D).
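To make the single-branch argument concrete, here is a hypothetical skip loop with a fast path for spaces. `skipHorizontalWhitespace` is an illustrative name and this is not Clang's actual Lexer code. Every non-space whitespace character falls through to the second check whether it sits in indentation or mid-line, so only the overall tab (and `\f`/`\v`) ratio changes how often that branch is taken.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not Clang's Lexer code: skip horizontal whitespace
// (' ', '\t', '\f', '\v') with a fast path for ' '. If slowPathHits is
// non-null, it counts whitespace characters that missed the ' ' check.
const char *skipHorizontalWhitespace(const char *p, const char *end,
                                     std::size_t *slowPathHits = nullptr) {
  assert(p <= end);
  while (p != end) {
    if (*p == ' ') { // Well predicted while spaces dominate.
      ++p;
      continue;
    }
    if (*p == '\t' || *p == '\f' || *p == '\v') {
      // Reached by tabs wherever they occur in the line.
      if (slowPathHits)
        ++*slowPathHits;
      ++p;
      continue;
    }
    break; // First non-horizontal-whitespace character.
  }
  return p;
}
```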

I think I've since broken something in my terrible code: it now consistently reports the conditional behavior as 2%+ faster. In refactoring the data generation to be remotely readable, I've probably made it always use enough tabs to throw off the behavior, irrespective of the stated percentage.
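A cheap guard against this kind of generator bug is to measure the tab ratio of the generated data and compare it with the requested percentage. The helper names and the tolerance parameter below are hypothetical; this is a sketch, not part of any existing benchmark.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Hypothetical check: fraction of space/tab characters that are tabs.
// Returns 0.0 if the data contains neither.
double measuredTabFraction(const std::string &data) {
  std::size_t tabs = 0, spaces = 0;
  for (char c : data) {
    if (c == '\t')
      ++tabs;
    else if (c == ' ')
      ++spaces;
  }
  std::size_t total = tabs + spaces;
  return total == 0 ? 0.0 : static_cast<double>(tabs) / static_cast<double>(total);
}

// True if the measured tab percentage is within tolerancePercent of the
// percentage the generator was asked for.
bool tabFractionMatches(const std::string &data, double expectedPercent,
                        double tolerancePercent) {
  assert(tolerancePercent >= 0.0);
  double measuredPercent = 100.0 * measuredTabFraction(data);
  return std::fabs(measuredPercent - expectedPercent) <= tolerancePercent;
}
```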

However, I think your testing should include projects that indent with tabs. Even if we decide it does not matter, we should know whether the impact differs.

Alternatively, you could take your existing test case, use sed to replace leading spaces with tabs, and measure the impact of your change on the result. For plausibility, I'd convert every 4 leading spaces to a single tab character (rounding up, I guess?).
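A rough C++ equivalent of that conversion, assuming 4 spaces per tab and rounding up so that 1-4 leading spaces become one tab and 5-8 become two. `leadingSpacesToTabs` is a hypothetical helper, not an existing LLVM utility. Only the run of spaces at the start of each line is touched; spaces later in a line are left alone.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: replace each line's leading spaces with tabs,
// using ceil(spaces / spacesPerTab) tabs. Other characters are copied as-is.
std::string leadingSpacesToTabs(const std::string &text,
                                std::size_t spacesPerTab = 4) {
  assert(spacesPerTab > 0);
  std::string out;
  out.reserve(text.size());
  std::size_t pos = 0;
  while (pos < text.size()) {
    std::size_t nl = text.find('\n', pos);
    // Line end is exclusive and includes the '\n' if there is one.
    std::size_t lineEnd = (nl == std::string::npos) ? text.size() : nl + 1;
    std::size_t spaces = 0;
    while (pos + spaces < lineEnd && text[pos + spaces] == ' ')
      ++spaces;
    out.append((spaces + spacesPerTab - 1) / spacesPerTab, '\t'); // round up
    out.append(text, pos + spaces, lineEnd - (pos + spaces));
    pos = lineEnd;
  }
  return out;
}
```

Running the existing benchmark input through something like this before lexing would give a tab-indented variant of the same code.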

https://github.com/llvm/llvm-project/pull/180819

