redstar wrote:

I removed all the syntactic sugar inside the loop. Now I get the expected result: with this change, clang is faster than the unmodified version (on CTMark). Please have another look. Thanks.

https://github.com/llvm/llvm-project/pull/88040