[libc] [llvm] Add vector-based strlen implementation for x86_64 and aarch64 (PR #152389)

Thu Aug 7 11:59:42 PDT 2025

Sterling-Augustine wrote:

> You're doing the benchmarks single source I guess?
> 
The glibc benchmark suite is pretty nice. You just #include the .c file you want to use and tag as many function names as you want. That all breaks down, though, if the code isn't C and can't be built with gcc.

I hacked your code that uses builtin vector types into something more comparable to my original implementation, then hacked it a bit more so it builds with clang and can be inserted into the benchmark at link time. Here is a graph of the results on my local machine. With the clang-based vectors, sse2 is somewhat faster than the intrinsic version, avx2 is pretty much the same, and avx512 is somewhat slower.
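To make the "builtin vector types" approach concrete, here is a minimal sketch of a strlen written with the GCC-style `vector_size` extension, which both gcc and clang accept, instead of target intrinsics or the clang-specific `ext_vector_type`. It is not the code from this PR; `vector_strlen` and `v16u8` are hypothetical names. Like real implementations, it steps byte by byte to 16-byte alignment and then does aligned 16-byte loads. Those loads may read past the terminator within the same block: that cannot fault, because an aligned block never straddles a page, but it is outside strict ISO C and will trip sanitizers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16 unsigned bytes via the GCC/Clang vector_size extension (no intrinsics). */
typedef unsigned char v16u8 __attribute__((vector_size(16)));

/* Hypothetical name; an illustration, not the PR's implementation. */
size_t vector_strlen(const char *s) {
  const unsigned char *base = (const unsigned char *)s;
  const unsigned char *p = base;

  /* Scalar head: advance byte by byte until p is 16-byte aligned. */
  while ((uintptr_t)p % 16 != 0) {
    if (*p == 0)
      return (size_t)(p - base);
    ++p;
  }

  const v16u8 zero = {0};
  for (;;) {
    /* Aligned 16-byte load. It stays inside one aligned block (so it cannot
       cross a page) but may read bytes after the terminator. */
    v16u8 chunk;
    memcpy(&chunk, p, sizeof chunk);

    /* Lane-wise compare: a lane is all-ones where the byte is NUL. */
    v16u8 eq = (v16u8)(chunk == zero);

    /* "Any lane set?" test by viewing the mask as two 64-bit words. */
    uint64_t halves[2];
    memcpy(halves, &eq, sizeof halves);
    if ((halves[0] | halves[1]) != 0) {
      /* Find the first NUL lane; this scan is endian-independent. */
      for (int i = 0; i < 16; ++i)
        if (eq[i])
          return (size_t)(p + i - base);
    }
    p += 16;
  }
}
```

The lane-wise compare is where the vector extension may map to a single SIMD compare instruction; the mask reduction and first-lane search are plain C here, so how well they optimize depends on the compiler and target.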

char_read, the blue line, is the default today. Ouch. wide_read is the fastest version we ship today. Better, but still.
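For context on the two baselines, the sketch below contrasts a byte-at-a-time loop with a word-at-a-time loop that uses the classic "has a zero byte" bit trick. It assumes, rather than reproduces, the general shape of the char_read and wide_read variants; `char_read_strlen`, `wide_read_strlen`, and `has_zero_byte` are hypothetical names. As with any wide strlen, the aligned 8-byte loads may read past the terminator within the same word.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time: one load and one compare per character. */
size_t char_read_strlen(const char *s) {
  const char *p = s;
  while (*p)
    ++p;
  return (size_t)(p - s);
}

/* Nonzero iff some byte of w is zero: subtracting 1 from each byte sets its
   high bit only if it borrows out of 0x00 (or the byte was already >= 0x81),
   and masking with ~w discards bytes whose high bit was already set. */
static int has_zero_byte(uint64_t w) {
  const uint64_t ones = 0x0101010101010101ULL;
  const uint64_t highs = 0x8080808080808080ULL;
  return ((w - ones) & ~w & highs) != 0;
}

/* Word-at-a-time: test 8 bytes per iteration, then finish byte by byte. */
size_t wide_read_strlen(const char *s) {
  const char *p = s;

  /* Scalar head until p is 8-byte aligned. */
  while ((uintptr_t)p % sizeof(uint64_t) != 0) {
    if (*p == 0)
      return (size_t)(p - s);
    ++p;
  }

  /* Aligned 8-byte loads until a word contains a NUL. */
  for (;;) {
    uint64_t w;
    memcpy(&w, p, sizeof w);
    if (has_zero_byte(w))
      break;
    p += sizeof w;
  }

  /* The NUL is somewhere in the current word; locate it. */
  while (*p)
    ++p;
  return (size_t)(p - s);
}
```

The word-at-a-time loop cuts the number of loop iterations roughly by a factor of eight on long strings, which is the kind of gap between the two baseline lines the graph is meant to show.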

The __strlen_* implementations are glibc hand-coded implementations.

Note the logarithmic scale. You will want to display this on a large monitor. It's all about tradeoffs.

I don't know when the ext_vector_types were added to clang, but at least clang-19 as shipped by Debian testing doesn't support them. So if we care about older compilers, we probably shouldn't go this route.

I'll wait to clean all this up and upload the actual changes until we figure out what to do.

<img width="2554" height="1429" alt="strlen-" src="https://github.com/user-attachments/assets/50e0eb2b-dc37-4992-8b0b-2c3f269c4610" />

https://github.com/llvm/llvm-project/pull/152389