[libc-commits] [libc] [libc][wctype] Add gen script for conversion functions (PR #198645)

Tue Jun 30 13:01:52 PDT 2026

michaelrj-google wrote:

> I'm sorry about the delay in getting to this - my worklist script had a bug.
> 
> The first thing that jumps out at me is that it's hard to tell what's happening and what I'm reviewing for. A commit message like the following would really help:
> 
> [libc][wctype] Use perfect hash for case conversions (#198645)
> 
> Updated wctype generation scripts to produce PerfectHashMap headers instead of raw array data.
> 
> * Integrated cppyy to execute C++ perfect hash generation during the build process.
> * Split tables into 16-bit and 32-bit segments to optimize for different WINT_MAX sizes.
> 
> But even this is missing the why - is it more performant? What's the memory tradeoff? Is that the right tradeoff for all environments, etc? Without this, coming into this review cold makes it difficult to know whether it's right or not.
> 
> Please let me know if that doesn't make sense! I'm happy to give more details. =)

This PR was part of https://github.com/llvm/llvm-project/pull/187670. Specifically, it modifies the Python script to generate `lower_to_upper.inc` and `upper_to_lower.inc`. It was split off because the easiest way to generate the tables for the perfect hash function is to reuse the same C++ code that implements it. Given that there was already a Python script, @bassiounix used cppyy to include the C++ in the Python script. That was good for getting the tables done quickly, but it's inconvenient to land in tree: cppyy is a heavy dependency, and it would be a new Python dependency for the LLVM project. It's also not urgent for this PR to land, since the tables only need to be updated when Unicode updates, which is not very often.
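
For anyone coming to this cold, here is a rough pure-Python sketch of what "generating the tables for a perfect hash" involves. It is not the code from either PR. The `mix` function, the hash-and-displace layout, the bucket and load parameters, and the use of `str.upper()` as a stand-in for the real Unicode data are all assumptions made for illustration, and the actual `PerfectHashMap` may work quite differently.

```python
def mix(key, seed):
    """Small 32-bit integer mixer (hypothetical; not the hash used in LLVM libc)."""
    x = ((key ^ seed) * 0x9E3779B1) & 0xFFFFFFFF
    x ^= x >> 15
    x = (x * 0x85EBCA6B) & 0xFFFFFFFF
    x ^= x >> 13
    return x


def build_lower_to_upper(limit=0x10000):
    """Collect one-to-one case mappings below `limit`.

    Assumption: str.upper() approximates the simple case mapping; results
    that expand to several code points are skipped.
    """
    table = {}
    for cp in range(limit):
        up = chr(cp).upper()
        if len(up) == 1 and ord(up) != cp:
            table[cp] = ord(up)
    return table


def build_perfect_hash(mapping, load=0.8, bucket_size=4):
    """Hash-and-displace: pick one seed per bucket so no two keys share a slot."""
    keys = sorted(mapping)
    size = max(1, int(len(keys) / load))
    nbuckets = max(1, len(keys) // bucket_size)
    buckets = [[] for _ in range(nbuckets)]
    for k in keys:
        buckets[mix(k, 0) % nbuckets].append(k)

    slots = [None] * size
    seeds = [0] * nbuckets  # seed 0 marks an empty bucket
    # Place the largest buckets first, while the slot table is still mostly empty.
    for b in sorted(range(nbuckets), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            continue
        seed = 1
        while True:
            pos = [mix(k, seed) % size for k in buckets[b]]
            if len(set(pos)) == len(pos) and all(slots[p] is None for p in pos):
                break
            seed += 1
        seeds[b] = seed
        for k, p in zip(buckets[b], pos):
            slots[p] = (k, mapping[k])
    return seeds, slots


def lookup(cp, seeds, slots):
    """Return the mapped code point, or `cp` itself when it has no mapping."""
    seed = seeds[mix(cp, 0) % len(seeds)]
    entry = slots[mix(cp, seed) % len(slots)]
    return entry[1] if entry is not None and entry[0] == cp else cp


mapping = build_lower_to_upper()
seeds, slots = build_perfect_hash(mapping)
```

A lookup is then two hash evaluations and one key comparison, e.g. `lookup(ord("a"), seeds, slots)`. The catch is that the generator and the runtime lookup must agree on the hash bit for bit, which is why reusing the C++ code directly was the easy route.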

https://github.com/llvm/llvm-project/pull/198645