<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/109841">109841</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[libc] Make char parsing code encoding independent
</td>
</tr>
<tr>
<th>Labels</th>
<td>
code-cleanup,
libc
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
michaelrj-google
</td>
</tr>
</table>
<pre>
What: Some of our string parsing code assumes ASCII, for example the string to integer code: https://github.com/llvm/llvm-project/blob/main/libc/src/__support/str_to_integer.h#L37 We should make this encoding independent, likely using switch statements.
Why change it: Easier support for non-ASCII character encodings (e.g. wide character, EBCDIC).
Will it be bad for performance: No, it might actually be better. Clang is very good at optimizing this sort of switch statement: https://godbolt.org/z/qvrebqvvr
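
For illustration, a minimal sketch of what a switch-based digit conversion could look like. The name b36_char_to_int_sketch is hypothetical and this is not the current libc code; it only assumes that character literals are translated to the execution character set at compile time, so no arithmetic on letter code points (such as ch - 'a') is needed. That matters because letters are not guaranteed to be contiguous (they are not in EBCDIC).

```cpp
// Hypothetical helper, not the actual LLVM libc implementation.
// Returns the value of ch as a base-36 digit, or -1 if ch is not a digit.
// Every case label is a character literal, so the mapping follows whatever
// encoding the compiler targets instead of assuming ASCII ordering.
int b36_char_to_int_sketch(char ch) {
  switch (ch) {
  case '0': return 0;
  case '1': return 1;
  case '2': return 2;
  case '3': return 3;
  case '4': return 4;
  case '5': return 5;
  case '6': return 6;
  case '7': return 7;
  case '8': return 8;
  case '9': return 9;
  case 'a': case 'A': return 10;
  case 'b': case 'B': return 11;
  case 'c': case 'C': return 12;
  case 'd': case 'D': return 13;
  case 'e': case 'E': return 14;
  case 'f': case 'F': return 15;
  case 'g': case 'G': return 16;
  case 'h': case 'H': return 17;
  case 'i': case 'I': return 18;
  case 'j': case 'J': return 19;
  case 'k': case 'K': return 20;
  case 'l': case 'L': return 21;
  case 'm': case 'M': return 22;
  case 'n': case 'N': return 23;
  case 'o': case 'O': return 24;
  case 'p': case 'P': return 25;
  case 'q': case 'Q': return 26;
  case 'r': case 'R': return 27;
  case 's': case 'S': return 28;
  case 't': case 'T': return 29;
  case 'u': case 'U': return 30;
  case 'v': case 'V': return 31;
  case 'w': case 'W': return 32;
  case 'x': case 'X': return 33;
  case 'y': case 'Y': return 34;
  case 'z': case 'Z': return 35;
  default: return -1; // not a base-36 digit
  }
}
```

A caller parsing in a given base would still need to reject values that are not smaller than that base; the sketch leaves that check out.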
</pre>