[libc-commits] [libc] [llvm] [libc] Change ctype to be encoding independent (PR #110574)
Nick Desaulniers via libc-commits
libc-commits at lists.llvm.org
Mon Dec 2 11:27:55 PST 2024
================
@@ -15,44 +15,556 @@
namespace LIBC_NAMESPACE_DECL {
namespace internal {
-// ------------------------------------------------------
-// Rationale: Since these classification functions are
-// called in other functions, we will avoid the overhead
-// of a function call by inlining them.
-// ------------------------------------------------------
+// -----------------------------------------------------------------------------
+// ****************** WARNING ******************
+// ****************** DO NOT TRY TO OPTIMIZE THESE FUNCTIONS! ******************
+// -----------------------------------------------------------------------------
+// This switch/case form is easier for the compiler to understand, and is
+// optimized into a form that is almost always the same as or better than
+// versions written by hand (see https://godbolt.org/z/qvrebqvvr). Also this
+// form makes these functions encoding independent. If you want to rewrite these
+// functions, make sure you have benchmarks to show your new solution is faster,
+// as well as a way to support non-ASCII character encodings.
----------------
nickdesaulniers wrote:
Perhaps worth a note (either in this comment block, or in the commit message) that the GNU C "Case Ranges" extension was suggested/considered, but the issue is that in EBCDIC the letters aren't contiguous the way they are in ASCII. So even if we did use that GNU C extension, we'd still need separate subranges specifically for EBCDIC.
https://gcc.gnu.org/onlinedocs/gcc/Case-Ranges.html
Example:
```c
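#include <stdbool.h>

// Uses the GNU C "Case Ranges" extension (GCC/Clang), not ISO C.
bool islower(int ch) {
  switch (ch) {
  // LOL what is going on here? ugly!
  // Three subranges because in EBCDIC 'a'-'i', 'j'-'r' and 's'-'z'
  // are separated by non-letter code points.
  case 'a' ... 'i':
  case 'j' ... 'r':
  case 's' ... 'z':
    return true;
  default:
    return false;
  }
}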
```
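For comparison, the plain switch/case form that the diff's comment recommends needs no extension at all: one `case` label per character. This is a minimal sketch, not the code from the PR; the name `is_lower_switch` is hypothetical and chosen so it does not clash with `<ctype.h>`'s `islower`.

```c
#include <stdbool.h>

// Hypothetical helper (not the libc implementation): one case label per
// letter, no GNU extensions. Each character literal is evaluated in the
// execution character set, so the same source is correct for ASCII and EBCDIC.
static bool is_lower_switch(int ch) {
  switch (ch) {
  case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
  case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
  case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
  case 'v': case 'w': case 'x': case 'y': case 'z':
    return true;
  default:
    return false;
  }
}
```

The diff's comment claims compilers lower this form to code that is as good as or better than a hand-written range check; the linked Compiler Explorer comparison is the place to verify that for a given target.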
https://github.com/llvm/llvm-project/pull/110574
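To make the non-contiguity concrete: in EBCDIC (e.g. IBM code pages 037 and 1047) the lowercase letters sit at 0x81 to 0x89, 0x91 to 0x99 and 0xA2 to 0xA9. The sketch below hard-codes those values (the constant and function names are hypothetical, for illustration only) so it runs on an ASCII host, and shows that a single `'a' ... 'z'` range would span 41 code points of which only 26 are letters. It uses GNU case ranges, so it needs GCC or Clang.

```c
#include <stdbool.h>

// EBCDIC lowercase letter code points (IBM code pages 037/1047),
// hard-coded so the demo does not depend on the host character set.
enum {
  EBCDIC_A = 0x81, EBCDIC_I = 0x89,
  EBCDIC_J = 0x91, EBCDIC_R = 0x99,
  EBCDIC_S = 0xA2, EBCDIC_Z = 0xA9,
};

// Hypothetical helper: three subranges, mirroring the case-range example.
static bool ebcdic_is_lower(int ch) {
  switch (ch) {
  case EBCDIC_A ... EBCDIC_I:
  case EBCDIC_J ... EBCDIC_R:
  case EBCDIC_S ... EBCDIC_Z:
    return true;
  default:
    return false;
  }
}

// Counts how many code points in [lo, hi] are EBCDIC lowercase letters.
static int count_letters(int lo, int hi) {
  int n = 0;
  for (int c = lo; c <= hi; ++c) {
    if (ebcdic_is_lower(c))
      ++n;
  }
  return n;
}
```

Calling `count_letters(EBCDIC_A, EBCDIC_Z)` returns 26, while the naive range `EBCDIC_A ... EBCDIC_Z` covers 41 code points; the 15 extras (0x8A to 0x90 and 0x9A to 0xA1) are why a single case range is wrong for EBCDIC.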