[libc-commits] [libc] [llvm] [libc] Change ctype to be encoding independent (PR #110574)

Mon Dec 2 11:27:55 PST 2024

================
@@ -15,44 +15,556 @@
 namespace LIBC_NAMESPACE_DECL {
 namespace internal {
 
-// ------------------------------------------------------
-// Rationale: Since these classification functions are
-// called in other functions, we will avoid the overhead
-// of a function call by inlining them.
-// ------------------------------------------------------
+// -----------------------------------------------------------------------------
+// ******************                 WARNING                 ******************
+// ****************** DO NOT TRY TO OPTIMIZE THESE FUNCTIONS! ******************
+// -----------------------------------------------------------------------------
+// This switch/case form is easier for the compiler to understand, and is
+// optimized into a form that is almost always the same as or better than
+// versions written by hand (see https://godbolt.org/z/qvrebqvvr). Also this
+// form makes these functions encoding independent. If you want to rewrite these
+// functions, make sure you have benchmarks to show your new solution is faster,
+// as well as a way to support non-ASCII character encodings.
----------------
nickdesaulniers wrote:

Perhaps worth a note (either in this comment block or in the commit message) that the GNU C extension "Case Ranges" was suggested and considered, but the issue is that in EBCDIC the letters aren't contiguous as they are in ASCII. So even if we did use that GNU C extension, we'd still need separate subranges specifically for EBCDIC.

https://gcc.gnu.org/onlinedocs/gcc/Case-Ranges.html

Example:
```c
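#include <stdbool.h>

bool islower(int ch) {
  switch (ch) {
  // LOL what is going on here? ugly!
  // (EBCDIC splits a-z into three runs, so one 'a' ... 'z' range won't do.)
  case 'a' ... 'i':
  case 'j' ... 'r':
  case 's' ... 'z':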
    return true;
  default:
    return false;
  }
}
```
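
For contrast, here is a minimal sketch of the plain switch/case form, assuming it means one case label per character (I haven't copied this from the patch, and `is_lower_portable` is a hypothetical name used only for illustration). Each character literal is evaluated in the execution character set, so nothing here relies on the letters being adjacent:

```c
#include <stdbool.h>

// Hypothetical helper, not the function from the PR. It uses only standard C
// (no GNU case ranges) and makes no assumption about how letters are laid out
// in the execution character set.
static bool is_lower_portable(int ch) {
  switch (ch) {
  case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
  case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
  case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
  case 'v': case 'w': case 'x': case 'y': case 'z':
    return true;
  default:
    return false;
  }
}
```

It's more typing than the case-range version, but it reads the same for ASCII and EBCDIC and leaves it to the compiler to turn the labels into range checks or a lookup.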

https://github.com/llvm/llvm-project/pull/110574