[libcxx-commits] [libcxx] [libcxx] Work around picolibc argv handling in tests. (PR #127662)

via libcxx-commits libcxx-commits at lists.llvm.org
Tue Feb 18 08:30:27 PST 2025


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-libcxx

Author: Simon Tatham (statham-arm)

<details>
<summary>Changes</summary>

This fixes some test failures when the libcxx tests are run against an up-to-date picolibc on embedded Arm, because those tests depend on an unsupported locale but the `hasAnyLocale` preliminary check wrongly concluded that the locale _was_ supported.

`hasAnyLocale` passes a set of locale strings to a test program via the command line, and checks if the libc under test reports that any of the locals can be successfully set via setlocale(). In some invocations one of the locale names contains a space, e.g. the Windows-style locale name "English_United States.1252".

Unfortunately picolibc's crt0, when running under Arm semihosting, fetches the single command string from the host and then splits it up at spaces without implementing any kind of quoting. So it simply isn't possible to get a space into an argv word. As a result, we end up testing for the locale (in this example) "English_United". In up-to-date versions of picolibc, this is actually accepted, since it contains no objectionable character set specification (or indeed any at all). So the lit check wrongly concludes that libc supports that locale, and enables some locale tests, which fail.

This patch works around the issue entirely within `hasAnyLocale()`, by applying URL %-encoding to each locale name to remove spaces and any likely command-line quote characters. Then the test program (included as a string literal within the same Python function) undoes that quoting. This should make no difference in cases where the argv quoting worked anyway: the quoting and unquoting is unnecessary in that situation, but should still be reliable. But in cases where argv handling is extremely simplistic, this works around it.

---
Full diff: https://github.com/llvm/llvm-project/pull/127662.diff


1 Files Affected:

- (modified) libcxx/utils/libcxx/test/dsl.py (+26-1) 


``````````diff
diff --git a/libcxx/utils/libcxx/test/dsl.py b/libcxx/utils/libcxx/test/dsl.py
index 7ff4be0ee7cf9..81db5d099e903 100644
--- a/libcxx/utils/libcxx/test/dsl.py
+++ b/libcxx/utils/libcxx/test/dsl.py
@@ -274,14 +274,38 @@ def hasAnyLocale(config, locales):
     %{exec} -- this means that the command may be executed on a remote host
     depending on the %{exec} substitution.
     """
+
+    # Some locale names contain spaces, and some embedded use cases
+    # (in particular picolibc under Arm semihosting) can't support
+    # spaces in argv words. Work around this by applying URL-style
+    # %-encoding to the locale names, and undoing it in our test
+    # program.
+    percent_encode = lambda s: ''.join(
+        "%{:02x}".format(c) if c in b' %\\"\'' or c >= 128 else chr(c)
+        for c in s.encode())
+
     program = """
     #include <stddef.h>
     #if defined(_LIBCPP_VERSION) && !_LIBCPP_HAS_LOCALIZATION
       int main(int, char**) { return 1; }
     #else
+      #include <stdlib.h>
       #include <locale.h>
       int main(int argc, char** argv) {
         for (int i = 1; i < argc; i++) {
+          // %-decode argv[i] in place
+          for (char *p = argv[i], *q = argv[i]; *p; p++) {
+            if (*p == '%') {
+              char hex[3];
+              hex[0] = *++p;
+              hex[1] = *++p;
+              hex[2] = '\\0';
+              *q++ = (char)strtoul(hex, NULL, 0);
+            } else {
+              *q++ = *p;
+            }
+          }
+          // Try to set the locale
           if (::setlocale(LC_ALL, argv[i]) != NULL) {
             return 0;
           }
@@ -290,7 +314,8 @@ def hasAnyLocale(config, locales):
       }
     #endif
   """
-    return programSucceeds(config, program, args=[shlex.quote(l) for l in locales])
+    return programSucceeds(config, program, args=[
+        shlex.quote(percent_encode(l)) for l in locales])
 
 
 @_memoizeExpensiveOperation(lambda c, flags="": (c.substitutions, c.environment, flags))

``````````

</details>


https://github.com/llvm/llvm-project/pull/127662


More information about the libcxx-commits mailing list