[libcxx-commits] [libcxx] [libcxx] Work around picolibc argv handling in tests. (PR #127662)

Simon Tatham via libcxx-commits libcxx-commits at lists.llvm.org
Tue Feb 18 08:46:43 PST 2025


https://github.com/statham-arm updated https://github.com/llvm/llvm-project/pull/127662

>From 8b31aa204ce3c3e8601501b9d0d2d193f5741b26 Mon Sep 17 00:00:00 2001
From: Simon Tatham <simon.tatham at arm.com>
Date: Tue, 18 Feb 2025 15:01:53 +0000
Subject: [PATCH 1/2] [libcxx] Work around picolibc argv handling in tests.

This fixes some test failures when the libcxx tests are run against an
up-to-date picolibc on embedded Arm, because those tests depend on an
unsupported locale but the `hasAnyLocale` preliminary check wrongly
concluded that the locale _was_ supported.

`hasAnyLocale` passes a set of locale strings to a test program via
the command line, and checks if the libc under test reports that any
of the locals can be successfully set via setlocale(). In some
invocations one of the locale names contains a space, e.g. the
Windows-style locale name "English_United States.1252".

Unfortunately picolibc's crt0, when running under Arm semihosting,
fetches the single command string from the host and then splits it up
at spaces without implementing any kind of quoting. So it simply isn't
possible to get a space into an argv word. As a result, we end up
testing for the locale (in this example) "English_United". In
up-to-date versions of picolibc, this is actually accepted, since it
contains no objectionable character set specification (or indeed any
at all). So the lit check wrongly concludes that libc supports that
locale, and enables some locale tests, which fail.

This patch works around the issue entirely within `hasAnyLocale()`, by
applying URL %-encoding to each locale name to remove spaces and any
likely command-line quote characters. Then the test program (included
as a string literal within the same Python function) undoes that
quoting. This should make no difference in cases where the argv
quoting worked anyway: the quoting and unquoting is unnecessary in
that situation, but should still be reliable. But in cases where argv
handling is extremely simplistic, this works around it.
---
 libcxx/utils/libcxx/test/dsl.py | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/libcxx/utils/libcxx/test/dsl.py b/libcxx/utils/libcxx/test/dsl.py
index 7ff4be0ee7cf9..81db5d099e903 100644
--- a/libcxx/utils/libcxx/test/dsl.py
+++ b/libcxx/utils/libcxx/test/dsl.py
@@ -274,14 +274,38 @@ def hasAnyLocale(config, locales):
     %{exec} -- this means that the command may be executed on a remote host
     depending on the %{exec} substitution.
     """
+
+    # Some locale names contain spaces, and some embedded use cases
+    # (in particular picolibc under Arm semihosting) can't support
+    # spaces in argv words. Work around this by applying URL-style
+    # %-encoding to the locale names, and undoing it in our test
+    # program.
+    percent_encode = lambda s: ''.join(
+        "%{:02x}".format(c) if c in b' %\\"\'' or c >= 128 else chr(c)
+        for c in s.encode())
+
     program = """
     #include <stddef.h>
     #if defined(_LIBCPP_VERSION) && !_LIBCPP_HAS_LOCALIZATION
       int main(int, char**) { return 1; }
     #else
+      #include <stdlib.h>
       #include <locale.h>
       int main(int argc, char** argv) {
         for (int i = 1; i < argc; i++) {
+          // %-decode argv[i] in place
+          for (char *p = argv[i], *q = argv[i]; *p; p++) {
+            if (*p == '%') {
+              char hex[3];
+              hex[0] = *++p;
+              hex[1] = *++p;
+              hex[2] = '\\0';
+              *q++ = (char)strtoul(hex, NULL, 0);
+            } else {
+              *q++ = *p;
+            }
+          }
+          // Try to set the locale
           if (::setlocale(LC_ALL, argv[i]) != NULL) {
             return 0;
           }
@@ -290,7 +314,8 @@ def hasAnyLocale(config, locales):
       }
     #endif
   """
-    return programSucceeds(config, program, args=[shlex.quote(l) for l in locales])
+    return programSucceeds(config, program, args=[
+        shlex.quote(percent_encode(l)) for l in locales])
 
 
 @_memoizeExpensiveOperation(lambda c, flags="": (c.substitutions, c.environment, flags))

>From a343def2bc351d89b1a7f5df13322023ea8574ad Mon Sep 17 00:00:00 2001
From: Simon Tatham <simon.tatham at arm.com>
Date: Tue, 18 Feb 2025 16:45:20 +0000
Subject: [PATCH 2/2] formatting fixes from CI

---
 libcxx/utils/libcxx/test/dsl.py | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/libcxx/utils/libcxx/test/dsl.py b/libcxx/utils/libcxx/test/dsl.py
index 81db5d099e903..6c1d430b86f65 100644
--- a/libcxx/utils/libcxx/test/dsl.py
+++ b/libcxx/utils/libcxx/test/dsl.py
@@ -280,9 +280,10 @@ def hasAnyLocale(config, locales):
     # spaces in argv words. Work around this by applying URL-style
     # %-encoding to the locale names, and undoing it in our test
     # program.
-    percent_encode = lambda s: ''.join(
-        "%{:02x}".format(c) if c in b' %\\"\'' or c >= 128 else chr(c)
-        for c in s.encode())
+    percent_encode = lambda s: "".join(
+        "%{:02x}".format(c) if c in b" %\\\"'" or c >= 128 else chr(c)
+        for c in s.encode()
+    )
 
     program = """
     #include <stddef.h>
@@ -314,8 +315,9 @@ def hasAnyLocale(config, locales):
       }
     #endif
   """
-    return programSucceeds(config, program, args=[
-        shlex.quote(percent_encode(l)) for l in locales])
+    return programSucceeds(
+        config, program, args=[shlex.quote(percent_encode(l)) for l in locales]
+    )
 
 
 @_memoizeExpensiveOperation(lambda c, flags="": (c.substitutions, c.environment, flags))



More information about the libcxx-commits mailing list