[libcxx-commits] [libcxx] [llvm] [libc++] Implement P1885R12: `<text_encoding>` (PR #141312)

Fri Mar 27 12:57:59 PDT 2026

================
@@ -0,0 +1,61 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+// REQUIRES: std-at-least-c++26
+// REQUIRES: locale.en_US.UTF-8
+
+// UNSUPPORTED: no-localization
+// UNSUPPORTED: availability-te-environment-missing
+
+// <text_encoding>
+
+// text_encoding text_encoding::environment();
+
+#include <cassert>
+#include <clocale>
+#include <text_encoding>
+
+#include "../test_text_encoding.h"
+#include "platform_support.h"
+
+int main(int, char**) {
+#if !defined(__ANDROID__) || (defined(__ANDROID__) && __ANDROID_API__ >= 26)
+  std::text_encoding te = std::text_encoding::environment();
+  // 1. Depending on the platform's default, verify that environment() returns the corresponding text encoding.
+  {
+#  if defined(__ANDROID__)
+    assert(te.mib() == std::text_encoding::UTF8);
----------------
enh-google wrote:

> With that being said, I've basically implemented the following:

like i said, it really depends on what you mean by "encoding" here...

bionic's encoding is _always_ utf8 in the sense of "if you call a libc function to go to/from mb to wc, it will use utf8 for the mb side". and without a specific explanation why not, i think that's the most [only?] _useful_ interpretation for you to use here?

("technically" -- but probably not usefully to anyone, and i should probably just fix this in a future OS release by removing the code -- only those locales with "UTF-8" in the name [case-sensitive] are "utf8" in the sense of "`MB_CUR_MAX` will be 4 rather than 1". but i'm struggling to argue that that's anything but a bug, and it would seem crazy to duplicate that into libc++.)
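
for anyone who wants to observe that per-locale difference, here's a minimal sketch (not bionic's code; `mb_cur_max_for` is a made-up helper name). the per-locale maximum is exposed as the `MB_CUR_MAX` macro from `<cstdlib>`, so the helper just switches locale, reads it, and switches back:

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of bionic or libc++): returns MB_CUR_MAX as
// seen under the named locale, or 0 if setlocale() rejects the name.
// Note: this temporarily changes the global C locale, so it is not thread-safe.
inline std::size_t mb_cur_max_for(const char* name) {
  // setlocale(LC_ALL, nullptr) only queries; copy the result because a later
  // setlocale() call may overwrite the returned buffer.
  const char* current = std::setlocale(LC_ALL, nullptr);
  std::string saved = current ? current : "C";

  if (std::setlocale(LC_ALL, name) == nullptr)
    return 0;

  std::size_t result = MB_CUR_MAX; // evaluated at run time for the active locale

  std::setlocale(LC_ALL, saved.c_str()); // restore the caller's locale
  return result;
}
```

calling `mb_cur_max_for("C")` and `mb_cur_max_for("en_US.UTF-8")` shows the two values being discussed; the exact number for a UTF-8 locale differs between libcs, so treat it as something to print rather than something to hard-code.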

> The other solution I see is: since nl_langinfo_l seems to be an alias to nl_langinfo we could pass a bogus locale_t to the former and check if ASCII or UTF-8.

both functions were introduced in the same api level, so there's no benefit there? and like i said above --- i don't think that what that function says is useful in any way.

----

i think the key point is: "android's mb functions take/return utf8".

i mean, i read the first sentence of the PR ("For historical reasons, all text encodings mentioned in the standard are derived from a locale object, which does not necessarily match the reality of how programs and systems interact") as "we want to get away from the existing api, and what's encoded in locale names, because it's not helpful". their motivating code examples also seem like they'd all just want you to always return utf8. and their commentary about performance concerns (and the traditional stroustrup "you don't pay for what you don't use" philosophy) seems like android should just be a constant utf8.
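
to make the "constant utf8" suggestion concrete, here's a rough sketch of the shape it could take. everything named `sketch_*` and `id_from_codeset` is hypothetical and only stands in for the real `std::text_encoding` machinery; the non-android branch uses POSIX `nl_langinfo(CODESET)` purely as a placeholder, and it queries the current C locale, which is a simplification of what `environment()` is supposed to report:

```cpp
#include <cstring>
#include <langinfo.h> // POSIX; assumed available on the non-Android branch

// Stand-in for the handful of text_encoding::id values this sketch cares about.
enum class sketch_id { other, ascii, utf8 };

// Hypothetical mapping from a codeset name to an id. A real implementation
// would use the alias comparison rules from P1885 instead of exact matches.
inline sketch_id id_from_codeset(const char* name) {
  if (std::strcmp(name, "UTF-8") == 0)
    return sketch_id::utf8;
  if (std::strcmp(name, "ANSI_X3.4-1968") == 0 || std::strcmp(name, "US-ASCII") == 0)
    return sketch_id::ascii;
  return sketch_id::other;
}

// Sketch of an environment() lookup that never consults the locale on Android.
inline sketch_id sketch_environment() {
#if defined(__ANDROID__)
  // bionic's multibyte functions take/return UTF-8, so answer with a constant.
  return sketch_id::utf8;
#else
  // Placeholder for whatever the platform-specific lookup really is.
  return id_from_codeset(nl_langinfo(CODESET));
#endif
}
```

the point of the `#if` is that the android path has no runtime cost and no dependency on locale names at all, which is the "don't pay for what you don't use" argument above.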


https://github.com/llvm/llvm-project/pull/141312