[libcxx-commits] [PATCH] D143349: [libc++] Fix UTF-8 decoding in codecvts. Fix #60177.
Mark de Wever via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Mon Feb 6 04:57:40 PST 2023
Mordante added a comment.
Thanks for working on this! I just did a quick scan over the code. I really want to review it after the formatting changes are undone and the CI passes.
================
Comment at: libcxx/src/locale.cpp:2024
else if (c1 < 0xF0)
{
+ if (frm_end - frm_nxt < 2)
----------------
Can you undo the formatting changes in this hunk? It makes finding the real changes quite hard.
(I know the format CI will probably complain about it, but you can ignore that. We are tuning the CI to not give these unwanted messages in the future.)
================
Comment at: libcxx/test/std/localization/codecvt_unicode.h:8
+//===----------------------------------------------------------------------===//
+
+#include <algorithm>
----------------
Please add include guards.
================
Comment at: libcxx/test/std/localization/codecvt_unicode.h:29
+template <class T, size_t N>
+auto constexpr array_size(const T (&)[N]) -> size_t {
+ return N;
----------------
There's no real reason to use trailing return types here.
================
Comment at: libcxx/test/std/localization/codecvt_unicode.h:35
+void utf8_to_utf32_in_ok(const std::codecvt<CharT, char, mbstate_t>& cvt) {
+ using namespace std;
+ // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP
----------------
We normally don't do this, it doesn't improve the readability of the code.
================
Comment at: libcxx/test/std/localization/codecvt_unicode.h:36
+ using namespace std;
+ // UTF-8 string of 1-byte CP, 2-byte CP, 3-byte CP and 4-byte CP
+ const char in[] = "b\u0448\uAAAA\U0010AAAA";
----------------
Just to improve the readability.
================
Comment at: libcxx/test/std/localization/codecvt_unicode.h:54
+ assert(t.out_size <= array_size(out));
+ auto state = mbstate_t{};
+ auto in_next = (const char*)nullptr;
----------------
Please don't use `auto` here, this does not match the LLVM coding style.
================
Comment at: libcxx/test/std/localization/locale.categories/category.ctype/locale.codecvt/locale.codecvt.members/char16_t_out.pass.cpp:32
+#include "../../../../codecvt_unicode.h"
#include "test_macros.h"
----------------
I'm not fond of this include path, it feels quite fragile. I think it would be better to move the code to the `test/support` directory then the suggestion above works. (This is the same location as the `test_macros.h` reside.
================
Comment at: libcxx/test/std/localization/locale.stdcvt/codecvt_utf8_in.pass.cpp:275
+ {
+ typedef std::codecvt_utf8<char32_t> C;
+ C c;
----------------
This is the preferred style. For the compilers we support this works in C++03 mode.
You could even consider to remove the entire `typedef` since it's only used once.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D143349/new/
https://reviews.llvm.org/D143349
More information about the libcxx-commits
mailing list