[libcxx-commits] [PATCH] D134036: [libc++][format] Implements string escaping.

Louis Dionne via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Tue Oct 18 09:29:45 PDT 2022


ldionne accepted this revision as: libc++.
ldionne added a comment.

I did not look at the algorithms used in detail, but the overall shape LGTM. I'm happy if tests are passing and @tahonermann is happy. Unblocking libc++ review, @tahonermann please let @Mordante know once you're satisfied!

Thanks!



================
Comment at: libcxx/include/__format/formatter_output.h:446
+  switch (__value) {
+  case _CharT('\t'):
+    __str += _LIBCPP_STATICALLY_WIDEN(_CharT, "\\t");
----------------
tahonermann wrote:
> Mordante wrote:
> > tahonermann wrote:
> > > tahonermann wrote:
> > > > I think these case statements should use `_LIBCPP_STATICALLY_WIDEN` or similar as there is no guarantee that the `wchar_t` value for `\t` is the same as for `char`.
> > > Any further thoughts on this? Or is there an existing libc++ policy that requires that characters that can be represented in a single code unit in both the ordinary and wide literal encodings must have the same value? If there is, I expect that policy will have to be revisited if `-fexec-charset` is expanded to include additional encodings like IBM-1047 as was proposed in D93031.
> > Thanks for the additional link. The assumption `_CharT('\t')` that works correctly is hard-coded in several places in libc++ (not limited to the format library). I think it's good to address this, but I prefer to do it in a separate review. I really want to look at how we're going to do that properly and properly test it.
> > 
> > (I expect we need a CI job where we use different encodings for `char` and `wchar_t`.)
> Doing so separately makes sense to me. And I agree that a CI job is needed once a solution is in place.
> Or is there an existing libc++ policy that requires that characters that can be represented in a single code unit in both the ordinary and wide literal encodings must have the same value?

There is no policy, however like @Mordante this is used everywhere. I am somewhat skeptical of the value of supporting that -- I'd have to see the cost/benefit of supporting that to really form an opinion. In all cases, IMO supporting this shouldn't be a concern in this patch, since we don't anywhere else.


================
Comment at: libcxx/include/__format/formatter_output.h:388
+  back_insert_iterator __out_it{__str};
+  std::ranges::copy(__prefix, __null_sentinel{}, back_insert_iterator{__str});
+
----------------
?


================
Comment at: libcxx/utils/generate_escaped_output_table.py:338
+    print(
+        MSVC_FORMAT_UCD_TABLES_HPP_TEMPLATE.lstrip().format(
+            content=generate_data_tables()
----------------
Any reason to keep the `MSVC_` prefix here?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134036/new/

https://reviews.llvm.org/D134036



More information about the libcxx-commits mailing list