[libcxx-commits] [PATCH] D144499: [libc++][format] Improves width estimate.

Louis Dionne via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Thu Apr 20 09:36:00 PDT 2023


ldionne accepted this revision.
ldionne added inline comments.
This revision is now accepted and ready to land.


================
Comment at: libcxx/utils/generate_width_estimation_table.py:345
+        )
+    # The range U+4DC0 - U+4DFF is neutral and should not be in the tale
+    # The range U+1F300 - U+1F5FF is partly in the range, for example
----------------



================
Comment at: libcxx/utils/generate_width_estimation_table.py:197-242
+// UNICODE, INC. LICENSE AGREEMENT - DATA FILES AND SOFTWARE
+//
+// See Terms of Use <https://www.unicode.org/copyright.html>
+// for definitions of Unicode Inc.'s Data Files and Software.
+//
+// NOTICE TO USER: Carefully read the following legal agreement.
+// BY DOWNLOADING, INSTALLING, COPYING OR OTHERWISE USING UNICODE INC.'S
----------------
Mordante wrote:
> tahonermann wrote:
> > cor3ntin wrote:
> > > Mordante wrote:
> > > > tahonermann wrote:
> > > > > Is there something we can do to ensure this gets updated with newer Unicode releases? Perhaps pull the copyright notice from somewhere to run a comparison when the table is regenerated?
> > > > Not at the moment. I have a reminder in my calendar.
> > > > 
> > > > But maybe I can make a periodic GitHub action that downloads the EastAsianWidth.txt file and compares whether it's different.
> > > > This file starts with
> > > > ```
> > > > # EastAsianWidth-15.0.0.txt
> > > > # Date: 2022-05-24, 17:40:20 GMT [KW, LI]
> > > > ```
> > > > which I assume will change with every Unicode release.
> > > > 
> > > > I don't want this in the CI and possibly fail the CI since updating the Unicode files might be non-trivial. I don't expect that for this case, but the grapheme clustering might change the rules and thus need changes in the code. (And I'm not sure how Zach's papers will affect what other parts of the Unicode database we need and how stable these rules are.)
> > > > 
> > > > @ldionne do you have an opinion? According to the Standard we should be using the latest Unicode Standard.
> > > > (This is part of P2736R2 and was voted in during the last plenary.)
> > > Given the rate of change (on average once a year), I'm not sure it's worth trying to automate that license file change.
> > > Maybe a "Here is everything that needs to be done to update Unicode" document somewhere would be as good or better.
> > I think such a doc would suffice as well. I agree it wouldn't be worth spending a lot of time to automate.
> The advantage would be that update notifications don't depend on my agenda. If there is a CI job that mails the libcxx maintainers everybody would be aware. I'll discuss it with Louis and see what he thinks.
One suggestion I'd have here is to make updating this part of our release process. In `Contributing.rst`, we have:

```
Post-release check list
=======================

After branching for an LLVM release:

1. Update ``_LIBCPP_VERSION`` in ``libcxx/include/__config``
2. Update the version number in ``libcxx/docs/conf.py``
3. Update ``_LIBCPPABI_VERSION`` in ``libcxxabi/include/cxxabi.h``
4. Update ``_LIBUNWIND_VERSION`` in ``libunwind/include/__libunwind_config.h``
5. Update the list of supported clang versions in ``libcxx/docs/index.rst``
6. Remove the in-progress warning from ``libcxx/docs/ReleaseNotes.rst``
```

We could potentially also have a pre-release checklist there that would include keeping the Unicode data up-to-date. And I'm sure there are other things we could document there as well. I guess the question is who would be tasked with doing the pre-release checklist -- it should probably not be the LLVM release manager since that's too much work. I guess I could do that and potentially distribute some of the pre-release tasks (e.g. I'd probably ask @Mordante to handle the Unicode update).
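
For whoever ends up doing that checklist item, the header comparison Mordante describes above could be sketched roughly as follows. This is only an illustration, not part of the patch: the names `parse_version`, `needs_update`, and `fetch_latest` are hypothetical, the download URL is an assumption about where the latest data file lives, and the sketch assumes the first line keeps the `# EastAsianWidth-X.Y.Z.txt` shape quoted earlier in the thread.

```python
import re
import urllib.request

# Assumed location of the latest data file; not taken from the review thread.
LATEST_URL = "https://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt"

# Matches the first header line, e.g. "# EastAsianWidth-15.0.0.txt".
_HEADER = re.compile(r"^# EastAsianWidth-(\d+)\.(\d+)\.(\d+)\.txt\s*$")


def parse_version(text):
    """Return the (major, minor, patch) version found on the file's first line."""
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    match = _HEADER.match(first_line)
    if match is None:
        raise ValueError(f"unrecognized header: {first_line!r}")
    return tuple(int(part) for part in match.groups())


def needs_update(vendored_text, upstream_text):
    """True when the upstream file reports a newer version than the vendored copy."""
    return parse_version(upstream_text) > parse_version(vendored_text)


def fetch_latest(url=LATEST_URL):
    """Download the upstream file (hypothetical helper; requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

A checklist step could then call `needs_update(open(path).read(), fetch_latest())` on the vendored copy and only regenerate the table when it returns true. Comparing parsed versions rather than whole files keeps the check quiet when only the date line changes.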


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144499/new/

https://reviews.llvm.org/D144499


