[PATCH] Implemented llvm::sys::locale::columnWidth and isPrint for the case of generic UTF8-capable terminal.

Dmitri Gribenko gribozavr at gmail.com
Tue Aug 6 15:23:32 PDT 2013

Comment at: lib/Support/LocaleGeneric.inc:40
@@ +39,3 @@
+///   * surrogates (category = Cs);
+///   * unassigned characters (category = Cn).
+/// \return true if the character is considered printable.
Alexander Kornienko wrote:
> Dmitri Gribenko wrote:
> > This change makes sense, but the list of unassigned characters will probably change in future versions of the Unicode standard. Updating this list might become a maintenance issue.
> > 
> > Stepping back a bit, why can't we just rely on iswprint() here?
> > 
> Maintaining this list shouldn't be much of an issue. Newly assigned characters are unlikely to become widely used, especially in C/C++ code, so we don't need to follow new versions of the standard closely. Even if we do need to update the list, it can be verified or reconstructed in under half an hour, which is a reasonable effort for each new Unicode version we care about.
> iswprint requires setlocale, which is not thread-safe and affects other parts of the program, which is particularly bad if llvm/clang is used as a library. iswprint_l would be much better, but it and the corresponding newlocale function are not available on all platforms. In either case we'd have to know the name of a valid Unicode locale, and the set of locales depends on the operating system.
Then this LGTM.
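
For readers following the thread, the locale-independent approach discussed above can be sketched roughly as a binary search over a sorted table of non-printable code point ranges. This is only an illustration, not the code from the patch: the names CodePointRange, NonPrintableRanges, isPrintableSketch and checkSketch are hypothetical, the table is a tiny subset chosen for the example, and treating control characters (Cc) as non-printable is an assumption, since the excerpt quoted above only shows the Cs and Cn entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// Hypothetical closed range of code points [Lower, Upper].
struct CodePointRange {
  uint32_t Lower;
  uint32_t Upper;
};

// Illustrative subset only (NOT the full list from the Unicode standard).
// The table must stay sorted and non-overlapping for the search to work.
static const CodePointRange NonPrintableRanges[] = {
    {0x0000, 0x001F}, // C0 control characters (Cc), assumed non-printable
    {0x007F, 0x009F}, // DEL and C1 control characters (Cc), assumed
    {0xD800, 0xDFFF}, // surrogates (Cs)
    {0xFDD0, 0xFDEF}, // noncharacters, permanently unassigned (Cn)
    {0xFFFE, 0xFFFF}, // noncharacters, permanently unassigned (Cn)
};

// Returns true if UCS is not covered by the table above.
// No setlocale() call and no global state, so it is safe to call from
// any thread and from library code.
bool isPrintableSketch(uint32_t UCS) {
  if (UCS > 0x10FFFF)
    return false; // outside the Unicode code space
  const CodePointRange *Begin = std::begin(NonPrintableRanges);
  const CodePointRange *End = std::end(NonPrintableRanges);
  // Find the first range whose upper bound is not below UCS.
  const CodePointRange *I = std::lower_bound(
      Begin, End, UCS,
      [](const CodePointRange &R, uint32_t V) { return R.Upper < V; });
  // Printable if no such range exists or UCS lies before its lower bound.
  return I == End || UCS < I->Lower;
}

// Small usage check for the sketch.
void checkSketch() {
  assert(isPrintableSketch(0x41));    // 'A'
  assert(!isPrintableSketch(0x0A));   // line feed
  assert(!isPrintableSketch(0xD800)); // surrogate
}
```

Updating such a table for a new Unicode version only means editing the array, which matches the maintenance cost estimated above; the lookup itself stays unchanged.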

