[PATCH] Implemented llvm::sys::locale::columnWidth and isPrint for the case of generic UTF8-capable terminal.

Alexander Kornienko alexfh at google.com
Tue Aug 6 02:04:33 PDT 2013



================
Comment at: lib/Support/LocaleGeneric.inc:40
@@ +39,3 @@
+///   * surrogates (category = Cs);
+///   * unassigned characters (category = Cn).
+/// \return true if the character is considered printable.
----------------
Dmitri Gribenko wrote:
> This change makes sense, but the list of unassigned characters will probably change in future versions of the Unicode standard. Updating this list might become a maintenance issue.
> 
> Stepping back a bit, why can't we just rely on iswprint() here?
> 
Maintaining this list shouldn't be much of an issue. Newly assigned characters are unlikely to become widely used, especially in C/C++ code, so we don't need to track new versions of the standard closely. Even if we do need to update the list, it can be verified or reconstructed in under half an hour, which is a reasonable effort for each new Unicode version we care about.
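
To illustrate why such a list is cheap to maintain, here is a minimal sketch of a table-driven check. It is not the code from the patch: the names CodePointRange, NonPrintableRanges and isPrintableSketch are hypothetical, and the table is deliberately incomplete (it covers only the control characters and surrogates, not the unassigned ranges). Updating for a new Unicode version would amount to editing the sorted table.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Hypothetical inclusive range of code points.
struct CodePointRange {
  uint32_t Lower;
  uint32_t Upper;
};

// Illustrative, incomplete table of non-printable ranges. It must stay
// sorted by Lower and non-overlapping for the binary search below.
static const CodePointRange NonPrintableRanges[] = {
    {0x0000, 0x001F}, // C0 control characters (category Cc)
    {0x007F, 0x009F}, // DEL and C1 control characters (category Cc)
    {0xD800, 0xDFFF}, // surrogates (category Cs)
};

// Returns true if CodePoint is not covered by any range in the table.
bool isPrintableSketch(uint32_t CodePoint) {
  if (CodePoint > 0x10FFFF) // outside the Unicode code space
    return false;
  const CodePointRange *End = std::end(NonPrintableRanges);
  // Find the first range whose upper bound is not below CodePoint.
  const CodePointRange *It = std::lower_bound(
      std::begin(NonPrintableRanges), End, CodePoint,
      [](const CodePointRange &R, uint32_t C) { return R.Upper < C; });
  // Printable if no such range exists or the range starts after CodePoint.
  return It == End || CodePoint < It->Lower;
}
```

The lookup is logarithmic in the number of ranges, and the table needs no locale setup at all.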

iswprint depends on the global locale set by setlocale, which is not thread-safe and affects the rest of the program; that is particularly bad when llvm/clang is used as a library. iswprint_l would be much better, but neither it nor the corresponding newlocale function is available on all platforms. In either case we'd have to know the name of a valid Unicode locale, and the set of available locales depends on the operating system.
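
For comparison, a rough sketch of what the iswprint_l route could look like on a POSIX.1-2008 system. The helper name isPrintInLocale is hypothetical, and the code assumes newlocale, freelocale and iswprint_l are declared by the headers below, which is exactly the portability assumption that does not hold everywhere.

```cpp
#include <locale.h>
#include <wctype.h>
#if defined(__APPLE__)
#include <xlocale.h> // assumed location of the *_l functions on Darwin
#endif

// Hypothetical helper: returns 1 if WC is printable in the named locale,
// 0 if it is not, and -1 if the locale object could not be created.
int isPrintInLocale(wint_t WC, const char *LocaleName) {
  // Create a private locale object; the global locale is left untouched.
  locale_t Loc = newlocale(LC_CTYPE_MASK, LocaleName, (locale_t)0);
  if (Loc == (locale_t)0)
    return -1;
  int Result = iswprint_l(WC, Loc) != 0;
  freelocale(Loc);
  return Result;
}
```

A caller would still have to pick LocaleName, for example "en_US.UTF-8" or "C.UTF-8", and neither name is guaranteed to exist on a given system.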


http://llvm-reviews.chandlerc.com/D1253


