[cfe-dev] [libcxx] RFC: Bringing sanity to platform specific <locale>

Craig, Ben via cfe-dev cfe-dev at lists.llvm.org
Wed Feb 3 07:39:24 PST 2016


The locale feature currently handles platform-specific code in a way 
that is inconsistent and difficult to update.  I think there are even 
conformance issues buried in there.  I'll start with my suggested fix, 
then talk about some of the problems in the current code base.

Suggested fix:
In the generic parts of the code base, don't directly call locale 
functions that are outside the C and C++ standards.  That means no calls 
to POSIX-only functions, no calls to the *_l functions, and no calls to 
newlocale.  Instead, call into functions from the support headers.  
These functions will all have __* names.  For example, locale.cpp and 
<locale> would no longer call btowc_l or wcsnrtombs_l directly, but 
would instead call __btowc_l and __wcsnrtombs_l.  On platforms that 
natively have the _l functions, the associated support headers would 
have a static inline forwarding function.
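
A minimal sketch of such a forwarding function, assuming a POSIX.1-2008 
platform that natively provides isupper_l.  The name sketch_isupper_l is 
hypothetical; inside libcxx the wrapper would use a reserved spelling 
such as __isupper_l, which ordinary user code should not define.

```cpp
// Sketch only: assumes the platform natively provides isupper_l.
#include <ctype.h>   // isupper_l (POSIX.1-2008)
#include <locale.h>  // locale_t, newlocale, freelocale
#if defined(__APPLE__)
#include <xlocale.h> // macOS declares the *_l functions here
#endif

// Hypothetical wrapper name.  Generic code would call this wrapper
// instead of calling isupper_l directly.
static inline int sketch_isupper_l(int c, locale_t loc) {
    return isupper_l(c, loc);  // pure forwarding, no extra logic
}
```

Because the wrapper is static inline and does nothing but forward, a 
platform with native support should pay no cost for the extra layer.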

To avoid duplication of effort, common implementations of the various 
functions will be provided in helper headers where it makes sense.  I 
expect there to be a helper header for the __*_l to *_l wrapper 
functions, a helper header for the __*_l functions that ignore the 
locale, and a helper header for no-op versions of [new|free|use|dup]locale.
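
To make the last two helper headers concrete, here is a rough sketch for 
a platform with no locale objects at all.  Every name in it (the sketch 
namespace, locale_stub, locale_handle, and the function names) is 
hypothetical and only illustrates the shape; it uses nothing beyond the 
C++ standard library.

```cpp
#include <cctype>  // std::isupper
#include <cwchar>  // std::btowc, std::wint_t

namespace sketch {

// Hypothetical stand-in for locale_t on a platform without locale objects.
struct locale_stub {};
using locale_handle = locale_stub*;

// No-op versions of [new|free|use|dup]locale: there is a single locale,
// so every call hands back the same handle and nothing is allocated.
inline locale_handle new_locale(int /*mask*/, const char* /*name*/,
                                locale_handle /*base*/) {
    static locale_stub the_only_locale;
    return &the_only_locale;
}
inline void free_locale(locale_handle) {}
inline locale_handle use_locale(locale_handle loc) { return loc; }
inline locale_handle dup_locale(locale_handle loc) { return loc; }

// Wrappers that accept a locale argument but ignore it and fall back to
// the standard, global-locale functions.
inline std::wint_t btowc_ignoring_locale(int c, locale_handle) {
    return std::btowc(c);
}
inline int isupper_ignoring_locale(int c, locale_handle) {
    return std::isupper(c);
}

} // namespace sketch
```

A platform's support header could then pick whichever helper header 
matches what its C library actually offers.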

Current Problem Case 1: OS header abstractions
I'll start with the least nasty case.  There are several "support" 
headers that provide OS specific implementations.  For example, 
support/musl/xlocale.h and support/win32/locale_win32.h.

In some systems, strtoull_l is provided by the C library or OS.  On 
others, the support headers provide an implementation.  Plenty of other 
symbols follow the pattern of strtoull_l (like isupper_l).

I like this approach in general, except that I don't like using the name 
strtoull_l on platforms where the C library and OS don't provide that 
symbol.  I'm not even sure it's conforming, as it doesn't follow the 
__lower or _Capital naming scheme that is reserved for the 
implementation.  A different program (or library) could try to take the 
name strtoull_l, and I think that is supposed to work.  I know I've 
certainly had issues in the past when multiple libraries try to fill in 
the gaps of missing OS / library support, and then those gap fillers end 
up conflicting.

Current Problem Case 2: Nearly recursive abstractions
* wchar_t ctype_byname<wchar_t>::do_widen(char c) const
This function will forward to btowc_l if we think the OS supports it.  
If the OS doesn't have btowc_l, we forward to a libcxx defined 
__btowc_l.  In isolation, this is fine.
* wint_t __btowc_l(int __c, locale_t __l)
This function will forward to btowc_l if we think the OS supports it.  
If the OS doesn't have btowc_l, we set the locale, call btowc, then 
restore the locale.  In isolation, this is also fine.

In combination, we have a strange duplication of effort.  We also have a 
confusing situation for anyone that wants to do better than btowc, but 
doesn't have __btowc_l.
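
For reference, the "set the locale, call btowc, then restore the locale" 
fallback can be sketched roughly as follows.  This assumes a 
POSIX.1-2008 platform with uselocale; scoped_locale and 
btowc_via_uselocale are hypothetical names, not the ones libcxx uses.

```cpp
// Sketch only: assumes uselocale is available (POSIX.1-2008).
#include <locale.h>  // locale_t, uselocale, newlocale, freelocale
#include <wchar.h>   // btowc, wint_t
#if defined(__APPLE__)
#include <xlocale.h> // macOS declares uselocale and friends here
#endif

namespace sketch {

// Installs a locale for the calling thread and restores the previous
// one when the guard goes out of scope.
class scoped_locale {
public:
    explicit scoped_locale(locale_t loc) : previous_(uselocale(loc)) {}
    ~scoped_locale() { uselocale(previous_); }
    scoped_locale(const scoped_locale&) = delete;
    scoped_locale& operator=(const scoped_locale&) = delete;
private:
    locale_t previous_;
};

// Emulates a btowc_l-style call using only btowc plus uselocale.
inline wint_t btowc_via_uselocale(int c, locale_t loc) {
    scoped_locale guard(loc);  // set the locale
    return btowc(c);           // call btowc; the guard restores on return
}

} // namespace sketch
```

With a single wrapper layer, this logic would live in exactly one place, 
and a platform that can do better would simply supply its own version.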

There are a few other symbols that follow this pattern.  (wctob_l, 
wcsnrtombs_l, wcrtomb_l, and plenty of others)

Current Problem Case 3: Hidden abstractions
The Windows and Sun support directories / headers each provide a libcxx 
version of wcsnrtombs_l.  But libcxx doesn't call them, because 
__wcsnrtombs_l forwards to wcsnrtombs.

Do others see value in fixing these problems?  Is this general approach 
reasonable?


-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project



