[cfe-dev] [PATCH] Let __attribute__((format(…))) accept OFStrings

Jonathan Schleifer js at webkeks.org
Tue Nov 26 15:37:56 PST 2013


On 26.11.2013, at 22:53, Arthur O'Dwyer <arthur.j.odwyer at gmail.com> wrote:

> I propose that it would be a very good idea for the next standard to
> provide format specifiers for char16_t and char32_t. I would nominate
> "%hc" (char16_t) and "%Lc" (char32_t), and the matching "%hs" (array
> of char16_t) and "%Ls" (array of char32_t).

This alone would not solve the problem. C11 only recommends using Unicode for char16_t and char32_t; it does not require it, so implementors are free to use another encoding.

So, to really fix this, C1y would need to require Unicode, like C++11 does (I have no idea why C++11 got it right while C11 screwed it up after copying char{16,32}_t over).
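
To illustrate why that matters, here is a minimal C11 sketch: an implementation only defines __STDC_UTF_16__ / __STDC_UTF_32__ if char16_t / char32_t values really are UTF-16 / UTF-32 encoded, so portable code cannot simply assume Unicode:

    #include <stdio.h>

    /* C11 leaves the encoding of char16_t / char32_t implementation-defined;
     * only these optional environment macros promise UTF-16 / UTF-32. */
    int
    main(void)
    {
    #ifdef __STDC_UTF_16__
        puts("char16_t is UTF-16 here");
    #else
        puts("char16_t uses some implementation-defined encoding");
    #endif
    #ifdef __STDC_UTF_32__
        puts("char32_t is UTF-32 here");
    #else
        puts("char32_t uses some implementation-defined encoding");
    #endif
        return 0;
    }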

The idea is that, in the meantime, I do the same thing Apple does: a format string that is an object needs special handling anyway, so I want to introduce a new format string type, __OFString__, which takes an OFString object as the format string. I need that regardless of the outcome of this discussion.

Now that I need my own format string type anyway, I don't see a reason not to do the same as Apple: interpret %C and %S differently if the format string is an OFString. Apple does *exactly* the same; they special-case them to unichar / const unichar *, while I special-case them to of_unichar_t / const of_unichar_t *.
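
To make this concrete, here is a rough sketch of how code using the patch might look. "my_log" is a hypothetical function invented purely for illustration, the __attribute__((format(__OFString__, 1, 2))) spelling is my reading of what the patch would add (only the __NSString__ variant exists in Clang today), and the <ObjFW/ObjFW.h> header and @"" literals are assumed ObjFW conventions:

    #import <ObjFW/ObjFW.h>

    /* Hypothetical logging function, not part of ObjFW. With the patch,
     * Clang would check the variadic arguments against the OFString format
     * string, just as it already does for
     * __attribute__((format(__NSString__, 1, 2))) with NSString. */
    extern void my_log(OFString *format, ...)
        __attribute__((format(__OFString__, 1, 2)));

    void
    example(void)
    {
        of_unichar_t snowman = 0x2603;

        /* With an OFString format string, %C consumes an of_unichar_t and
         * %S a const of_unichar_t *, mirroring Apple's unichar handling. */
        my_log(@"Here is a snowman: %C", snowman);
    }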

This does not hurt anybody, as it does not modify any existing behaviour; it only introduces a new format string type with new behaviour. This is completely independent of the shortcomings of the standard, and I'd *really* like to get this in. I need __OFString__ as a format string type anyway, so while I'm at it, I don't see any problem with doing the same special-casing Apple does.

While I map of_unichar_t to C(++)'s char32_t, that does not mean it is the same as char32_t: char32_t is not required to be Unicode, of_unichar_t is. So even if C1y introduces a length modifier for char32_t, it would still not be the same. If the system does not use Unicode for char32_t, printf would convert from that non-Unicode encoding to whatever multibyte encoding the current locale uses, so putting a Unicode character into a char32_t on such a system would produce the wrong output.

With of_unichar_t OTOH, I *require* it to be Unicode. Thus I can always assume it is Unicode and convert it to the right multibyte encoding.
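
For reference, this is roughly the conversion a printf-like function has to perform for such a length modifier, written as a minimal sketch with C11's c32rtomb() from <uchar.h>. It only produces correct output if char32_t actually holds Unicode on the implementation (i.e. __STDC_UTF_32__ is defined), which is exactly the problem described above:

    #include <limits.h>
    #include <locale.h>
    #include <stdio.h>
    #include <string.h>
    #include <uchar.h>

    /* Convert a single char32_t into the current locale's multibyte
     * encoding. c32rtomb() is standard C11, but the result is only correct
     * if char32_t really is UTF-32 on this implementation. */
    int
    main(void)
    {
        char buf[MB_LEN_MAX];
        mbstate_t state;
        size_t len;

        setlocale(LC_ALL, "");          /* pick up the environment's locale */
        memset(&state, 0, sizeof(state));

        len = c32rtomb(buf, (char32_t)0x2603, &state);  /* U+2603 SNOWMAN */
        if (len == (size_t)-1)
            return 1;                   /* not representable in this locale */

        fwrite(buf, 1, len, stdout);
        putchar('\n');

        return 0;
    }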

So, IMHO, if you really want to fix the standard and do it without any extensions (this could take years, so please, if you are for a standard fix, consider my patch nonetheless), the following would be needed:

* Require char16_t and char32_t to be Unicode (like C++11 does)
** Not something I need, but required to do it right: specify that an array of char16_t may contain UTF-16 (including surrogate pairs), so that it can be correctly converted to the locale's multibyte encoding
* Add a length modifier for char16_t / char16_t array / char32_t / char32_t array
** The length modifier for char16_t array should accept UTF-16

And ideally, it should also add char{16,32}_t counterparts for the other wchar_t functions - I never understood why they were omitted.

But, again, all this will take years. So please, let me just do the same thing for my framework that Apple does for theirs. This worked well for them for years, and it does work well for me too. It will not hurt anybody, will not interfere with anything else and will make me and the users of my framework happy ;).

Thanks.

--
Jonathan