Re: [PATCH] Let __attribute__((format(…))) accept OFStrings

Jean-Daniel Dupas devlists at shadowlab.org
Tue Nov 26 10:23:41 PST 2013


On 26 Nov 2013, at 18:35, Jonathan Schleifer <js at webkeks.org> wrote:

> On 26.11.2013, at 17:46, Arthur O'Dwyer <arthur.j.odwyer at gmail.com> wrote:
> 
>> I think Jonathan's current patch is probably the most "Clang-like" way
>> of doing it, but there *is* at least one more option: we could expose
>> the set of printf format specifiers directly to the user and allow the
>> user to customize it via the command line. For example,
>> 
>>   clang -fprintf-support=std test.c   // warns on %I64 and %C
>>   clang -fprintf-support=std,objc test.c   // warns on %I64
>>   clang -fprintf-support=std,objc,win32 test.c   // warns on %K and %{
>>   clang -fprintf-support=std,objfw,gnu test.c   // warns on %I64 again but not %K or %{
>>   clang -fprintf-support=c89 test.c   // warns on %zu but not %u
> 
> Well, this is only half a solution, because currently, there *is* a difference between
> 
> __attribute__((format(printf, 1, 2)))
> 
> and
> 
> __attribute__((format(__NSString__, 1, 2))).
> 
> The former wants a C string to be passed to printf. %C and %S are treated like they are specified in POSIX: They are aliases for %lc and %ls.
> 
> Now, when the format is NSString, 3 things are different:
> 
> * The format string is an NSString, which means it is an object.
> * The format string allows %@, which means the parameter is an object.
> * The format string treats %C and %S differently. It considers them to be of type unichar, which is a Cocoa type.
> 
>> and so on.  The defaults could be set according to the language and/or
>> -fobjc-runtime= flags, but the user could override them; for example,
>> maybe he's got a lot of Win32 code (using %I32) in his codebase, which
>> is going to be compiled even though it's dead; he doesn't want to see
>> warnings on %I32, so he adds "win32" to his list of -fprintf-support=
>> flags.
> 
> The thing about win32 I like :). But I think this could also be done depending on the target?
> 
>> The main problem with this idea, IMHO, is that I haven't dealt with
>> functions like ObjC's NSLog() which must be allowed to take %@ even
>> though ObjC's regular printf() is *not* allowed to take %@. So it
>> seems that __attribute__((printf,__NSString__)) is still required.
> 
> Exactly! That is why there is a new format string type __NSString__, and why I want to do the same for __OFString__. This matters especially because Cocoa and ObjFW can be used in the same source file (the usual case is connecting the internal, portable core with the platform-specific Cocoa UI).
> 
>> Oh, or *alternatively*, Jonathan, you could rework your runtime's
>> Unicode support so that you can use the existing format specifiers and
>> not need to change the API at all! What's wrong with wchar_t, %lc, and
>> %ls again...?  (Feel free to take this part off-list. I think it's a
>> valid solution, but there may be technical reasons against it?)
> 
> Well, first of all, there's problem 1: Clang needs to accept an OFString as a format string. Otherwise, a format string cannot be an ObjFW object. If Clang accepts an OFString for type __NSString__, we're at problem 2: __NSString__ handles %C and %S differently, treating the argument as type unichar, a Cocoa-specific type which is defined as unsigned short.
> 
> So, now there are two solutions: The one my patch implements (treating it as of_unichar_t, which is char32_t) or changing my framework. The latter sounds easier at first, but actually isn't: While Cocoa can only handle UCS-2 and thus only the BMP of Unicode, ObjFW uses UCS-4, so all Unicode characters can be used. So a unichar can't store e.g. an Emoji, while an of_unichar_t can. Cocoa uses UTF-16 to work around this, but it's not possible anymore to store an Emoji in a single character.

That's a rather strange way to put it. UTF-16 is no more a workaround than UTF-32 or UTF-8; they are all first-class encodings.
Cocoa supports all Unicode planes and encodes them using UTF-16 (or even ASCII internally), which is generally far more space-efficient than UTF-32.
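
To put rough numbers on the space argument, here is a small sketch in plain C that counts the bytes each encoding form needs for a sequence of code points. The helper names (utf8_bytes, utf16_bytes, utf32_bytes, total_bytes) are hypothetical and are not part of Cocoa or ObjFW; they only follow the standard Unicode encoding-form rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: bytes needed to store one code point
 * in each Unicode encoding form. */
static size_t utf8_bytes(uint32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

static size_t utf16_bytes(uint32_t cp)
{
    /* BMP: one 16-bit unit; other planes: a surrogate pair. */
    return cp < 0x10000 ? 2 : 4;
}

static size_t utf32_bytes(uint32_t cp)
{
    (void)cp; /* always one 32-bit unit */
    return 4;
}

/* Sum the per-code-point cost over an array of code points. */
static size_t total_bytes(const uint32_t *cps, size_t n,
                          size_t (*per_cp)(uint32_t))
{
    size_t total = 0;
    assert(n == 0 || cps != NULL);
    for (size_t i = 0; i < n; i++)
        total += per_cp(cps[i]);
    return total;
}
```

For text that stays mostly in the BMP, the UTF-16 total is about half the UTF-32 total; only code points outside the BMP cost the same 4 bytes in both.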

FWIW, it is even possible to use emoji in a constant NSString generated at compile time. So saying that Cocoa can only handle UCS-2 is plainly wrong.
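
As an illustration of how UTF-16 reaches code points beyond the BMP, here is a minimal sketch of the standard surrogate-pair encoding in plain C. The function name utf16_encode is hypothetical (it is not a Cocoa or ObjFW API), and uint16_t merely stands in for a 16-bit code unit such as unichar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
 * Writes 1 or 2 code units to out and returns how many were written. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    /* Only valid scalar values: in range and not itself a surrogate. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000) {
        /* BMP code point: stored directly in a single code unit. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary plane: split the 20-bit offset into two halves. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

With this, U+1F600 (an emoji) encodes to the two code units 0xD83D 0xDE00. The string can hold it fine, but a single 16-bit argument passed to %C cannot, which is the real limitation being discussed.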

> I hope this explains why changing it to use the type unichar in ObjFW is not an option. Even if that would work, problem 1 would still exist.

> Therefore, I think adding a new format string type is a sane solution to solve 1 and 2 that a.) works in any case (e.g. ObjFW with Apple runtime) b.) does not affect anybody else.
> 
> --
> Jonathan

-- Jean-Daniel







