[cfe-commits] [PATCH] -finput-charset, multi-byte character and BOM support

Sebastian Redl sebastian.redl at getdesigned.at
Sun Jul 17 05:15:53 PDT 2011


On 17.07.2011 05:21, Scott Conger wrote:
> Attached patch adds support for -finput-charset and automatic text
> conversion when there are multibyte characters or a byte-order-mark is
> present. The net effect is that all internal text should now be in
> UTF-8.
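A byte-order-mark check like the one the patch describes only has to look at the first few bytes of a file. The sketch below is not code from the patch. `detect_bom` and `bom_kind` are hypothetical names, and the byte sequences are the standard Unicode BOM encodings. After detection, the caller would skip `*bom_len` bytes and pass the remaining bytes to whatever converter produces UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding tags for illustration; not Clang's actual types. */
typedef enum {
  BOM_NONE,
  BOM_UTF8,
  BOM_UTF16_LE,
  BOM_UTF16_BE,
  BOM_UTF32_LE,
  BOM_UTF32_BE
} bom_kind;

/* Report which byte-order mark, if any, the buffer starts with.
   *bom_len receives the number of BOM bytes to skip (0 if none). */
static bom_kind detect_bom(const unsigned char *buf, size_t len,
                           size_t *bom_len) {
  assert(bom_len != NULL);
  assert(buf != NULL || len == 0);
  /* UTF-32 LE (FF FE 00 00) must be tested before UTF-16 LE (FF FE),
     because both start with FF FE. A UTF-16 LE file whose first
     character is U+0000 is indistinguishable and gets reported as UTF-32. */
  if (len >= 4 && buf[0] == 0xFF && buf[1] == 0xFE &&
      buf[2] == 0x00 && buf[3] == 0x00) {
    *bom_len = 4;
    return BOM_UTF32_LE;
  }
  if (len >= 4 && buf[0] == 0x00 && buf[1] == 0x00 &&
      buf[2] == 0xFE && buf[3] == 0xFF) {
    *bom_len = 4;
    return BOM_UTF32_BE;
  }
  if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
    *bom_len = 3;
    return BOM_UTF8;
  }
  if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
    *bom_len = 2;
    return BOM_UTF16_LE;
  }
  if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
    *bom_len = 2;
    return BOM_UTF16_BE;
  }
  *bom_len = 0;
  return BOM_NONE;
}
```

Only a handful of byte comparisons are needed, so this step is negligible next to a full scan of the file.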
>
> I have the exec charset options mostly working, but I trimmed it down
> to this for now, as it's a decently sized patch as-is.
>
>
> Performance impact:
>
> At a minimum, we have to scan through the input text to see if there
> are any multi-byte characters. There are usually none as portable code
> won't have any. The cost of this is lower if you have SSE2 support as
> I added an optimized version using intrinsics:
>
> For 1000 calls against a 16 MB ASCII buffer, on an AMD Athlon 7850
> (2.81 GHz) rough costs with GCC were:
> Default checkAscii - 13050 ms
> SSE2 checkAscii - 4025 ms
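Scanning for multi-byte characters amounts to checking whether any byte has its high bit set (0x80 or above). The following is a minimal sketch of a scalar loop and an SSE2 loop built on the standard `_mm_loadu_si128` / `_mm_movemask_epi8` intrinsics. It is not the patch's `checkAscii`: `check_ascii` and `check_ascii_scalar` are hypothetical names, and it assumes the GCC/Clang convention of defining `__SSE2__` when SSE2 code generation is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar version: any byte with the high bit set is not ASCII. */
static bool check_ascii_scalar(const unsigned char *p, size_t n) {
  for (size_t i = 0; i < n; ++i)
    if (p[i] & 0x80)
      return false;
  return true;
}

/* Returns true if all n bytes are 7-bit ASCII. */
static bool check_ascii(const unsigned char *p, size_t n) {
  assert(p != NULL || n == 0);
  size_t i = 0;
#if defined(__SSE2__)
  /* Test 16 bytes per iteration: _mm_movemask_epi8 packs the top bit of
     each byte into a 16-bit mask, so a non-zero mask means at least one
     byte is >= 0x80. Unaligned loads avoid any alignment requirement. */
  for (; i + 16 <= n; i += 16) {
    __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
    if (_mm_movemask_epi8(v) != 0)
      return false;
  }
#endif
  /* Handle the remaining tail bytes (or the whole buffer without SSE2). */
  return check_ascii_scalar(p + i, n - i);
}
```

Both variants are linear in the buffer size. The SSE2 loop only changes the constant factor, consistent with the roughly 3x difference in the quoted timings.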
Whoa ... isn't the total preprocessed size of Cocoa.h somewhere around 
2-4 MB? That's up to a second delay per compilation.
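For reference, the quoted timings are totals over 1000 calls on a 16 MB buffer. Dividing those out, and assuming scan time grows linearly with input size, the per-call cost for a 2-4 MB input works out as:

```latex
\begin{aligned}
t_{\text{scalar}} &= \frac{13050\ \text{ms}}{1000} = 13.05\ \text{ms per 16 MB} \approx 0.82\ \text{ms/MB}
  &&\Rightarrow\ \approx 1.6\text{--}3.3\ \text{ms for 2--4 MB}\\
t_{\text{SSE2}} &= \frac{4025\ \text{ms}}{1000} = 4.025\ \text{ms per 16 MB} \approx 0.25\ \text{ms/MB}
  &&\Rightarrow\ \approx 0.5\text{--}1.0\ \text{ms for 2--4 MB}
\end{aligned}
```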

Sebastian


