[LLVMdev] "grep -w" irregularity
Florian Weimer
fw at deneb.enyo.de
Fri Jan 2 14:31:25 PST 2009
* Daniel Berlin:
>> There have been issues with the GNU libc regular expression code. Try
>> running with "unset LANG" (or "LC_ALL=C") and see if it improves
>> things.
>>
>> The problem is that the regexp code used to be unacceptably slow in
>> multi-byte locales such as UTF-8, and the patch Debian applied to
>> improve its speed wasn't 100% correct.
>
> Considering most regexps can be done in linear time, it seems fairly
> dumb to break them to get speed, instead of simply changing
> algorithms.
IIRC, it's not an issue of complexity classes. With multi-byte
character set conversion, the constant factor is just too large.
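
A rough Python sketch of that distinction (an illustration only, not the
glibc or grep code; the function names and sample data are made up for
this example): both functions below do whole-word matching in a single
linear pass, but the second one has to decode the UTF-8 input to
characters before it can match, which is extra work per character on top
of the matching itself.

```python
import re

WORD = "foo"  # hypothetical search term for this sketch


def count_bytes(data: bytes, word: str = WORD) -> int:
    # Single-byte view, roughly what LC_ALL=C gives you:
    # match directly on the raw bytes, no conversion step.
    pattern = re.compile(rb"\b" + re.escape(word.encode("ascii")) + rb"\b")
    return sum(1 for _ in pattern.finditer(data))


def count_decoded(data: bytes, word: str = WORD) -> int:
    # Multi-byte view: convert UTF-8 to characters first, then match.
    # Still linear in the input size, but with added per-character work.
    text = data.decode("utf-8")
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return sum(1 for _ in pattern.finditer(text))


# Made-up sample input: two whole-word hits per line, plus "foobar",
# which must not match, and one non-ASCII word.
sample = "foo bar foobar café foo\n".encode("utf-8") * 1000

if __name__ == "__main__":
    print(count_bytes(sample), count_decoded(sample))
```

Timing the two functions on a large input would show how big the
constant factor is on a given system; no numbers are claimed here.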