[LLVMdev] "grep -w" irregularity

Florian Weimer fw at deneb.enyo.de
Fri Jan 2 14:31:25 PST 2009


* Daniel Berlin:

>> There have been issues with the GNU libc regular expression code.  Try
>> running with "unset LANG" (or "LC_ALL=C") and see if it improves
>> things.
>>
>> The problem is that the regexp code used to be unacceptably slow in
>> multi-byte locales such as UTF-8, and the patch Debian applied to
>> improve its speed wasn't 100% correct.
>
> Considering most regexps can be done in linear time, it seems fairly
> dumb to break them to get speed, instead of simply changing
> algorithms.

IIRC, it's not an issue of complexity classes.  With multi-byte
character set conversion, the constant factor is just too large.
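
A minimal sketch of the workaround quoted above, assuming a POSIX shell
and a grep that supports -w.  The helper name grep_w_c and the sample
input are hypothetical, made up for illustration.  It forces the C
locale for a single grep invocation, so the result can be compared with
what the current (possibly UTF-8) locale produces:

```shell
# Hypothetical helper: run "grep -w" with the C locale forced for this
# one command only; the rest of the shell session keeps its locale.
grep_w_c() {
    LC_ALL=C grep -w "$@"
}

# Made-up sample input: "foobar" should not match a whole-word search.
sample=$(mktemp)
printf 'foo bar\nfoobar\nbar foo\n' > "$sample"

echo "current locale:"
grep -w foo "$sample"

echo "C locale:"
grep_w_c foo "$sample"

rm -f "$sample"
```

If the two runs print different lines, the locale-dependent regexp code
is the likely culprit.  Setting LC_ALL=C is the more reliable of the two
suggestions, since LC_ALL overrides both LANG and the individual LC_*
variables, whereas "unset LANG" has no effect when one of those is set.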


