[LLVMdev] Regular Expression lib support

OvermindDL1 overminddl1 at gmail.com
Sun Aug 23 21:50:36 PDT 2009


On Sun, Aug 23, 2009 at 10:20 PM, Chris Lattner<clattner at apple.com> wrote:
> On Aug 23, 2009, at 9:11 PM, OvermindDL1 wrote:
>>>
>>> Again, forget boost regex. :)
>>
>> What about std::regex?
>
> No, we have to build with C++98 compilers.  I think you're missing the
> point here.  We care about code size in llvm, and the best code size you can
> get is to link to the one already in your address space because it's part of
> libc.

Several implementations of the std::regex interface already work fine on
C++98, such as boost::regex and Dinkumware's, and I know there is at
least one other.
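Chris's point about linking to the regex matcher that is already in libc can be sketched with the POSIX regcomp/regexec interface. This is a minimal sketch that assumes a Unix-like system whose C library provides <regex.h>. The helper name posixMatch is hypothetical, and nothing here is LLVM API.

```cpp
#include <cstddef>  // NULL
#include <regex.h>  // POSIX regcomp/regexec/regfree, provided by libc on Unix-like systems
#include <string>

// Hypothetical helper: returns true if the POSIX extended regular
// expression `pattern` matches anywhere in `text`. A pattern that fails
// to compile is treated as "no match".
bool posixMatch(const std::string &pattern, const std::string &text) {
  regex_t re;
  // REG_EXTENDED selects ERE syntax; REG_NOSUB says we only need a
  // yes/no answer, not the positions of subexpressions.
  if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0)
    return false;  // on failure the regex_t is not valid, so no regfree
  int rc = regexec(&re, text.c_str(), 0, NULL, 0);
  regfree(&re);    // release whatever regcomp allocated
  return rc == 0;  // 0 means a match; REG_NOMATCH means none
}
```

Because it only calls into the C library, this compiles as C++98 and adds almost no code to the binary. The catch, raised later in the thread, is that not every platform ships <regex.h> (Windows is the example given), so an implementation would have to be imported there.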


On Sun, Aug 23, 2009 at 10:28 PM, Chris Lattner<clattner at apple.com> wrote:
> On Aug 23, 2009, at 9:01 PM, Daniel Berlin wrote:
>>>  2. Use POSIX regcomp facilities. This implies importing some
>>> implementation of this interface, e.g., Windows. On Linux, BSD, etc.
>>> we would try to use the platform version if available (and non-
>>> buggy).
>>
>> Don't do it.
>> They are ridiculously slow, and POSIX made some really dumb choices in
>> regexps.
>
> We want to use this from FileCheck, which we build at -O0 today.
> Also, each regex will be matched once.  Most testcases use fixed
> strings (in fact 100% of them do today!).  This really is not very
> performance sensitive.
>
> Regex engines like this are inherently more powerful but slower than
> fixed-purpose matching logic.  I don't see a reason not to use a
> (slow!) simple regexec version.
>
> I would also prefer not to have all the crazy features.  Just
> supporting simple matching stuff is perfectly acceptable.  We don't
> need unicode character classes, negative assertions, etc.  We don't
> need the full power of Perl regexes.

Again, why not Spirit 2.1?  It works fine on C++98, it is fast, and it
is split up into the smallest pieces so you only include what you use.
The assembly it compiles into is *very* tiny, far smaller than any
regex library could possibly produce.
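Both the "simple matching stuff" requested in the quote above and the small-code argument point toward a tiny purpose-built matcher. As an illustration only (a swapped-in technique, not Spirit and not anything adopted in this thread), here is the classic backtracking matcher by Kernighan and Pike from The Practice of Programming. It supports literals, '.', 'x*', a leading '^' and a trailing '$'. The function names are hypothetical.

```cpp
#include <string>

// Forward declaration: matchHere and matchStar call each other.
static bool matchHere(const char *re, const char *text);

// Matches "c*" followed by the rest of the pattern `re`: try zero
// occurrences of c first, then back off to one more each time.
static bool matchStar(char c, const char *re, const char *text) {
  do {
    if (matchHere(re, text))
      return true;
  } while (*text != '\0' && (*text++ == c || c == '.'));
  return false;
}

// Matches pattern `re` anchored at the start of `text`.
static bool matchHere(const char *re, const char *text) {
  if (re[0] == '\0')
    return true;                         // pattern exhausted: success
  if (re[1] == '*')
    return matchStar(re[0], re + 2, text);
  if (re[0] == '$' && re[1] == '\0')
    return *text == '\0';                // '$' only matches at end of text
  if (*text != '\0' && (re[0] == '.' || re[0] == *text))
    return matchHere(re + 1, text + 1);  // one literal or '.' consumed
  return false;
}

// Returns true if `pattern` matches anywhere in `text`.
bool simpleMatch(const std::string &pattern, const std::string &text) {
  const char *re = pattern.c_str();
  const char *t = text.c_str();
  if (re[0] == '^')
    return matchHere(re + 1, t);         // anchored: try position 0 only
  do {
    if (matchHere(re, t))
      return true;
  } while (*t++ != '\0');                // otherwise try every position
  return false;
}
```

A fixed string such as "CHECK: add" contains none of the special characters, so it is matched literally, which covers the common FileCheck case described above. The trade-off is that this matcher backtracks, so some patterns with several '*' elements can take super-linear time.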

On Sun, Aug 23, 2009 at 10:28 PM, Chris Lattner<clattner at apple.com> wrote:
> I am more concerned about bugginess, but I doubt that affects simple
> regexes.

Spirit 2.1 has test cases covering every aspect of it; it is thoroughly
tested and well proven.


On Sun, Aug 23, 2009 at 10:28 PM, Chris Lattner<clattner at apple.com> wrote:
>> That said, if you really have the urge to have a BSD'd implementation
>> of regexec, at least choose something like http://laurikari.net/tre/
>> which is linear time in the size of the string.
>
> Nice, we need something for windows.

This is my whole point: if just one implementation is used on all
platforms, then you do not have to worry about something working
correctly on one platform but behaving slightly differently on another.
I really do not care what we use, as long as it is the same thing on all
platforms, not platform-specific implementations that might have
different hidden bugs and may not always interoperate.
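The linear-time property quoted above for TRE, and the wish for one implementation that behaves identically everywhere, can both be illustrated by a self-contained matcher that tracks every possible NFA state in lockstep (Thompson-style state-set simulation) instead of backtracking. This is a minimal sketch and not TRE's code. nfaMatch and its helpers are hypothetical names, and the syntax is deliberately restricted to literals, '.', 'x*', a leading '^' and a trailing '$'. For a pattern of m elements and a string of length n it runs in O(n * m) time, i.e. linear in the size of the string for a fixed pattern.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace {

// One pattern element: a literal character (or '.' for "any character"),
// optionally followed by '*' meaning "zero or more times".
struct Token {
  char c;
  bool star;
};

// Adds NFA state `i` to `set`, plus every state reachable from it without
// consuming input (i.e. skipping starred tokens). `in` prevents duplicates.
void addState(const std::vector<Token> &toks, std::vector<std::size_t> &set,
              std::vector<char> &in, std::size_t i) {
  if (in[i])
    return;
  in[i] = 1;
  set.push_back(i);
  if (i < toks.size() && toks[i].star)
    addState(toks, set, in, i + 1);  // zero repetitions of a starred token
}

} // namespace

// Hypothetical helper: returns true if `pattern` matches somewhere in `text`.
bool nfaMatch(const std::string &pattern, const std::string &text) {
  std::size_t p = 0, end = pattern.size();
  const bool anchorStart = end > 0 && pattern[0] == '^';
  if (anchorStart)
    p = 1;
  const bool anchorEnd = end > p && pattern[end - 1] == '$';
  if (anchorEnd)
    --end;

  std::vector<Token> toks;  // compile the pattern into a list of tokens
  for (; p < end; ++p) {
    Token t;
    t.c = pattern[p];
    t.star = p + 1 < end && pattern[p + 1] == '*';
    if (t.star)
      ++p;                  // skip the '*' itself
    toks.push_back(t);
  }
  const std::size_t accept = toks.size();  // state past the last token

  std::vector<std::size_t> cur, next;
  std::vector<char> inCur(accept + 1, 0), inNext(accept + 1, 0);
  addState(toks, cur, inCur, 0);

  for (std::size_t i = 0;; ++i) {
    if (!anchorEnd && inCur[accept])
      return true;          // some match ends here; '$' not required
    if (i == text.size())
      break;
    next.clear();
    inNext.assign(accept + 1, 0);
    for (std::size_t k = 0; k < cur.size(); ++k) {
      const std::size_t s = cur[k];
      if (s == accept)
        continue;
      const Token &t = toks[s];
      if (t.c == '.' || t.c == text[i])  // advance, or stay on a starred token
        addState(toks, next, inNext, t.star ? s : s + 1);
    }
    if (!anchorStart)
      addState(toks, next, inNext, 0);   // a new match may start at i + 1
    cur.swap(next);
    inCur.swap(inNext);
  }
  return inCur[accept] != 0;  // with '$', accept only at end of text
}
```

Since it depends only on the C++98 standard library, the same code, and therefore the same matching behavior, would ship on every platform. A pattern such as "a*a*a*a*a*a*a*a*b" against a long run of 'a' characters stays cheap here because no state is ever explored twice per input position.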



