[LLVMdev] Regular Expression lib support

Sun Aug 23 16:19:19 PDT 2009

GAH!  Why on bloody earth does the LLVM mailing list server not set
the Reply-To address to the list itself, so that I actually reply to
the list instead of to the person above me?  That is so different from
the other 20+ mailing lists I am subscribed to, and I could have sworn
the LLVM list was not broken like that before...  My message is below:

On Sun, Aug 23, 2009 at 4:29 PM, Kenneth Uildriks<kennethuil at gmail.com> wrote:
> On Sun, Aug 23, 2009 at 4:56 PM, Daniel Dunbar<daniel at zuster.org> wrote:
>> We would like to have access to some kind of regular expression
>> library inside LLVM. For example, we need this to extend the FileCheck
>> test case checking tool to support regular expressions.
>>
>> There are three obvious options:
>>  1. Roll our own library. Multiple unnamed individuals may even
>> already have implementations lying around! :)
>>  2. Use POSIX regcomp facilities. This implies importing some
>> implementation of this interface, e.g., for Windows. On Linux, BSD, etc.
>> we would try to use the platform version if available (and non-buggy).
>>  3. Import a more heavyweight library such as PCRE, and use it universally.
>
>
> Personally, I'm a big fan of the Boost libraries.  They've got a regex
> library, and a full-blown parser library (which I am using in my
> front-end).  It's definitely heavier than POSIX, but it's portable,
> well-tested, and loaded with features.

Boost.Xpressive supports both dynamic and static regexes: you can use
a regex dynamically (as a string), or you can create it statically (by
building up the AST in a *very* easy-to-use way).

Honestly, though, I prefer Boost.Spirit, which is a PEG parser.  It is
purely static, and the parsers it produces run faster than just about
anything else we have found (we have been testing it on everything;
even the simple string->int parser is faster than atoi, and it just
blows yacc and the like away, no comparison).  PEGs have a syntax very
much like regex, but they can be recursive, and they are fully greedy
with unlimited lookahead.

If you want to include Boost.Spirit2.1, I can help you integrate it.
I can easily rip it out, so you would only need to include it and the
parts it requires rather than all of Boost, and the license not only
allows doing this, it encourages it.  I have been using Spirit2.1 for
a long time and helped in its creation, so I know it quite well now
and would be happy to help use it wherever you need it here.  And yes,
I can just about guarantee that it will be faster than anything else
you could possibly use, and it is very easy to learn, especially if
you already know regex (although note that everything is greedy; this
is a 'good thing' in PEG grammars, as it contributes to their very
high speed, and remember that PEGs can be recursive).

Spirit2.1 also compiles to *very* tight code (smaller and faster than
the identical, optimized, hand-coded parsers that a certain company
wanted us to compare it with).  It may increase compile time, but you
get large runtime speedups.  Also, Spirit2.1 and everything it depends
on are header-only, so there are no other libs to link in, just
headers.