[LLVMdev] Regular Expression lib support

Daniel Dunbar daniel at zuster.org
Wed Aug 26 23:06:36 PDT 2009


2009/8/25 Török Edwin <edwintorok at gmail.com>:
> On 2009-08-25 21:18, Daniel Dunbar wrote:
>> Woot! Thanks a bunch Edwin!
>>
>> Some comments on the patch:
>> --
>> I'm not sure if it makes sense to import the man pages, if we only
>> expose Regex.h.
>>
>
> I'd like to keep re_format.7, it describes the format of the regex as
> accepted by this implementation.
> I'll remove regex.3 since its not exposed.

Ok.

>> Can you make these doxyments? Also, I'd prefer more LLVM style and less Unix
>> style enum names, IgnoreCase for example.
>>
>> Does NOSPEC actually ever make sense to use? Wouldn't clients just use string
>> compare?
>>
>
> Indeed, it is worthless for LLVM.
> There's also REG_PEND (non-POSIX, but this impl. supports it) that
> allows matching patterns
> with embedded NULs. Would that be needed/useful for LLVM?

Maybe? If its already there and we are fine using this version of the
lib always then we might as well include it.

>>> +    // notbol: The match-beginning-of-line operator always fails to match,
>>> +    //    unless regex was compiled with NEWLINE flag.
>>> +    // noteol:  The match-end-of-line operator always fails to match.
>>> +    //    unless regex was compiled with NEWLINE flag.
>>> +
>>> +    bool matches(const char *string, bool notbol=false, bool noteol=false);
>>> +    bool match_sub(const char *string, llvm_regmatch_t pmatch[],
>>> +                   unsigned nmatch, bool notbol=false, bool noteol=false);
>>>
>>
>> These should be doxymented, and I don't understand the flags. Does this mean
>> that if I *don't* compile with NEWLINE, but *do* want ^/$ support, then
>> notbol/noteol need to be false? Do we care about use cases that care about this?
>> Could we just always have them as false?
>>
>
> They are meant for cases where you call matches() on portions of a
> larger string.
> When you call it at offset 0, you want notbol=false, but when you call
> it at offset > 0 you want notbol=true.
> Ditto for noteol, when the portion of string passed to matches() doesn't
> contain the real end of the string.
>
> If we are always passing the entire string we intend to match to
> matches() then we can remove those bools.

Ah, that makes sense. For the time being, though, I'd say lets drop
them until some client actually cares.

>> This won't work on Windows as stands (Regex.h == regex.h), but I think we should
>> just keep regex.h private which solves the problem (still nice to rename it to
>> avoid confusion, llvm_regex.h would be consistent).
>>
>
> Ok, I'll try to make it private.

Ok!

I have one more request; since the implementation ends up being spread
into a number of files, could we prefix them with something so they
stand out inside the Support directory?

Thanks again!

 - Daniel




More information about the llvm-dev mailing list