[LLVMdev] Regular Expression lib support

Török Edwin edwintorok at gmail.com
Thu Aug 27 01:37:15 PDT 2009


On 2009-08-27 09:06, Daniel Dunbar wrote:
> 2009/8/25 Török Edwin <edwintorok at gmail.com>:
>   
>> On 2009-08-25 21:18, Daniel Dunbar wrote:
>>     
>>> Woot! Thanks a bunch Edwin!
>>>
>>> Some comments on the patch:
>>> --
>>> I'm not sure if it makes sense to import the man pages, if we only
>>> expose Regex.h.
>>>
>>>       
>> I'd like to keep re_format.7, it describes the format of the regex as
>> accepted by this implementation.
>> I'll remove regex.3 since its not exposed.
>>     
>
> Ok.
>
>   
>>> Can you make these doxyments? Also, I'd prefer more LLVM style and less Unix
>>> style enum names, IgnoreCase for example.
>>>
>>> Does NOSPEC actually ever make sense to use? Wouldn't clients just use string
>>> compare?
>>>
>>>       
>> Indeed, it is worthless for LLVM.
>> There's also REG_PEND (non-POSIX, but this impl. supports it) that
>> allows matching patterns
>> with embedded NULs. Would that be needed/useful for LLVM?
>>     
>
> Maybe? If its already there and we are fine using this version of the
> lib always then we might as well include it.
>   

I changed the constructor to take  StringRef, and now I always use REG_PEND
internally, so it works both with char* and std::string (which may not
be null-terminated, or may include embedded nuls).
You can add a Twine& constructor if needed ;)

>   
>>>> +    // notbol: The match-beginning-of-line operator always fails to match,
>>>> +    //    unless regex was compiled with NEWLINE flag.
>>>> +    // noteol:  The match-end-of-line operator always fails to match.
>>>> +    //    unless regex was compiled with NEWLINE flag.
>>>> +
>>>> +    bool matches(const char *string, bool notbol=false, bool noteol=false);
>>>> +    bool match_sub(const char *string, llvm_regmatch_t pmatch[],
>>>> +                   unsigned nmatch, bool notbol=false, bool noteol=false);
>>>>
>>>>         
>>> These should be doxymented, and I don't understand the flags. Does this mean
>>> that if I *don't* compile with NEWLINE, but *do* want ^/$ support, then
>>> notbol/noteol need to be false? Do we care about use cases that care about this?
>>> Could we just always have them as false?
>>>
>>>       
>> They are meant for cases where you call matches() on portions of a
>> larger string.
>> When you call it at offset 0, you want notbol=false, but when you call
>> it at offset > 0 you want notbol=true.
>> Ditto for noteol, when the portion of string passed to matches() doesn't
>> contain the real end of the string.
>>
>> If we are always passing the entire string we intend to match to
>> matches() then we can remove those bools.
>>     
>
> Ah, that makes sense. For the time being, though, I'd say lets drop
> them until some client actually cares.
>   

Dropped.

>   
>>> This won't work on Windows as stands (Regex.h == regex.h), but I think we should
>>> just keep regex.h private which solves the problem (still nice to rename it to
>>> avoid confusion, llvm_regex.h would be consistent).
>>>
>>>       
>> Ok, I'll try to make it private.
>>     
>
> Ok!
>
> I have one more request; since the implementation ends up being spread
> into a number of files, could we prefix them with something so they
> stand out inside the Support directory?
>   

They are now all lib/Support/reg*

Attached new patch.

Best regards,
--Edwin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: regex-support.patch.gz
Type: application/gzip
Size: 33735 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20090827/612fe731/attachment.bin>


More information about the llvm-dev mailing list