[LLVMdev] Regular Expression lib support

Daniel Dunbar daniel at zuster.org
Fri Aug 28 01:53:01 PDT 2009


Nice!

This looks good to me but probably Chris or someone else should sign off on it.

There were two minor warnings on Darwin:
--
In file included from /Volumes/Data/Users/ddunbar/llvm/lib/Support/regexec.c:81:
/Volumes/Data/Users/ddunbar/llvm/lib/Support/regengine.inc: In
function 'sbackref':
/Volumes/Data/Users/ddunbar/llvm/lib/Support/regengine.inc:665:
warning: control reaches end of non-void function
In file included from
/Volumes/Data/Users/ddunbar/llvm/lib/Support/regexec.c:130:
/Volumes/Data/Users/ddunbar/llvm/lib/Support/regengine.inc: In
function 'lbackref':
/Volumes/Data/Users/ddunbar/llvm/lib/Support/regengine.inc:665:
warning: control reaches end of non-void function
/Volumes/Data/Users/ddunbar/llvm/lib/Support/regexec.c: In function
'llvm_regexec':
/Volumes/Data/Users/ddunbar/llvm/lib/Support/regexec.c:157: warning:
comparison between signed and unsigned
--

And one in the unittest:
--
/Volumes/Data/Users/ddunbar/llvm/utils/unittest/googletest/include/gtest/gtest.h:
In function 'testing::AssertionResult
testing::internal::CmpHelperEQ(const char*, const char*, const T1&,
const T2&) [with T1 = int, T2 = long unsigned int]':
/Volumes/Data/Users/ddunbar/llvm/utils/unittest/googletest/include/gtest/gtest.h:650:
  instantiated from 'static testing::AssertionResult
testing::internal::EqHelper<lhs_is_null_literal>::Compare(const char*,
const char*, const T1&, const T2&) [with T1 = int, T2 = size_t, bool
lhs_is_null_literal = false]'
/Volumes/Data/Users/ddunbar/llvm/unittests/Support/RegexTest.cpp:29:
instantiated from here
/Volumes/Data/Users/ddunbar/llvm/utils/unittest/googletest/include/gtest/gtest.h:617:
warning: comparison between signed and unsigned integer expressions
--

 - Daniel

2009/8/27 Török Edwin <edwintorok at gmail.com>:
> On 2009-08-27 09:06, Daniel Dunbar wrote:
>> 2009/8/25 Török Edwin <edwintorok at gmail.com>:
>>
>>> On 2009-08-25 21:18, Daniel Dunbar wrote:
>>>
>>>> Woot! Thanks a bunch Edwin!
>>>>
>>>> Some comments on the patch:
>>>> --
>>>> I'm not sure if it makes sense to import the man pages, if we only
>>>> expose Regex.h.
>>>>
>>>>
>>> I'd like to keep re_format.7, it describes the format of the regex as
>>> accepted by this implementation.
>>> I'll remove regex.3 since its not exposed.
>>>
>>
>> Ok.
>>
>>
>>>> Can you make these doxyments? Also, I'd prefer more LLVM style and less Unix
>>>> style enum names, IgnoreCase for example.
>>>>
>>>> Does NOSPEC actually ever make sense to use? Wouldn't clients just use string
>>>> compare?
>>>>
>>>>
>>> Indeed, it is worthless for LLVM.
>>> There's also REG_PEND (non-POSIX, but this impl. supports it) that
>>> allows matching patterns
>>> with embedded NULs. Would that be needed/useful for LLVM?
>>>
>>
>> Maybe? If its already there and we are fine using this version of the
>> lib always then we might as well include it.
>>
>
> I changed the constructor to take  StringRef, and now I always use REG_PEND
> internally, so it works both with char* and std::string (which may not
> be null-terminated, or may include embedded nuls).
> You can add a Twine& constructor if needed ;)
>
>>
>>>>> +    // notbol: The match-beginning-of-line operator always fails to match,
>>>>> +    //    unless regex was compiled with NEWLINE flag.
>>>>> +    // noteol:  The match-end-of-line operator always fails to match.
>>>>> +    //    unless regex was compiled with NEWLINE flag.
>>>>> +
>>>>> +    bool matches(const char *string, bool notbol=false, bool noteol=false);
>>>>> +    bool match_sub(const char *string, llvm_regmatch_t pmatch[],
>>>>> +                   unsigned nmatch, bool notbol=false, bool noteol=false);
>>>>>
>>>>>
>>>> These should be doxymented, and I don't understand the flags. Does this mean
>>>> that if I *don't* compile with NEWLINE, but *do* want ^/$ support, then
>>>> notbol/noteol need to be false? Do we care about use cases that care about this?
>>>> Could we just always have them as false?
>>>>
>>>>
>>> They are meant for cases where you call matches() on portions of a
>>> larger string.
>>> When you call it at offset 0, you want notbol=false, but when you call
>>> it at offset > 0 you want notbol=true.
>>> Ditto for noteol, when the portion of string passed to matches() doesn't
>>> contain the real end of the string.
>>>
>>> If we are always passing the entire string we intend to match to
>>> matches() then we can remove those bools.
>>>
>>
>> Ah, that makes sense. For the time being, though, I'd say lets drop
>> them until some client actually cares.
>>
>
> Dropped.
>
>>
>>>> This won't work on Windows as stands (Regex.h == regex.h), but I think we should
>>>> just keep regex.h private which solves the problem (still nice to rename it to
>>>> avoid confusion, llvm_regex.h would be consistent).
>>>>
>>>>
>>> Ok, I'll try to make it private.
>>>
>>
>> Ok!
>>
>> I have one more request; since the implementation ends up being spread
>> into a number of files, could we prefix them with something so they
>> stand out inside the Support directory?
>>
>
> They are now all lib/Support/reg*
>
> Attached new patch.
>
> Best regards,
> --Edwin
>
>




More information about the llvm-dev mailing list