[llvm-commits] PATCH: add backreference support to Regex

Sean Silva silvas at purdue.edu
Wed Nov 28 10:47:42 PST 2012


Thanks for the explanation; the patch LGTM.

At least for me, the big picture was clear, but I had forgotten about
the part at the end of the linked message where you described the
specific details about how regcomp.c already supports backreferences
so that really all that is needed is put in a little shim to get it to
use backreferences in ERE mode. For me, that's really the context that
was missing for the patch, otherwise, it looked like "add
backreference support to some old untouched C regex code that didn't
support backreferences", which from a code-review perspective would
have required a lot more effort to validate. I think it would have
been really beneficial to include the last two paragraphs of the
linked message in the OP. Also, I think those two paragraphs (or
something equivalent) should be included in the commit message.

-- Sean Silva

On Wed, Nov 28, 2012 at 1:30 PM, Eli Bendersky <eliben at google.com> wrote:
> On Wed, Nov 28, 2012 at 10:12 AM, Sean Silva <silvas at purdue.edu> wrote:
>> I guess the question for me about this patch is "what is it that you
>> want reviewed?". I think most people understand why this is necessary,
>> but I don't think that anyone is really that familiar with this code
>> at all (it looks like it was simply imported directly from OpenBSD)
>> and so you should really explain what it is that needs to be reviewed
>> here, because otherwise it's hard to give an opinion about it.
>
> Sean, fair points, thanks for replying.
>
> Re what I want reviewed, I guess it's no different from any other
> non-trivial patch being sent to the list. I linked to a post
> explaining the motivation, and I think that the message itself
> explains the patch on a "big picture" level pretty well (correct me if
> I'm wrong). Perhaps I can explain the code changes as well? Ok, then:
>
> Yes, this regex implementation was imported from *BSD, and as I
> explained in the original motivation post to llvmdev
> (http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-November/055840.html)
> it's adhering to POSIX in an unusually strict way by disallowing
> backrefs in Extended Regular Expressions (EREs). I assume backrefs are
> a familiar concept, but if not:
>
> "([a-z])_\1"
>
> Means "some character [a-z] followed by an underscore, followed by the
> same character". For example "x_x" will match, but not "x_y" or "9_9".
> backrefs make regular expressions immensely more useful.
>
> Our regex implementation actually supports backrefs, but only for
> Basic Regular Expressions. Luckily, this distinction only plays a role
> while "compiling" the regex to an internal data structure. So my patch
> enhances the compilation process of EREs to also support regexes. This
> will allow us to keep using EREs (their syntax is much nicer than BREs
> overall) while also enabling backrefs, which can help craft more
> powerful and useful FileCheck regular expressions, as well as enable
> the feature I originally wanted - defining a FileCheck variable and
> using it on the same line.
>
> I hope this is enough, please let me know if anything is missing.
>
>>
>> Also, one question that comes to mind is how have you tested this? I'm
>> guessing that the unittests pass and that you rebuilt and ran the
>> entire test suite to see if anything breaks, but it would be nice if
>> you explicitly mentioned it.
>>
>
> Yes, as I mentioned earlier there was just one test in the whole suite
> that would fail given this change. I fixed that test to not rely on
> the old functionality, so it wasn't a problem any longer. I ran "make
> check-all" and added new unit tests that exercise the added
> functionality, and everything passed.
>
> Eli



More information about the llvm-commits mailing list