[llvm-commits] PATCH: add backreference support to Regex

Eli Bendersky eliben at google.com
Wed Nov 28 10:30:16 PST 2012


On Wed, Nov 28, 2012 at 10:12 AM, Sean Silva <silvas at purdue.edu> wrote:
> I guess the question for me about this patch is "what is it that you
> want reviewed?". I think most people understand why this is necessary,
> but I don't think that anyone is really that familiar with this code
> at all (it looks like it was simply imported directly from OpenBSD)
> and so you should really explain what it is that needs to be reviewed
> here, because otherwise it's hard to give an opinion about it.

Sean, fair points, thanks for replying.

Re what I want reviewed, I guess it's no different from any other
non-trivial patch being sent to the list. I linked to a post
explaining the motivation, and I think that the message itself
explains the patch on a "big picture" level pretty well (correct me if
I'm wrong). Perhaps I can explain the code changes as well? Ok, then:

Yes, this regex implementation was imported from *BSD, and as I
explained in the original motivation post to llvmdev
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-November/055840.html)
it's adhering to POSIX in an unusually strict way by disallowing
backrefs in Extended Regular Expressions (EREs). I assume backrefs are
a familiar concept, but if not:

"([a-z])_\1"

Means "some character [a-z] followed by an underscore, followed by the
same character". For example "x_x" will match, but not "x_y" or "9_9".
backrefs make regular expressions immensely more useful.

Our regex implementation actually supports backrefs, but only for
Basic Regular Expressions. Luckily, this distinction only plays a role
while "compiling" the regex to an internal data structure. So my patch
enhances the compilation process of EREs to also support regexes. This
will allow us to keep using EREs (their syntax is much nicer than BREs
overall) while also enabling backrefs, which can help craft more
powerful and useful FileCheck regular expressions, as well as enable
the feature I originally wanted - defining a FileCheck variable and
using it on the same line.

I hope this is enough, please let me know if anything is missing.

>
> Also, one question that comes to mind is how have you tested this? I'm
> guessing that the unittests pass and that you rebuilt and ran the
> entire test suite to see if anything breaks, but it would be nice if
> you explicitly mentioned it.
>

Yes, as I mentioned earlier there was just one test in the whole suite
that would fail given this change. I fixed that test to not rely on
the old functionality, so it wasn't a problem any longer. I ran "make
check-all" and added new unit tests that exercise the added
functionality, and everything passed.

Eli



More information about the llvm-commits mailing list