[LLVMdev] Proposal: Enhance FileCheck's variable matching capabilities

Eli Bendersky eliben at google.com
Fri Nov 16 05:58:26 PST 2012


Hello,

FileCheck allows us to define match variables and use them on
subsequent lines. This is quite useful, but would be even more useful
if a variable could also be used later on the same line that defines
it. For example, I would like to write this:

; CHECK: bic [[REG:r[0-9]+]], [[REG]], #3

But I currently can't because [[REG]] will only match a REG variable
defined on a _previous_ line. As the FileCheck ref manual
(http://llvm.org/docs/CommandGuide/FileCheck.html) mentions, there are
known workarounds like having two separate CHECK lines. However, this
is a hacky and inelegant solution that makes tests harder to read.

I hope the rationale for the code above is clear: I want my
instruction to be acting on the same register, though I don't care
which (and don't want the test to be affected by reg-alloc changes
sometime in the future).

IMHO the proposed feature here is the natural way to match the same
register on a line, and it can be useful in writing tests. I know it
would be extremely useful in the tests I'm currently writing (not
upstream yet, but soon will be).

If the idea sounds good to people, all that's left is the
implementation :-) I already have it in my branches and can provide a
full implementation with tests (FileCheck now has a test/ dir all of
its own since r168113) if the proposal is accepted.

The rough outline of the implementation:

To enable such matching in a natural way, our regex implementation
needs to support backreferences in matches. This then allows us to
find all references to a variable defined on the same line and replace
them with backrefs.
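
To make the substitution step concrete, here is a minimal sketch in
Python. It is not FileCheck's actual code: the function name
check_to_regex, the `previous` dictionary and the simplified variable
syntax are all assumptions made for illustration, and Python's `re`
module stands in for our regex library. A definition [[NAME:regex]]
becomes a capture group, a later [[NAME]] on the same line becomes a
backref to that group, and a [[NAME]] that is not defined on the line
falls back to the value captured on a previous line.

```python
import re

# Simplified variable syntax (assumption): [[NAME:regex]] or [[NAME]].
# The non-greedy body stops at the first "]]" that can close the variable.
VAR = re.compile(r"\[\[([A-Za-z_][A-Za-z0-9_]*)(?::(.*?))?\]\]")

def check_to_regex(pattern, previous=None):
    """Translate one CHECK pattern into a single regex string.

    `previous` maps variable names to values captured on earlier lines
    (hypothetical stand-in for FileCheck's variable table).
    """
    previous = previous or {}
    groups = {}        # variable name -> capture group number on this line
    next_group = 1
    out = []
    pos = 0
    for m in VAR.finditer(pattern):
        # Text between variables is matched literally.
        out.append(re.escape(pattern[pos:m.start()]))
        name, body = m.group(1), m.group(2)
        if body is not None:
            # Definition: wrap the body in a capture group and remember its
            # number, skipping over any groups nested inside the body.
            groups[name] = next_group
            next_group += 1 + re.compile(body).groups
            out.append("(" + body + ")")
        elif name in groups:
            # Use of a variable defined earlier on this line: backreference.
            # The (?:...) wrapper keeps a following digit from extending it.
            out.append("(?:\\%d)" % groups[name])
        elif name in previous:
            # Use of a variable from a previous line: literal substitution.
            out.append(re.escape(previous[name]))
        else:
            raise KeyError("undefined variable: " + name)
        pos = m.end()
    out.append(re.escape(pattern[pos:]))
    return "".join(out)

rx = check_to_regex("bic [[REG:r[0-9]+]], [[REG]], #3")
same = re.search(rx, "bic r4, r4, #3")       # a match object
different = re.search(rx, "bic r4, r5, #3")  # None
```

With this translation the pattern accepts "bic r4, r4, #3" for any
register number but rejects a line whose two operands differ, which is
the behavior the proposal asks for.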

Luckily, our regex implementation already supports backreferences,
although a bit of hacking is required to enable them. It supports both
Basic Regular Expressions (BREs) and Extended Regular Expressions
(EREs), but follows POSIX strictly in allowing backrefs only in BREs,
not in EREs. And EREs are what we actually use (rightly). This differs
from many POSIX regex implementations (including the default on
Linux), which do allow backrefs in EREs.

Adding backref support to our EREs is a very simple change in the
regcomp parsing code. I can't think of any significant cases where it
would clash with existing behavior, and it can bring more versatility
to the regexes we write. There's always the danger of a backref in a
specially crafted regex causing exponential matching times, but since
we mainly use them for testing purposes I don't think it's a big
problem. [It can also be placed behind a flag specific to FileCheck,
if needed.]

Please share your thoughts,

Eli


