[llvm-dev] [RFC] Formalizing FileCheck Features

via llvm-dev llvm-dev at lists.llvm.org
Thu May 24 06:46:03 PDT 2018


Background
----------

FileCheck [0] is a cornerstone testing tool for the LLVM project.  It
has grown new features over the years to meet new needs, but these
sometimes have surprising and counter-intuitive behavior [1].  This
became even more evident during Joel Denny's recent quest to repair
what seemed like an obvious defect [2].  That effort led me to the
conclusion [3] that FileCheck sorely needs a clear, intuitive
conceptual model, and then someone to make it work that way (hi
Joel!).

Basic Conceptual Model
----------------------

FileCheck should operate on the basis of these three fundamental
concepts.

(1) Search range.  This is some substring of the input text where one
or more directives will do their pattern-matching magic.

(2) Match range.  This is a substring of a search range where a
directive (or in one case, a group of directives) has matched a
pattern.

(3) Directive groups.  These are sequences of adjacent directives that
operate in a related way on a search range.  Directives within a group
are processed in order, except as noted in the directive description.

Finally we add The Rule:  No match ranges may overlap.

(This is largely formalizing what FileCheck already does, except that
it didn't have The Rule with respect to DAG matches.  That's the bug
that Joel was originally trying to fix, until I stuck my nose into
it.)
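
To make the two kinds of range and The Rule concrete, here is a
minimal sketch in Python.  It assumes ranges are half-open
[start, end) character offsets into the input text; the names Range,
overlaps and obeys_the_rule are made up for illustration and are not
part of FileCheck.

```python
from itertools import combinations
from typing import List, NamedTuple


class Range(NamedTuple):
    """A half-open [start, end) span of offsets into the input text."""
    start: int
    end: int


def overlaps(a: Range, b: Range) -> bool:
    # Two half-open ranges overlap when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def obeys_the_rule(match_ranges: List[Range]) -> bool:
    # The Rule: no two match ranges may overlap.
    return not any(overlaps(a, b) for a, b in combinations(match_ranges, 2))
```

Under this representation, two matches that merely touch (one ends
where the next begins) do not overlap.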

Directive Descriptions Based On Conceptual Model
------------------------------------------------

Given the conceptual model, all directives can be defined in terms of
it. This is possibly going overboard with the formalism but hey, we're
all compiler geeks here.

CHECK: Scans the search range for a pattern match. Fails if no match
is found.  The end of the match range becomes the start of the search
range for subsequent directives.
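
A rough sketch of the CHECK description in Python, under two
simplifying assumptions: patterns are plain substrings (real
FileCheck patterns can also contain regexes and variables), and a
range is a pair of half-open offsets.  The function name check is
hypothetical.

```python
from typing import Tuple


def check(text: str, pattern: str, start: int, end: int) -> Tuple[int, int]:
    """Scan text[start:end] for pattern and return its match range."""
    pos = text.find(pattern, start, end)
    if pos == -1:
        raise AssertionError(f"CHECK: {pattern!r} not found in search range")
    return pos, pos + len(pattern)


text = "a b a c"
first = check(text, "a", 0, len(text))
# The end of the first match range is the start of the next search range,
# so the second CHECK cannot see the first "a" again.
second = check(text, "a", first[1], len(text))
```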

CHECK-SAME: Like CHECK, plus there must be zero newlines between the
start of the search range and the start of the match range.

CHECK-NEXT: Like CHECK, plus there must be exactly one newline between
the start of the search range and the start of the match range.
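
CHECK-SAME and CHECK-NEXT can be sketched as one helper that differs
only in the required newline count.  This assumes plain-substring
patterns and reads "prior to the start of the match range" as
"between the start of the search range and the start of the match
range"; the helper name check_with_newlines is invented for the
example.

```python
from typing import Tuple


def check_with_newlines(text: str, pattern: str, start: int, end: int,
                        newlines: int) -> Tuple[int, int]:
    """Like CHECK, but the gap before the match must hold `newlines` newlines.

    newlines=0 models CHECK-SAME and newlines=1 models CHECK-NEXT.
    """
    pos = text.find(pattern, start, end)
    if pos == -1:
        raise AssertionError(f"{pattern!r} not found in search range")
    found = text.count("\n", start, pos)
    if found != newlines:
        raise AssertionError(
            f"{pattern!r}: expected {newlines} newline(s) before match, "
            f"found {found}")
    return pos, pos + len(pattern)


text = "foo bar\nbaz\nqux\n"
end = len(text)
same = check_with_newlines(text, "bar", 3, end, newlines=0)        # CHECK-SAME
nxt = check_with_newlines(text, "baz", same[1], end, newlines=1)   # CHECK-NEXT
```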

CHECK-LABEL: All LABEL directives are processed before any other
directives.  These directives have two effects.  First, they act like
CHECK directives, but also partition the input text into disjoint
search ranges, delimited by the match ranges of the LABEL directives.
Second, they partition the remaining directives into Label Groups,
each of which operates on the corresponding search range.  For truly
pedantic formalism, we can say there are implicit LABEL directives
matching the start and end of the entire input text, thus all
non-LABEL directives are always in some Label Group.
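
The partitioning effect of CHECK-LABEL might look like the following
Python sketch.  It assumes plain-substring label patterns and models
the implicit start and end LABELs simply as offset 0 and len(text);
partition_by_labels is a hypothetical name.

```python
from typing import List, Tuple


def partition_by_labels(text: str, labels: List[str]) -> List[Tuple[int, int]]:
    """Return the disjoint search ranges delimited by the LABEL matches.

    N labels yield N + 1 search ranges: one before the first label,
    one between each adjacent pair, and one after the last label.
    """
    ranges = []
    cursor = 0  # implicit LABEL at the start of the input
    for label in labels:
        pos = text.find(label, cursor)
        if pos == -1:
            raise AssertionError(f"CHECK-LABEL: {label!r} not found")
        ranges.append((cursor, pos))
        cursor = pos + len(label)
    ranges.append((cursor, len(text)))  # implicit LABEL at the end
    return ranges


text = "define @f\n  add\ndefine @g\n  mul\n"
search_ranges = partition_by_labels(text, ["@f", "@g"])
```

Each Label Group would then run its directives against the
corresponding entry of search_ranges, so a directive in the @f group
can never match text that belongs to @g.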

CHECK-NOT: A sequence of NOT directives forms a NOT Group. The group
is not executed immediately; instead the next non-NOT directive is
executed first, and the start of that directive's match range becomes
the end of the NOT Group's search range.  (If the next directive is
LABEL, it has already executed and has a match range, which is already
the end of the search range.)  After the NOT Group's search range is
defined, each NOT directive scans the range for a match, and fails if
a match is found.
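
A sketch of the deferred execution of a NOT Group, in Python.  It
assumes plain-substring patterns and that the directive following the
group behaves like a plain CHECK; a NOT Group with no following
directive is not handled here.  The name run_not_group is
hypothetical.

```python
from typing import List, Tuple


def run_not_group(text: str, not_patterns: List[str], next_pattern: str,
                  start: int, end: int) -> Tuple[int, int]:
    """Run the following directive first, then the NOT Group before it."""
    # Step 1: execute the next non-NOT directive over the search range.
    pos = text.find(next_pattern, start, end)
    if pos == -1:
        raise AssertionError(f"CHECK: {next_pattern!r} not found")
    # Step 2: the start of that match range ends the NOT Group's range.
    for pattern in not_patterns:
        if text.find(pattern, start, pos) != -1:
            raise AssertionError(f"CHECK-NOT: {pattern!r} found")
    return pos, pos + len(next_pattern)


text = "load\nstore\nret\n"
match = run_not_group(text, ["call"], "ret", 0, len(text))
```

Note that a NOT pattern occurring only at or after the following
match is outside the group's search range and does not cause a
failure.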

CHECK-DAG: A sequence of DAG directives forms a DAG Group. The group
is not executed immediately; instead the next non-DAG directive is
executed first, and the start of that directive's match range becomes
the end of the DAG Group's search range.  If the next directive is
CHECK-NOT, the end of the DAG Group's search range is
unaffected. (This might or might not be FileCheck's historical
behavior; I didn't check.)  After the DAG Group's search range is
defined, each DAG directive scans the range for a match, and fails if
a match is not found.  Per The Rule, match ranges for DAG directives
may not overlap.  (This is not historical FileCheck behavior; allowing
such overlaps is the bug Joel Denny wanted to fix.)  After all DAG
directives run, the match range for the entire DAG Group extends from
the start of the earliest match to the end of the latest match.  The
end of that match range becomes the start of the search range for
subsequent directives.
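
One way a DAG Group could enforce The Rule is sketched below in
Python.  The strategy of taking the first match that does not overlap
an earlier DAG match is an assumption made for this example, as are
the plain-substring patterns and the name run_dag_group; the search
range is taken as already determined, and the group is assumed to be
non-empty.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def run_dag_group(text: str, patterns: List[str], start: int,
                  end: int) -> Tuple[List[Span], Span]:
    """Match every DAG pattern in the search range without overlaps."""
    matches: List[Span] = []
    for pattern in patterns:
        pos = text.find(pattern, start, end)
        # Skip candidates that overlap a match already claimed by the group.
        while pos != -1 and any(pos < e and s < pos + len(pattern)
                                for s, e in matches):
            pos = text.find(pattern, pos + 1, end)
        if pos == -1:
            raise AssertionError(f"CHECK-DAG: {pattern!r} not found")
        matches.append((pos, pos + len(pattern)))
    # The group's match range spans the earliest start to the latest end.
    group = (min(s for s, _ in matches), max(e for _, e in matches))
    return matches, group


matches, group = run_dag_group("abc abc", ["abc", "abc"], 0, 7)
```

With this approach, two identical DAG patterns need two separate
occurrences in the input; a single occurrence makes the second
directive fail instead of silently matching the same text twice.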

Observations
------------

A CHECK-NOT still separates surrounding CHECK-DAG directives into
disjoint groups, and does not permit matches from the two groups to
overlap. DAG was originally implemented to detect and diagnose an
overlap, but this worked only for the first DAG after a NOT. This can
lead to counter-intuitive behavior and potentially makes certain kinds
of matches impossible.

Putting CHECK-SAME and CHECK-NEXT after CHECK-DAG now has defined
behavior, but it's unlikely to be useful.  Putting SAME or NEXT as the
first directive in a file likewise has defined behavior, matching
precisely the first or second line (respectively) of the input text.


References
----------
[0] https://llvm.org/docs/CommandGuide/FileCheck.html
[1] https://www.youtube.com/watch?v=4rhW8knj0L8
[2] https://lists.llvm.org/pipermail/llvm-dev/2018-May/123010.html
[3] https://reviews.llvm.org/D47106


