[llvm-dev] [RFC] Formalizing FileCheck Features

Joel E. Denny via llvm-dev llvm-dev at lists.llvm.org
Thu May 24 08:58:15 PDT 2018


Hi Paul,

On Thu, May 24, 2018 at 9:46 AM, <paul.robinson at sony.com> wrote:

> Background
> ----------
>
> FileCheck [0] is a cornerstone testing tool for the LLVM project.  It
> has grown new features over the years to meet new needs, but these
> sometimes have surprising and counter-intuitive behavior [1].  This
> has become even more evident in Joel Denny's recent quest to repair
> what seemed like an obvious defect [2] but which led me to the
> conclusion [3] that FileCheck sorely needed a clear, intuitive
> conceptual model.


Agreed.  Thanks for doing this.



> And then someone to make it work that way (hi
> Joel!).
>

Sure, I can help with the implementation given that I'm running into these
issues a lot in my own work.  As I'm a bit too close to the FileCheck
implementation at this point, I would suggest that someone else write the
initial specification-based tests to find the deviations from the
description we arrive at.  Paul, you're the obvious person for that one.  I
can of course work on further implementation-based testing.

I also recommend we make changes toward the new specification incrementally.


> Basic Conceptual Model
> ----------------------
>
> FileCheck should operate on the basis of these three fundamental
> concepts.
>
> (1) Search range.  This is some substring of the input text where one
> or more directives will do their pattern-matching magic.
>
> (2) Match range.  This is a substring of a search range where a
> directive (or in one case, a group of directives) has matched a
> pattern.
>
> (3) Directive groups.  These are sequences of adjacent directives that
> operate in a related way on a search range.  Directives within a group
> are processed in order, except as noted in the directive description.
>
> Finally we add The Rule:  No match ranges may overlap.
>
> (This is largely formalizing what FileCheck already does, except that
> it didn't have The Rule with respect to DAG matches.  That's the bug
> that Joel was originally trying to fix, until I stuck my nose into
> it.)
>

I agree with The Rule.  I haven't found any real use case yet that needs to
violate that rule.
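To make the model concrete, here is a minimal Python sketch of ranges and The Rule. It is not FileCheck's C++ implementation. `Range`, `overlaps`, and `obeys_the_rule` are hypothetical names, and ranges are assumed to be half-open [start, end) character offsets into the input.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical model: a range is a half-open [start, end) pair of character
# offsets into the input text. Search ranges and match ranges share this type.
Range = Tuple[int, int]


def overlaps(a: Range, b: Range) -> bool:
    """True if the two half-open ranges share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def obeys_the_rule(match_ranges: List[Range]) -> bool:
    """The Rule: no two match ranges may overlap (merely touching is fine)."""
    return not any(overlaps(a, b) for a, b in combinations(match_ranges, 2))


print(obeys_the_rule([(0, 3), (3, 6), (10, 12)]))  # True: ranges only touch
print(obeys_the_rule([(0, 5), (4, 8)]))            # False: offset 4 is shared
```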


> Directive Descriptions Based On Conceptual Model
> ------------------------------------------------
>
> Given the conceptual model, all directives can be defined in terms of
> it. This is possibly going overboard with the formalism but hey, we're
> all compiler geeks here.
>

I think it's great.


>
> CHECK: Scans the search range for a pattern match. Fails if no match
> is found.  The end of the match range becomes the start of the search
> range for subsequent directives.
>
> CHECK-SAME: Like CHECK, plus there must be zero newlines prior to the
> start of the match range.
>

... within the search range.

Should it be possible for a CHECK-SAME match range to include newlines?
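Here is a rough sketch of CHECK and CHECK-SAME as described, with the "within the search range" amendment applied. It makes several assumptions:

- Patterns are plain literal strings, with no {{regex}} or [[VAR]] support.
- `check` and `check_same` are hypothetical names.
- The sketch does not answer the open question about newlines inside the match range. It only constrains where the match starts.

```python
from typing import Optional, Tuple

# Hypothetical model: half-open [start, end) offsets; patterns are plain
# literal strings (no {{regex}} or [[VAR]] substitutions).
Range = Tuple[int, int]


def check(text: str, pattern: str, search: Range) -> Optional[Range]:
    """CHECK: the first match lying entirely inside the search range, or None."""
    pos = text.find(pattern, search[0], search[1])
    return None if pos < 0 else (pos, pos + len(pattern))


def check_same(text: str, pattern: str, search: Range) -> Optional[Range]:
    """CHECK-SAME: like CHECK, plus zero newlines between the start of the
    search range and the start of the match range."""
    match = check(text, pattern, search)
    if match is None or text.count("\n", search[0], match[0]) != 0:
        return None
    return match


text = "foo bar\nbaz"
m = check(text, "foo", (0, len(text)))             # (0, 3)
# The end of each match range starts the next directive's search range.
m = check_same(text, "bar", (m[1], len(text)))     # (4, 7)
print(check_same(text, "baz", (m[1], len(text))))  # None: "baz" is on line 2
```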


>
> CHECK-NEXT: Like CHECK, plus there must be exactly one newline prior
> to the start of the match range.
>

... within the search range.

Your choice to talk about the match range rather than the search range for
CHECK-SAME and CHECK-NEXT implies you like the current behavior that
extends the search range beyond these match range restrictions and then
complains if the match range restrictions aren't met.  For example,
CHECK-SAME searches past the newline and then complains if the match range
starts after the newline.  Is that what you prefer?

I'd note that, in the case of CHECK-NEXT, that choice can restrict what
CHECK-NEXT can match.  That is, it will complain about a match on the
previous line rather than skip it and look on the next line.
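The two readings of CHECK-NEXT can be contrasted in a small sketch. Patterns are assumed to be literal strings, and both function names are hypothetical.

- The first function searches the whole range and then verifies the newline count, which is the behavior the match-range wording implies.
- The second function shrinks the search range to the next line before searching.

```python
from typing import Optional, Tuple

# Hypothetical model: half-open [start, end) offsets, literal patterns.
Range = Tuple[int, int]


def next_find_then_verify(text: str, pattern: str, search: Range) -> Optional[Range]:
    """Search the whole range, then require exactly one newline before the
    match. A match on the current line is reported, not skipped."""
    start, end = search
    pos = text.find(pattern, start, end)
    if pos < 0 or text.count("\n", start, pos) != 1:
        return None
    return (pos, pos + len(pattern))


def next_restrict_search(text: str, pattern: str, search: Range) -> Optional[Range]:
    """Shrink the search range to the next line first, then search only there."""
    start, end = search
    newline = text.find("\n", start, end)
    if newline < 0:
        return None
    line_start = newline + 1
    line_end = text.find("\n", line_start, end)
    if line_end < 0:
        line_end = end
    pos = text.find(pattern, line_start, line_end)
    return None if pos < 0 else (pos, pos + len(pattern))


# After "CHECK: a" matches offset 0, the search range starts at offset 1.
text = "a x\nx\n"
print(next_find_then_verify(text, "x", (1, len(text))))  # None: x on line 1 wins
print(next_restrict_search(text, "x", (1, len(text))))   # (4, 5): x on line 2
```

Only the first reading can diagnose a pattern that accidentally matched too early. Only the second can match a pattern that also happens to appear on the current line.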


>
> CHECK-LABEL: All LABEL directives are processed before any other
> directives.  These directives have two effects.  First, they act like
> CHECK directives, but also partition the input text into disjoint
> search ranges, delimited by the match ranges of the LABEL directives.
> Second, they partition the remaining directives into Label Groups,
> each of which operates on the corresponding search range.  For truly
> pedantic formalism, we can say there are implicit LABEL directives
> matching the start and end of the entire input text, thus all
> non-LABEL directives are always in some Label Group.
>
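The LABEL description above can be sketched as a two-pass partition. This is a hypothetical Python model that assumes literal patterns and `(kind, pattern)` tuples for directives; it is not FileCheck's data structures.

- The first pass runs every LABEL like a CHECK, in order.
- The second pass pairs the gaps between label matches with the non-LABEL directives between those LABELs.

It includes the implicit start- and end-of-input labels.

```python
from typing import List, Tuple

Range = Tuple[int, int]
Directive = Tuple[str, str]  # hypothetical (kind, pattern), e.g. ("LABEL", "f:")


def partition_by_labels(text: str,
                        directives: List[Directive]) -> List[Tuple[Range, List[Directive]]]:
    # Pass 1: run every LABEL like a CHECK, in order, before anything else.
    label_matches: List[Range] = []
    pos = 0
    for kind, pattern in directives:
        if kind == "LABEL":
            found = text.find(pattern, pos)
            if found < 0:
                raise ValueError(f"LABEL not found: {pattern!r}")
            label_matches.append((found, found + len(pattern)))
            pos = found + len(pattern)
    # Implicit LABELs matching the start and the end of the input text.
    bounds = [(0, 0)] + label_matches + [(len(text), len(text))]
    # Pass 2: split the remaining directives into Label Groups.
    groups: List[List[Directive]] = [[]]
    for kind, pattern in directives:
        if kind == "LABEL":
            groups.append([])
        else:
            groups[-1].append((kind, pattern))
    # Each group's search range runs from the end of one label match to the
    # start of the next.
    return [((bounds[i][1], bounds[i + 1][0]), groups[i]) for i in range(len(groups))]


text = "f:\n a\ng:\n b\n"
for search, group in partition_by_labels(
        text, [("LABEL", "f:"), ("CHECK", "a"), ("LABEL", "g:"), ("CHECK", "b")]):
    print(search, group)
```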
> CHECK-NOT: A sequence of NOT directives forms a NOT Group. The group
> is not executed immediately; instead the next non-NOT directive is
> executed first, and the start of that directive's match range becomes
> the end of the NOT Group's search range.


Based on the following, that wording is not quite right when a DAG group
follows, so there should probably be some note about that here.



>   (If the next directive is
> LABEL, it has already executed and has a match range, which is already
> the end of the search range.)  After the NOT Group's search range is
> defined, each NOT directive scans the range for a match, and fails if
> a match is found.
>
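For the simple case of a NOT group followed by a plain CHECK, the proposed ordering can be sketched as below. This hypothetical model assumes literal patterns, and `CheckFailure` and `run_not_group` are made-up names. The following directive runs first, its match start closes the NOT range, and then each NOT scans that range.

```python
from typing import List, Tuple

Range = Tuple[int, int]


class CheckFailure(Exception):
    """Hypothetical stand-in for a FileCheck diagnostic."""


def run_not_group(text: str, search: Range, not_patterns: List[str],
                  next_pattern: str) -> Range:
    """Run a NOT group followed by a plain CHECK, in the proposed order."""
    start, end = search
    # 1. The next non-NOT directive (here a CHECK) executes first.
    nxt = text.find(next_pattern, start, end)
    if nxt < 0:
        raise CheckFailure(f"CHECK not found: {next_pattern!r}")
    # 2. The start of its match range becomes the end of the NOT range.
    not_range = (start, nxt)
    # 3. Each NOT fails if its pattern occurs anywhere in that range.
    for pattern in not_patterns:
        if text.find(pattern, not_range[0], not_range[1]) >= 0:
            raise CheckFailure(f"CHECK-NOT matched: {pattern!r}")
    return (nxt, nxt + len(next_pattern))


print(run_not_group("call foo\nret\n", (0, 13), ["bar"], "ret"))  # (9, 12)
```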

> CHECK-DAG: A sequence of DAG directives forms a DAG Group. The group
> is not executed immediately; instead the next non-DAG directive is
> executed first, and the start of that directive's match range becomes
> the end of the DAG Group's search range.


That's definitely a change from the current behavior.  Currently, the DAG
group finds its own end based on the farthest match.


>   If the next directive is
> CHECK-NOT, the end of the DAG Group's search range is
> unaffected.


Does "unaffected" mean it's as if there's no following directive, so that the
search range ends at the next CHECK-LABEL (possibly the implicit one at EOF)?
What if there's a CHECK, CHECK-NEXT, or CHECK-SAME after all the DAGs and NOTs?


> (This might or might not be FileCheck's historical
> behavior; I didn't check.)  After the DAG Group's search range is
> defined, each DAG directive scans the range for a match, and fails if
> a match is not found.  Per The Rule, match ranges for DAG directives
> may not overlap. (This is not historical FileCheck behavior, and is the
> bug Joel Denny wanted to fix.)  After all DAG directives run, the
> match range for the entire DAG Group extends from the start of the
> earliest match to the end of the latest match.  The end of that match
> range becomes the start of the search range for subsequent directives.
>

That last sentence contradicts the first few sentences: the subsequent
directive has already been matched.
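Setting aside where the DAG group's search range ends, the matching step itself can be sketched as follows. Several assumptions apply:

- The sketch takes the search range as an input.
- Patterns are literal.
- Each DAG takes its earliest match that does not overlap an earlier DAG's match, in directive order and without backtracking. This greedy strategy is a choice made for the sketch, not something the proposal specifies.

The names are hypothetical.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]


class CheckFailure(Exception):
    """Hypothetical stand-in for a FileCheck diagnostic."""


def run_dag_group(text: str, search: Range,
                  patterns: List[str]) -> Tuple[List[Range], Optional[Range]]:
    start, end = search
    matches: List[Range] = []
    for pat in patterns:
        pos = text.find(pat, start, end)
        # The Rule: skip candidates that overlap an earlier DAG match.
        while pos >= 0 and any(pos < m_end and m_start < pos + len(pat)
                               for m_start, m_end in matches):
            pos = text.find(pat, pos + 1, end)
        if pos < 0:
            raise CheckFailure(f"CHECK-DAG not found: {pat!r}")
        matches.append((pos, pos + len(pat)))
    if not matches:
        return matches, None
    # The group's match range spans the earliest start to the latest end.
    group = (min(s for s, _ in matches), max(e for _, e in matches))
    return matches, group


print(run_dag_group("add x\nadd x\n", (0, 12), ["add x", "add x"]))
# ([(0, 5), (6, 11)], (0, 11)): the second DAG cannot reuse the first match
```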

One point not addressed here is the start of the DAG group's search range.
Currently, if the DAG group is preceded by a NOT group preceded by a DAG
group, the last DAG group's search range starts at the start of the first
DAG group's match range.  Any matches in the first DAG group's match range
produce a reordering error.  This is somewhat similar to the CHECK-SAME
and CHECK-NEXT behavior I mentioned earlier: the search ranges permit
invalid match ranges and then complain about them in an effort to diagnose
mistakes.  However, that restricts what can be matched.

I'm not claiming that either behavior is best; it's not clear to me.  How
best to use DAG-NOT-DAG is very confusing to me.  In my opinion, any effort
to prescribe the right semantics for it needs to be informed by real use
cases.


>
> Observations
> ------------
>
> A CHECK-NOT still separates surrounding CHECK-DAG directives into
> disjoint groups, and does not permit matches from the two groups to
> overlap. DAG was originally implemented to detect and diagnose an
> overlap, but this worked only for the first DAG after a NOT. This can
> lead to counter-intuitive behavior and potentially makes certain kinds
> of matches impossible.
>

I definitely agree it shouldn't be just the first DAG.  The reordering
detection should happen for all consecutive DAGs after the NOT or none of
them.
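The current DAG-NOT-DAG diagnosis, extended to every DAG after the NOT, might look roughly like this sketch. Patterns are assumed to be literal, and the names are hypothetical. The later group's search starts at the start of the earlier group's match range. Any DAG whose match begins before the end of that range is flagged as a reordering error, not just the first one.

```python
from typing import List, Tuple

Range = Tuple[int, int]


class CheckFailure(Exception):
    """Hypothetical stand-in for a FileCheck diagnostic."""


def run_dag_after_not(text: str, prev_group: Range, search_end: int,
                      patterns: List[str]) -> Tuple[List[Range], List[int]]:
    # Start searching at the start of the preceding DAG group's match range,
    # so a misplaced match can be diagnosed instead of silently skipped.
    start = prev_group[0]
    matches: List[Range] = []
    reordered: List[int] = []
    for i, pat in enumerate(patterns):
        pos = text.find(pat, start, search_end)
        if pos < 0:
            raise CheckFailure(f"CHECK-DAG not found: {pat!r}")
        if pos < prev_group[1]:
            reordered.append(i)  # checked for every DAG, not just the first
        matches.append((pos, pos + len(pat)))
    return matches, reordered


# Earlier DAG group matched "a" and "b", giving the group match range (0, 3).
print(run_dag_after_not("a b\nc\n", (0, 3), 6, ["c", "b"]))
# ([(4, 5), (2, 3)], [1]): the second DAG is diagnosed too
```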


>
> Putting CHECK-SAME and CHECK-NEXT after CHECK-DAG now has defined
> behavior, but it's unlikely to be useful.


I believe they had predictable behavior before (their search ranges started
at the end of the match range for the entire CHECK-DAG), but it's different
with the above description (they define the end of the search range for the
preceding CHECK-DAG group).

Thanks.

Joel


>   Putting SAME or NEXT as the
> first directive in a file likewise has defined behavior, matching
> precisely the first or second line (respectively) of the input text.
>
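The last observation can be checked with a small sketch. The implicit start-of-input LABEL puts the search start at offset 0, so SAME (zero newlines) lands on line 1 and NEXT (exactly one newline) on line 2. For simplicity, this hypothetical function restricts the search to that line. That is one of the two readings discussed above, and literal patterns are assumed.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]


def first_directive_match(text: str, kind: str, pattern: str) -> Optional[Range]:
    required_newlines = {"SAME": 0, "NEXT": 1}[kind]
    # The implicit start-of-input LABEL puts the search range start at 0.
    line_start = 0
    for _ in range(required_newlines):
        newline = text.find("\n", line_start)
        if newline < 0:
            return None  # the input has too few lines
        line_start = newline + 1
    line_end = text.find("\n", line_start)
    if line_end < 0:
        line_end = len(text)
    pos = text.find(pattern, line_start, line_end)
    return None if pos < 0 else (pos, pos + len(pattern))


print(first_directive_match("one\ntwo\n", "SAME", "one"))  # (0, 3)
print(first_directive_match("one\ntwo\n", "NEXT", "two"))  # (4, 7)
```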
>
> References
> ----------
> [0] https://llvm.org/docs/CommandGuide/FileCheck.html
> [1] https://www.youtube.com/watch?v=4rhW8knj0L8
> [2] https://lists.llvm.org/pipermail/llvm-dev/2018-May/123010.html
> [3] https://reviews.llvm.org/D47106
>
>

