[llvm-dev] [RFC] Formalizing FileCheck Features

Sat May 26 09:11:30 PDT 2018

Hi Paul,

On Fri, May 25, 2018 at 10:40 AM, <paul.robinson at sony.com> wrote:

> > Should it be possible for a CHECK-SAME match range to include newlines?
>
> It is possible to write a regex that matches newlines.  Doing that in
> CHECK-SAME seems a bit odd but I don't think it's worth trying to forbid
> it.
>

OK, so SAME has the sense of matching *starting* on the same line rather
than *within* the same line.  Seems fine.

> > I'd note that, in the case of CHECK-NEXT, that choice can restrict what
> > CHECK-NEXT can match.  That is, it will complain about a match on the
> > previous line rather than skip it and look on the next line.
>
> Ah, so we could define CHECK-NEXT as: move the start of the search
> range past the first newline, then behave as CHECK-SAME?

Right.

> But, appending {{.*$}} to the previous pattern should have the same
> effect if you have a CHECK-NEXT that runs into that problem.

So the current behavior is more flexible even if less intuitive at first
glance (to me, at least).  It's also more consistent with the way search
ranges work in general.

I think this subtlety and this tip should be mentioned in the user
documentation. Also, because sometimes the previous directive isn't nearby
or could be one of many directives due to multiple check prefixes, the docs
should also offer this formula:

CHECK-SAME: {{.*}}
CHECK-NEXT: your pattern
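
To make that tip concrete, here is a minimal Python sketch of the line checks as I understand them. It is a model, not FileCheck's implementation: the helper names (check, check_same, check_next, CheckError) are made up for illustration, and Python's re module stands in for FileCheck's regex engine.

```python
import re


class CheckError(Exception):
    """Raised when a modeled directive fails (hypothetical name)."""


def check(text, start, pattern):
    """Plain CHECK: search forward from `start`, return (match_start, match_end)."""
    m = re.compile(pattern, re.MULTILINE).search(text, start)
    if m is None:
        raise CheckError(f"no match for {pattern!r}")
    return m.start(), m.end()


def check_same(text, prev_end, pattern):
    """CHECK-SAME: the match must start on the line where the previous match ended."""
    s, e = check(text, prev_end, pattern)
    if text.count("\n", prev_end, s) != 0:
        raise CheckError("match is not on the same line as the previous match")
    return s, e


def check_next(text, prev_end, pattern):
    """CHECK-NEXT: exactly one newline between the previous match and this one."""
    s, e = check(text, prev_end, pattern)
    newlines = text.count("\n", prev_end, s)
    if newlines == 0:
        # The search range includes the rest of the current line, so a match
        # there is reported as an error rather than skipped.
        raise CheckError("match is on the same line as the previous match")
    if newlines > 1:
        raise CheckError("match is not on the line after the previous match")
    return s, e


text = "foo bar\nbar baz\n"
_, end = check(text, 0, "foo")
# check_next(text, end, "bar") would fail here: the first "bar" follows "foo"
# on the same line.
_, end = check_same(text, end, ".*")       # CHECK-SAME: {{.*}} eats the rest of the line
start, end = check_next(text, end, "bar")  # CHECK-NEXT: bar now matches on line 2
```

The point of the formula is visible in the last two calls: the CHECK-SAME moves the search start to the end of the current line, so the CHECK-NEXT can no longer trip over a match on the previous directive's line.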

> And I do think it's valuable for SAME and NEXT to tell you they found
> matches but not on the line you asked for. So I'd prefer to leave these
> defined as they are.

Agreed.

> >> CHECK-NOT: A sequence of NOT directives forms a NOT Group. The group
> >> is not executed immediately; instead the next non-NOT directive is
> >> executed first, and the start of that directive's match range becomes
> >> the end of the NOT Group's search range.
> >
> > Based on the following, that wording is not quite right when a DAG
> > group follows, so there should probably be some note about that here.
>
> So, "the next non-NOT directive or DAG group is executed ... the start
> of that directive or group's match range ..." ?
>

Sounds good.
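
As a rough illustration of the deferred execution, here is a small Python sketch. It assumes the directive after the NOT group is a plain CHECK; the names run_not_group, find, and CheckError are hypothetical, and Python's re module stands in for FileCheck's regex engine.

```python
import re


class CheckError(Exception):
    """Raised when a modeled directive fails (hypothetical name)."""


def find(text, start, pattern):
    """Search forward from `start`, return (match_start, match_end)."""
    m = re.compile(pattern, re.MULTILINE).search(text, start)
    if m is None:
        raise CheckError(f"no match for {pattern!r}")
    return m.start(), m.end()


def run_not_group(text, start, not_patterns, next_pattern):
    """Model a NOT group followed by one plain CHECK directive."""
    # Deferred execution: the following non-NOT directive runs first.
    next_start, next_end = find(text, start, next_pattern)
    # The start of its match range becomes the end of the NOT group's
    # search range, so each NOT scans only text[start:next_start].
    for pat in not_patterns:
        m = re.compile(pat, re.MULTILINE).search(text, start, next_start)
        if m is not None:
            raise CheckError(f"excluded pattern {pat!r} found at offset {m.start()}")
    return next_start, next_end


text = "begin\nwarning: x\nend\nerror: y\n"
# "error" appears only after "end", outside the NOT group's search range,
# so this succeeds.
span = run_not_group(text, 0, ["error"], "end")
```

When a DAG group follows instead, the same idea applies with the group's match range start taking the place of next_start.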

> >>  (If the next directive is
> >> LABEL, it has already executed and has a match range, which is already
> >> the end of the search range.)  After the NOT Group's search range is
> >> defined, each NOT directive scans the range for a match, and fails if
> >> a match is found.
> >>
> >> CHECK-DAG: A sequence of DAG directives forms a DAG Group. The group
> >> is not executed immediately; instead the next non-DAG directive is
> >> executed first, and the start of that directive's match range becomes
> >> the end of the DAG Group's search range.
> >
> > That's definitely a change from the current behavior.  Currently, the
> > DAG group finds its own end based on the farthest match.
>
> Oh good catch.  Copy-thinko from the NOT description.  NOT is the only
> kind of directive that has deferred execution.
>
> >>  If the next directive is
> >> CHECK-NOT, the end of the DAG Group's search range is
> >> unaffected.
> >
> > Unaffected means that it's as if there's no following directive?  So
> > next CHECK-LABEL (possibly the implicit one at EOF)?  What if there's
> > a CHECK, CHECK-NEXT, or CHECK-SAME after all the DAGs and NOTs?
>
> If DAG doesn't have deferred execution then the end of the search range
> is the next (explicit or implicit) CHECK-LABEL point, end of story.
>

> >>  After all DAG directives run, the
> >> match range for the entire DAG Group extends from the start of the
> >> earliest match to the end of the latest match.  The end of that match
> >> range becomes the start of the search range for subsequent directives.
> >
> > That last sentence contradicts the first few sentences: the subsequent
> > directive has already been matched.
>
> Right, fixing the previous bug means this sentence says the right thing.
>

Yep, I agree it's fixed.
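
For reference, a minimal Python sketch of the corrected DAG group behavior as I read it. The `end` argument stands for the next (explicit or implicit) CHECK-LABEL point; run_dag_group and CheckError are hypothetical names, and the sketch deliberately ignores whether matches within a group may overlap.

```python
import re


class CheckError(Exception):
    """Raised when a modeled directive fails (hypothetical name)."""


def run_dag_group(text, start, end, patterns):
    """Match each pattern anywhere in text[start:end], in any order.

    Returns the group's match range: from the start of the earliest match
    to the end of the latest match.
    """
    spans = []
    for pat in patterns:
        m = re.compile(pat, re.MULTILINE).search(text, start, end)
        if m is None:
            raise CheckError(f"no match for {pat!r}")
        spans.append(m.span())
    return min(s for s, _ in spans), max(e for _, e in spans)


text = "b = 2\na = 1\nc = 3\n"
group_start, group_end = run_dag_group(text, 0, len(text), ["a = 1", "b = 2"])
# A subsequent directive searches from group_end onward.
next_match = re.compile("c = 3").search(text, group_end)
```

Nothing about the following directive feeds into the group: the DAGs run immediately, and only the end of their combined match range is passed along.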

>
> > One point not addressed here is the start of the DAG group's search
> > range.  Currently, if the DAG group is preceded by a NOT group
> > preceded by a DAG group, the last DAG group's search range starts at
> > the start of the first DAG group's match range.  Any match in the
> > first DAG group's match range produces a reordering error.  This is
> > somewhat similar to the CHECK-SAME and CHECK-NEXT behavior I mentioned
> > earlier: the search ranges permit invalid match ranges and then
> > complain about them in an effort to diagnose mistakes.  However, that
> > restricts what can be matched.
> >
> > I'm not claiming that either behavior is best.  It's not clear to me.
> > The best use of DAG-NOT-DAG is very confusing to me.  An effort to
> > prescribe the right semantics to it needs to be informed by real use
> > cases, in my opinion.
>
> I did some email archaeology, and found this exchange on llvm-dev between
> myself and Michael Liao (original DAG implementor) 13 Mar 2016:
>
> pr> Commentary in FileCheck itself can easily be interpreted to mean the
> pr> intent was that -NOT would scan the region between the points defined
> pr> by the last match of the preceding DAG group (which the code gets
> pr> right) and the first match of the following DAG group (which the code
> pr> does not get right). But the commentary is not really that clear.
>
> ml> That's the intention of the original design. CHECK-NOT never occurs
> ml> before we find the start point (the start of file by default) and end
> ml> point (the end of file by default.) All other points are through other
> ml> CHECKs, including CHECK-DAG but excluding CHECK-NOT.  So that, if you
> ml> use CHECK-NOT, you need to be aware of how that range is defined. As
> ml> CHECK-DAG matches a group of patterns in any order, the match
> ml> point of a group of CHECK-DAGs (consecutive CHECK-DAGs without any
> ml> other CHECKs interleaved) is always the point where one of that group
> ml> is matched. If a CHECK-DAG is separated by any other CHECK
> ml> (including CHECK-NOT) from the preceding CHECK-DAGs, it is not in the
> ml> preceding CHECK-DAG group. That's how we can check that
> ml> one group of patterns never occurs before another group of
> ml> patterns.
>

Thanks for digging that up.

> So, I believe my specification for the interaction between DAG and NOT
> does match the original intent.

I can't argue there.
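
To pin down what I think the specification says for DAG-NOT-DAG, here is a short Python sketch. It is my reading, not FileCheck's code: the second group searches only past the first group's match range, and the NOTs scan the gap between the two groups. The names dag_group, dag_not_dag, and CheckError are hypothetical.

```python
import re


class CheckError(Exception):
    """Raised when a modeled directive fails (hypothetical name)."""


def dag_group(text, start, patterns):
    """Match each pattern at or after `start`; return (earliest start, latest end)."""
    spans = []
    for pat in patterns:
        m = re.compile(pat, re.MULTILINE).search(text, start)
        if m is None:
            raise CheckError(f"no match for {pat!r}")
        spans.append(m.span())
    return min(s for s, _ in spans), max(e for _, e in spans)


def dag_not_dag(text, first, nots, second):
    """Model DAG group, NOT group, DAG group under the proposed semantics."""
    g1 = dag_group(text, 0, first)
    # The second group starts searching at the end of the first group's
    # match range, so it never sees (or complains about) earlier text.
    g2 = dag_group(text, g1[1], second)
    # The NOTs are deferred: they scan from the end of the first group
    # to the start of the second group.
    for pat in nots:
        if re.compile(pat, re.MULTILINE).search(text, g1[1], g2[0]):
            raise CheckError(f"excluded pattern {pat!r} found between the groups")
    return g1, g2


text = "load b\nload a\nstore y\nstore x\n"
g1, g2 = dag_not_dag(text, ["load a", "load b"], ["call"], ["store x", "store y"])
```

Under the current behavior I described earlier, the second group's search would instead start at g1[0] and report a reordering error for any match inside the first group's range.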

> Regarding the diagnostic aid, it does
> make some sequences really hard to match,

Theoretically, I agree.  But do you know of a real use case where it's a
problem?

> and I don't have a general
> idea how to fix that (versus {{.*$}} for the similar NEXT situation).
>

Me neither.

> It's also a reasonable continuation of the behavior of plain CHECK, in
> that a second CHECK doesn't search the prior text to complain about
> ordering issues.
>

Good point.

The main difference I see is that DAG is specifically about unordered text
(and it might vary from run to run in the parallel programs I'm thinking
of), so the chances of accidental reordering might be higher than with
plain CHECK.

>
> SAME and NEXT are, I think, a different category; that has to do with
> line-breaks that are not explicitly described by user-written patterns,
> and my own experience is that it's helpful to be told that something
> matches but isn't on the line I expected.
>

Agreed.

>
> So, I don't have a definitive answer for changing DAG-NOT-DAG, but
> intuitively the spec makes sense to me and my inclination is to think
> the diagnostic isn't hugely valuable.
>

You might be right. Again, I find it hard to think of solid arguments about
DAG-NOT-DAG because it seems like such an unlikely use case.

You mentioned Chris Lattner's point.  DAG-NOT-DAG was the first thing that
came to my mind.

DAG-NOT-DAG is a weird case where (1) you want two or more consecutive but
non-overlapping DAG groups, and (2) you want to exclude certain patterns in
between.  Strangely, with existing directives, you cannot accomplish #1
without #2, right?  Why do those go together?  It feels like a use case
that arose from an accident in a language specification and not from a real
need.

Well, maybe the best approach is just to go with a clear specification (as
you have now) and hope for the best.

> >> Putting CHECK-SAME and CHECK-NEXT after CHECK-DAG now has defined
> >> behavior, but it's unlikely to be useful.
> >
> > I believe they had predictable behavior before (their search ranges
> > started at the end of the match range for the entire CHECK-DAG), but
> > it's different with the above description (they define the end of the
> > search range for the preceding CHECK-DAG group).
>
> You're right, it was predictable before, and I am fixing the bug where
> the directive after DAG gets executed first so the range isn't affected.
>

Makes sense, so your specification keeps the old behavior.

> Taking Chris Lattner's point into consideration, we might want to say
> SAME or NEXT after a DAG should be an error.  But we could also leave
> that for a later round.
>

With your specification, I think the meaning of those cases is clear and
potentially useful.  The only potential problem I see is that people who
haven't studied your specification carefully might think SAME and NEXT
constrain the end of the search range of the DAG group.  It might be
worthwhile to emphasize in the docs that, no, really, DAG does not work
that way.

Actually, I wish there were a way to do that for the sake of matching
unordered text on a single line.  SAME after DAGs is as close as I can get
to that.  Maybe we need a CHECK-DAG-SAME.

Speaking of wish lists, I've been thinking it would be nice to have some
way to apply a NOT pattern among a range of matches:

CHECK-NOT-PUSH: pattern
...
CHECK-NOT-POP:

For example, with a pattern of {{.}} and DAGs in between PUSH and POP, I
can check for an unordered set of strings while rejecting any other text
among them. (Now that's a use case for DAG plus NOT that seems very clear
to me.)

Like normal NOT, PUSH's action would be deferred until the next directive
or group.  At that point, it would push the specified NOT pattern along
with the next non-NOT directive's match range end as its search range
start. POP would pop and apply those using the previous non-NOT directive's
match range start as its search range end.  The Rule would apply to its
matches.  PUSH and POP would be like normal NOT in terms of their effect on
neighboring directives: each would terminate any preceding DAG group, and,
because there's no match in a successful run, each would have no effect on
any neighboring directive's search range.  PUSH and POP with no directives
in between other than those in the NOT family would be an error.

Your formal specification of FileCheck makes it straightforward to
describe this behavior precisely.

>
> --paulr
>
> P.S. I am away next week but expect to keep an eye on the lists.
>
>
Sure.  Have fun.  No rush.

Thanks.

Joel