[PATCH] D47106: [FileCheck] Make CHECK-DAG non-overlapping

Wed May 23 12:46:47 PDT 2018

jdenny added inline comments.

================
Comment at: utils/FileCheck/FileCheck.cpp:1245
       // All subsequent CHECK-DAGs should be matched from the farthest
-      // position of all precedent CHECK-DAGs (including this one.)
+      // position of all precedent CHECK-DAGs (not including this one).
       StartPos = LastPos;
----------------
probinson wrote:
> jdenny wrote:
> > probinson wrote:
> > > jdenny wrote:
> > > > probinson wrote:
> > > > > jdenny wrote:
> > > > > > Speaking of the reordering check, we should discuss this comment change.
> > > > > > 
> > > > > > I made this change to reflect the implementation, but now I'm wondering if the comment described the desired behavior and the implementation was wrong.  I'm not really sure.  Moreover, this comment change does not reflect any change made by this patch, so perhaps it should be in a separate patch.
> > > > > > 
> > > > > > The implementation works as follows (with or without this patch as there's no overlapping here).  Using the X-Y model from earlier, the implementation seeks non-initial members of Y from the end of X rather than the start of Y.  For example, the following succeeds even though the second member of Y matches before the first member of Y:
> > > > > > 
> > > > > > 
> > > > > > ```
> > > > > > X:
> > > > > > abc
> > > > > > 
> > > > > > Y:
> > > > > > ghi
> > > > > > def
> > > > > > foo
> > > > > > jkl
> > > > > > 
> > > > > > X:
> > > > > > CHECK-DAG: {{^}}abc
> > > > > > 
> > > > > > CHECK-NOT: {{^}}foo
> > > > > > 
> > > > > > Y:
> > > > > > CHECK-DAG: {{^}}def
> > > > > > CHECK-DAG: {{^}}ghi
> > > > > > CHECK-DAG: {{^}}jkl
> > > > > > ```
> > > > > > 
> > > > > > To make things more confusing, if I move the occurrence of foo right before the occurrence of def, the CHECK-NOT fails.  That is, the CHECK-NOTs are sought before the first member of Y, but the other members of Y are sought both before and after Y.  It might be more intuitive if the first member of Y marks the beginning of Y (as the comment originally claimed) so that CHECK-NOTs are sought strictly between all of X and all of Y.
> > > > > My understanding of putting NOT between DAGs is this:
> > > > > ```
> > > > > X:
> > > > > abc
> > > > > Y:
> > > > > ghi
> > > > > def
> > > > > foo
> > > > > jkl
> > > > > 
> > > > > X:
> > > > > CHECK-DAG: b
> > > > > CHECK-DAG: a
> > > > > 
> > > > > CHECK-NOT: foo
> > > > > 
> > > > > Y:
> > > > > CHECK-DAG: def
> > > > > CHECK-DAG: ghi
> > > > > ```
> > > > > First the X group of DAGs runs, over the entire input text.  The farthest-forward match (here 'b') is remembered.
> > > > > Then the Y group of DAGs runs, from the farthest-forward X match (i.e., after 'b') to the end of the input text.  The earliest match (here 'ghi') is remembered.
> > > > > Finally the NOT runs, in the range from the end of the X matches (after 'b') to the beginning of the earliest Y match (before 'ghi').
> > > > > 
> > > > > The X matches and the Y matches can never overlap, even though they are both DAG groups they are separated by the NOT.  Therefore the search range of the Y group is constrained to start after the latest match of the X group.
> > > > > 
> > > > > Is that not what happens?  Even with overlapping DAG matches enabled?  Your explanation made it sound as if you expected the order of DAG directives in Y to make a difference to the NOT directive, when I think it should not.
> > > > > First the X group of DAGs runs, over the entire input text. The farthest-forward match (here 'b') is remembered.
> > > > 
> > > > Agreed.
> > > > 
> > > > > Then the Y group of DAGs runs, from the farthest-forward X match (i.e., after 'b') to the end of the input text.
> > > > > The earliest match (here 'ghi') is remembered.
> > > > 
> > > > Only the first DAG in Y runs at this point, so 'def' is remembered.
> > > > 
> > > > > Finally the NOT runs, in the range from the end of the X matches (after 'b') to the beginning of the earliest Y match (before 'ghi').
> > > > 
> > > > Before 'def'.
> > > > 
> > > > > The X matches and the Y matches can never overlap, even though they are both DAG groups they are separated by the NOT. Therefore the search range of the Y group is constrained to start after the latest match of the X group.
> > > > 
> > > > Agreed.
> > > > 
> > > > > Is that not what happens? Even with overlapping DAG matches enabled?
> > > > 
> > > > My patch and -allow-deprecated-dag-overlap shouldn't affect these examples as there is no overlapping.
> > > > 
> > > > > Your explanation made it sound as if you expected the order of DAG directives in Y to make a difference to the NOT directive, when I think it should not.
> > > > 
> > > > It does make a difference, and I agree it's not intuitive.
> > > > 
> > > > 
> > > It is so unintuitive that I would call it a bug.  The question is whether to fix that first, before doing the no-overlapping bit.  Do you think it would be straightforward, or a big deal?
> > I see two possible fixes:
> > 
> > 1. One approach is what the original comment here suggested: make the first DAG in Y mark the beginning of all Y matches.  Basically, for that first directive in Y, CHECK-DAG becomes CHECK.  This approach seems very quick to implement.  However, it constrains what DAGs in Y can match, so it might break existing tests.
> > 
> > 2. Another approach is to do what you described: match all DAGs in Y and then make the earliest match delimit the end of the preceding NOTs.  That will require more work to implement, I think.  However, it doesn't constrain what DAGs in Y can match.  Instead, it constrains what the preceding NOTs can match, and constraining NOTs helps tests pass not fail.  The question is whether they will be false passes that we'll never hear about.  My hunch is no.
> > 
> > While #2 seems harder to implement, it seems like it wouldn't causes tests to fail, and I think it produces more intuitive behavior.  What do you think?
> > 
> > >> Therefore the search range of the Y group is constrained to start after the latest match of the X group.
> > > 
> > > Agreed.
> > 
> > I got confused here.  The search range of the first DAG in Y starts where X's search range starts not where X's matches end.   Otherwise, there would be no possibility of a reordering complaint.
> > While #2 seems harder to implement, it seems like it wouldn't causes tests to fail, and I think it produces more intuitive behavior. What do you think?
> 
> I think a group of DAGs should not behave differently just because it is preceded by a NOT, and so I still prefer #2.
> 
> > I got confused here. The search range of the first DAG in Y starts where X's search range starts not where X's matches end. Otherwise, there would be no possibility of a reordering complaint.
> 
> That suggests that the first DAG in Y is being mis-handled as part of the X group.  I don't think that makes any sense at all.  Either a NOT cleanly separates the surrounding DAGs into two groups, or they are all part of one bigger group.  Taking the lexically first DAG from the second group and moving it to the first group is just wrong.
> 
> I think a group of DAGs should not behave differently just because it is preceded by a NOT, and so I still prefer #2.

Agreed.  I'll look into it.  If it doesn't break anything, we can put it in before this patch.

So that I can keep this straight: The above is about the search range end for the NOTs before Y.  The below is about the search range start for Y and thus reordering detection.

>> I got confused here. The search range of the first DAG in Y starts where X's search range starts not where X's matches end. Otherwise, there would be no possibility of a reordering complaint.
> 
> That suggests that the first DAG in Y is being mis-handled as part of the X group. I don't think that makes any sense at all. Either a NOT cleanly separates the surrounding DAGs into two groups, or they are all part of one bigger group. Taking the lexically first DAG from the second group and moving it to the first group is just wrong.

I agree the current reordering detection is broken.  I think the two options you are proposing are the following:

1. Search for every DAG in Y from the end of X's matches until EOF.  That would make reordering detection impossible, so we could just get rid of it.

2. Search for every DAG in Y from the start of X's matches until EOF.  That way, reordering is detected for any Y member not just the first.

Is that what you mean?  Which do you prefer?

https://reviews.llvm.org/D47106