<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, May 7, 2018 at 12:56 PM,  <span dir="ltr"><<a href="mailto:paul.robinson@sony.com" target="_blank">paul.robinson@sony.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-m_-551390959675599399gmail-">> 1. In a CHECK-DAG group, don't let the matches for patterns overlap.<br>
> 2. Add a new CHECK-DAG-N directive, where N is some integer, to express<br>
> that a pattern must have N non-overlapping matches.<br>
<br>
</span>I think #1 is much more intuitive and easy to describe/document than #2.<br>
Changing the meaning of DAG in that way is highly unlikely to affect any <br>
existing test, IMO.  And if it does, my first expectation is that the test <br>
wasn't doing what the author intended anyway.<br></blockquote><div><br></div><div>I implemented #1, and I now have 105 new unexpected tests failures from llvm core (58), clang (45), openmp runtime (1), and compiler-rt (1) test suites.<br></div><div><br></div><div>I've only looked at a few failures so far.  Of those, the problem is the expected one: in a CHECK-DAG group, multiple patterns match the same line, so not permitting overlapping matches causes some patterns never to match successfully.  However, all the overlapping matches so far seem unintentional.  That is, this new implementation seems to be catching test errors.<br></div><div><br></div><div>Here are some scenarios I found:</div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif"></span></div><div><span style="font-family:arial,helvetica,sans-serif">S1. Patterns sometimes identical<br></span></div><div><span style="font-family:arial,helvetica,sans-serif">------------------------------<wbr>--<br></span></div><div><br></div><div>For example, tools/clang/test/OpenMP/for_<wbr>codegen.cpp has:<br></div><div><br></div><div>```<br></div>// TERM_DEBUG-DAG: [[DBG_LOC_START]] = !DILocation(line: [[@LINE-15]],<br>// TERM_DEBUG-DAG: [[DBG_LOC_END]] = !DILocation(line: [[@LINE-16]],<br>// TERM_DEBUG-DAG: [[DBG_LOC_CANCEL]] = !DILocation(line: [[@LINE-17]],</div><div class="gmail_quote">```</div><div class="gmail_quote"><br></div><div class="gmail_quote">Patterns are the same except for the names of regex variables referenced.  The variables happen to have the same value, so the patterns match the same line when overlapping is permitted.</div><div class="gmail_quote"><br></div><div class="gmail_quote">Perhaps the test was designed to handle the possibility that the variables might have different values one day, and not permitting overlapping matches would break that case.  My guess is that this was actually unintentional, so my solution is to rename all the variables to DBG_LOC and remove two of the above patterns.<br></div></div><div class="gmail_extra"><div class="gmail_quote"><span style="font-family:arial,helvetica,sans-serif"><br></span></div><span style="font-family:arial,helvetica,sans-serif">S2. Superset before subset</span><br><span style="font-family:arial,helvetica,sans-serif"></span><div class="gmail_quote"><span style="font-family:arial,helvetica,sans-serif">--------------------------</span><span style="font-family:monospace,monospace"><br></span></div><div class="gmail_quote"><br></div><div class="gmail_quote"></div><div class="gmail_quote">For example, tools/clang/test/OpenMP/<wbr>target_parallel_codegen.cpp has this:</div><div class="gmail_quote"><br></div><div class="gmail_quote">```<br></div><div class="gmail_quote">// CHECK-DAG:   store i[[SZ]] {{4|8}}, i[[SZ]]* {{%[^,]+}}</div><div class="gmail_quote">...<br></div><div class="gmail_quote">// CHECK-DAG:   store i[[SZ]] 4, i[[SZ]]* {{%[^,]+}}</div><div class="gmail_quote">```<br></div><div class="gmail_quote"></div><div class="gmail_quote"><br></div><div class="gmail_quote">Some pattern X matches a superset of some pattern Y.  X appears before Y, but Y's intended match appears before X's intended match in the output.  The result is that X matches Y's intended match.  If overlapping is not permitted, Y has nothing to match, and FileCheck complains.  If overlapping is permitted, Y also matches Y's intended match, and X's intended match is quietly ignored.  This case is clearly unintentional.  One fix is to move X after Y.  A cleaner fix is to somehow make X distinct.<br></div><div class="gmail_quote"><br></div><div class="gmail_quote">Another example of this scenario:<br></div><div class="gmail_quote"><br></div><div class="gmail_quote">* test/Bitcode/thinlto-function-summary-originalnames.ll<br></div><div class="gmail_quote"></div><div class="gmail_quote"><div><span style="font-family:monospace,monospace"><br></span></div><div><div><span style="font-family:arial,helvetica,sans-serif">S3. Wrong number of identical patterns<br></span></div><div><span style="font-family:arial,helvetica,sans-serif">------------------------------<wbr>--------</span><span style="font-family:monospace,monospace"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif">This scenario is just an incorrect attempt at the original use case I wanted to address.<br></span></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif">For example, projects/openmp/runtime/test/ompt/parallel/not_enough_threads.c repeats the following three times:<br></span></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif">```</span></div><div><span style="font-family:arial,helvetica,sans-serif">// CHECK-DAG: {{^}}[[THREAD_ID:[0-9]+]]: ompt_event_implicit_task_begin: parallel_id=[[PARALLEL_ID]], task_id=[[IMPLICIT_TASK_ID:[0-9]+]]<br>// CHECK-DAG: {{^}}[[THREAD_ID]]: ompt_event_barrier_begin: parallel_id=[[PARALLEL_ID]], task_id=[[IMPLICIT_TASK_ID]]<br>```<br></span></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif">The author was obviously expecting CHECK-DAG not to permit overlapping matches and wanted to check that the above lines appeared three times.  Specifically, he's expecting 3 threads after the master thread, but the run line sets OMP_THREAD_LIMIT=2.  If overlapping matches are permitted, this will permit any non-zero number of threads after the master thread.  If overlapping matches are not permitted, we must increase OMP_THREAD_LIMIT to at least 4 or reduce the number of pattern repetitions.<br></span></div><br><div><span style="font-family:arial,helvetica,sans-serif">S4. CHECK-DAG unneeded</span></div><div><span style="font-family:arial,helvetica,sans-serif">----------------------<br></span></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif"><span style="font-family:arial,helvetica,sans-serif">For example, test/CodeGen/AMDGPU/store-<wbr>v3i64.ll has:</span><br></span></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif">```<br></span></div><div><span style="font-family:arial,helvetica,sans-serif">; GCN-DAG: buffer_store_byte v<br>; GCN-DAG: buffer_store_byte v<br>; GCN-DAG: buffer_store_byte v<br>```</span></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif"></span></div><div><span style="font-family:arial,helvetica,sans-serif">The intention appears to be the same as S3: match a pattern N times.  However, this example doesn't actually need CHECK-DAG at all.  First, all patterns in the group are the same, so they are not unordered, so CHECK would have been sufficient.  Second, there's only one occurrence in the actual output (which I'm assuming is correct), so there should be only one directive, so it really doesn't matter which of these directives we use.<br></span></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif">S5. Incorrect pattern with same match<br></span></div><div><span style="font-family:arial,helvetica,sans-serif">-------------------------------------</span><span style="font-family:monospace,monospace"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif"><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif">For example, projects/compiler-rt/test/profile/instrprof-visibility.cpp has:</span></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif">```</span></div><div><span style="font-family:arial,helvetica,sans-serif">// COV-DAG: {{.*}}|{{ *}}1|#ifndef NO_WEAK<br>...</span></div><div><span style="font-family:arial,helvetica,sans-serif">// COV-DAG: {{.*}}|{{ *}}1|#ifndef NO_WEAK<br>```</span></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif">The "1" does not appear in first occurrence in the actual output.  Let's assume the actual output is correct and the first pattern is incorrect.  If overlapping is permitted, the first pattern matches the intended line for the second pattern, which quietly matches that line too.  If overlapping is not permitted, FileCheck complains that the second pattern has nothing left to match.<br></span></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif">An example where the second pattern is incorrect:<br></span></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif">* test/Bitcode/thinlto-summary-local-5.0.ll<br></span></div><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div><div><span style="font-family:arial,helvetica,sans-serif">An example where the incorrect pattern is identical but it's not as obvious because a regex variable by a different name (<span style="font-family:arial,helvetica,sans-serif"><span style="font-family:arial,helvetica,sans-serif"><span style="font-family:arial,helvetica,sans-serif"><span style="font-family:arial,helvetica,sans-serif">ATOMIC_REDUCE_BARRIER_LOC</span></span></span></span> instead of <span style="font-family:arial,helvetica,sans-serif"><span style="font-family:arial,helvetica,sans-serif"><span style="font-family:arial,helvetica,sans-serif"><span style="font-family:arial,helvetica,sans-serif">REDUCTION_LOC</span></span></span></span>) is captured within it:<br></span></div><div><span style="font-family:arial,helvetica,sans-serif"><div class="gmail_extra"><span style="font-family:arial,helvetica,sans-serif"><span style="font-family:arial,helvetica,sans-serif"></span></span><br></div><div class="gmail_extra"><span style="font-family:arial,helvetica,sans-serif"><span style="font-family:arial,helvetica,sans-serif"></span></span><div class="gmail_quote"><div><span style="font-family:arial,helvetica,sans-serif"><div><span style="font-family:arial,helvetica,sans-serif">* tools/clang/test/OpenMP/for_<wbr>reduction_codegen_UDR.cpp has:</span></div></span><span style="font-family:arial,helvetica,sans-serif"><div><span style="font-family:arial,helvetica,sans-serif"></span></div></span></div><span style="font-family:arial,helvetica,sans-serif"></span><span style="font-family:arial,helvetica,sans-serif"></span></div></div></span></div></span><span style="font-family:arial,helvetica,sans-serif"><div><span style="font-family:arial,helvetica,sans-serif"><br></span></div></span></div></div><div><span style="font-family:arial black,sans-serif">Path Forward</span></div><div><span style="font-family:arial black,sans-serif">------------------</span><br></div><div></div><div><br></div><div>These examples seem like strong evidence that permitting overlapping matches is a significant source of test errors.  Moreover, we have my original goal that some test authors were apparently assuming was already supported: to be able to handle sets of strings that are both unordered and non-unique.  Nevertheless, it's possible that there are use cases, such as the one suggested by S1, that we lose.<br></div><div><br></div><div>Here are some potential paths forward:</div><div><br></div><div>P1. Give up on approach #1 (non-overlapping CHECK-DAG) and go back to #2 (CHECK-DAG-N).  I don't prefer this, but it would be less work.</div><div><br></div><div>P2. Make CHECK-DAG non-overlapping but fix all test cases before committing that change.  Patch reviews will surely be slow.<br></div><div><br></div><div></div><div></div><div>P3. Make non-overlapping CHECK-DAG a special mode that's off by default until all test suites have been fixed.<br></div><div><br></div><div>By the way, I think it would be generally useful for FileCheck to have a debugging mode that prints every match.  It would be particularly useful while migrating to a non-overlapping CHECK-DAG.</div><div><br></div><div>Thoughts?<br></div><div></div><div></div><div><br></div><div>Joel<br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
--paulr<br>
<br>
</blockquote></div><br></div></div>