[flang-dev] RFC: Port test_modfile.sh tests in Flang to use FileCheck

Thu Jul 9 08:01:13 PDT 2020

Hi all

Thanks Tim and Hal for your replies.

In my original post, I didn’t do a great job of explaining why I think we need to change away from test_modfile.sh and why I think my ported test behaviour is better.

The reason I think we need to change away from test_modfile.sh eventually so we can enable Windows support in Flang. I also think we should consider the principle of least surprise with these tests. I think these tests should be written to optimise for the common use case and I think that use case is someone examining a test to determine the root cause of a test failure. I think this use case is going to be more common over time than a developer making a broad change across all tests of a similar type or a developer adding a new test of this type. I think that my common user will appreciate explicitness over brevity in both the test specification and test output and would prefer that the test use familiar tools where possible over something custom and specific to particular tests.

In the ported tests that I have shared, the flang execution line is clearly written in the test file and clearly reported on the case of error (or verbose output) by lit. This makes reproducing a test failure with a compiler invocation outside of any infrastructure (lit, test_modfile or FileCheck) a very simple copy/paste job. This will help my typical user. 

The test_modfile.sh failure output is certainly shorter and more succinct than the ported tests output and I agree that it is clear and does a good job. The test_modfile.sh output does assume tacit knowledge of what test_modfile.sh does to understand it completely. Taking Tim's first example, which way around is the diff here? Did flang generate i but the test expect ii or the other way around? Where is m.mod to be found? How was it generated? With the ported tests, all of this is explicit in the output i.e. "lit ran flang like this generating this file here. FileCheck looked at it and expected to see this inside it, but actually got this inside it." I think that my casual user would find the FileCheck output an improvement over test_modfile.sh's already good output. I think that the fact it is standard FileCheck output will also help my typical user. (Note, this is for the first of Tim's examples only, I totally agree that his second example is worse and I'll come back to that later).

In terms of test specification, I agree that test_modfile.sh makes that shorter and cleaner, but I don't agree this is a good thing. My typical user would rather read exactly what is being run in the test file itself rather than have to understand test_modfile.sh in tandem with the test file and reconstitute the run line themselves.

Tim's point on test_modfile.sh making the tests maintainabilie only holds if the new test that one wants to add works the same way as all the existing tests or if you want to change something common to all tests, as per his example. If one wanted to add a new test that needed to drive flang differently or run different tests over the resulting module files then test_modfile.sh would need to change to accommodate it, increasing complexity and decreasing transparency of the script. This would make the job of figuring out what ran and how to reproduce the a failure harder for my typical user.

I make the same point as before on familiarity. I think multiple RUN lines and CHECK lines with different prefixes would not cause a problem for a typical reader investigating a regression.

For these reasons, I think the ported tests are better than what went before them with one exception…

I totally agree that the NO_MODFILES command is not good and certainly would represent a regression in experience for my typical user (and everyone else!). We can aid my typical user by adding comments in the test itself and perhaps by picking a better name for the CHECK prefix for this check, but clearly the custom diagnostic message that test_modfiles.sh gives is always going to be clearer in meaning. The NO_MODFILES run line perhaps poses a portability question to Windows too given that it uses wc and ls directly. I think this would mean we should add a REQUIRES: shell to the tests which would restrict the testing to Unix systems only, which rather defeats the point of the change.

If we were to agree that for the rest of the tests, the FileCheck port is better, then we should remove the NO_MODFILES check from every module file test and find some other way to test flang against this requirement. This could perhaps be a unit test, a custom testing utility or perhaps with some platform-specific lit tests that use ls | wc on Unix and whatever the Windows equivalent of that is on Windows.

Clearly a few different opinions here - more comments and opinions would be very helpful.

As a next step, I will have a look at what a python port of test_modfiles.sh might look like and report back so we can consider that option too.

Ta
Rich

PS. @Tim: Good catch on the temp file deletion - I mistakenly believed that lit guaranteed a unique or clean temp dir with %t. The correct fix for that would be to clean the temp dir at the start of each test execution.

> -----Original Message-----
> From: Hal Finkel <hfinkel at anl.gov>
> Sent: 8 July, 2020 21:35
> To: Timothy Keith <tkeith at nvidia.com>; Richard Barton
> <Richard.Barton at arm.com>; flang-dev at lists.llvm.org
> Subject: Re: [flang-dev] RFC: Port test_modfile.sh tests in Flang to use
> FileCheck
> 
> On 7/7/20 1:16 PM, Timothy Keith via flang-dev wrote:
> > My opinion is that the new form is less readable, is less maintainable, and
> produces less comprehensible error messages on failures.
> >
> > Readability: The essentials for this kind of test is the Fortran source and the
> expected name and contents of the .mod file that is produced. The rest is
> boilerplate. In the original form that consists of one RUN command with the
> expected results in comments. In the new form it is three RUN commands
> plus one per module in the test plus the NO_MODFILES line; expected
> results are prefixed with a variety of strings.
> >
> > Maintainability: To add a module to a test, for example, requires adding the
> source and expected results, adding another RUN line, and changing the
> NO_MODFILES line. If something has to change in how the tests are run (e.g.
> we probably need to delete the temp directory at the start of the test), that
> has to be changed in all 36 tests rather than once in test_modfile.sh.
> >
> > Quality of messages on failure: I'll add some examples below.
> >
> > This is a heavy price to pay for "making more clear exactly what the test is
> doing without needing to also understand how test_modfile.sh works".
> Moreover, I don't agree it is more clear. The original form encapsulates the
> boilerplate into test_modfile.sh which in my opinion makes the tests clearer
> and simpler.
> >
> > Tim
> 
> 
> I agree with Tim, the current errors are clearer and there's less
> boilerplate. A few thoughts:
> 
>   0. I don't see a need to use FileCheck for all tests; clang has a
> -verify mode, etc. and if we have a set of tests with specialized needs
> I don't see any reason not to have a specialized checker for them. That
> said...
> 
>   1. Does the current script run on Windows? If we keep it, we probably
> need to rewrite it in Python.
> 
>   2. The current script doesn't seem to have any comments on top
> documenting how it works. We should have documentation.
> 
>   3. My preference would be for the `!Expect: m.mod` lines, etc. to have
> directives in all caps like FileCheck - that makes them stand out more
> and, in my opinion, easier to read.
> 
>   4. I don't find FileCheck's output in the first case below all that
> bad -- it's not as succinct as diff's output, but it's okay. We might
> have a wrapper that calls FileCheck instead of calling diff.
> 
> 
> As time allows, we can also discuss this on Monday's call.
> 
>   -Hal
> 
> 
> >
> >
> > Examples of test failure messages:
> >
> > 1. .mod file contents are different from what was expected
> >
> > old form:
> > 	Module file m.mod differs from expected:
> > 	@@ -1,11 +1,11 @@
> > 	 module m
> > 	 integer(4)::i
> > 	 integer(4),private::j
> > 	 type::t
> > 	-integer(4)::ii
> > 	+integer(4)::i
> > 	 integer(4),private::j
> > 	 end type
> > 	 type,private::u
> > 	 end type
> > 	 type(t)::x
> > 	 end
> >
> > new form:
> > 	test/Semantics/modfile01.f90:33:16: error: MOD-M1-NEXT: expected
> string not found in input
> > 	! MOD-M1-NEXT: integer(4)::ii
> > 	               ^
> > 	build/Release-clang-9.0.0-
> ccache/test/Semantics/Output/modfile01.f90.tmp.dir/m1.mod:6:1: note:
> scanning from here
> > 	integer(4)::i
> > 	^
> >
> > 	Input file: build/Release-clang-9.0.0-
> ccache/test/Semantics/Output/modfile01.f90.tmp.dir/m1.mod
> > 	Check file: test/Semantics/modfile01.f90
> >
> > 	-dump-input=help describes the format of the following dump.
> >
> > 	Full input was:
> > 	<<<<<<
> > 	         1: !mod$ v1 sum:cfaed627f0a11097
> > 	         2: module m1
> > 	         3: integer(4)::i
> > 	         4: integer(4),private::j
> > 	         5: type::t
> > 	         6: integer(4)::i
> > 	next:33     X~~~~~~~~~~~~ error: no match found
> > 	         7: integer(4),private::j
> > 	next:33     ~~~~~~~~~~~~~~~~~~~~~
> > 	         8: end type
> > 	next:33     ~~~~~~~~
> > 	         9: type,private::u
> > 	next:33     ~~~~~~~~~~~~~~~
> > 	        10: end type
> > 	next:33     ~~~~~~~~
> > 	        11: type(t)::x
> > 	next:33     ~~~~~~~~~~
> > 	        12: end
> > 	next:33     ~~~
> > 	>>>>>>
> >
> > 2. Extra or missing .mod file
> >
> > old form:
> > 	Unexpected .mod files produced: m5.mod
> > or:
> > 	Compilation did not produce expected mod file: m5.mod
> >
> > new form:
> > 	test/Semantics/modfile01.f90:6:16: error: NO_MODFILES: expected
> string not found in input
> > 	! NO_MODFILES: 4
> > 	               ^
> > 	<stdin>:1:2: note: scanning from here
> > 	 5
> > 	 ^
> >
> > 	Input file: <stdin>
> > 	Check file:
> /Users/tsk/llvm/llvm1/flang/test/Semantics/modfile01.f90
> >
> > 	-dump-input=help describes the format of the following dump.
> >
> > 	Full input was:
> > 	<<<<<<
> > 	         1:  5
> > 	check:6      X error: no match found
> > 	>>>>>>
> >
> > On 7/7/20, 9:27 AM, "flang-dev on behalf of Richard Barton via flang-dev"
> <flang-dev-bounces at lists.llvm.org on behalf of flang-dev at lists.llvm.org>
> wrote:
> >
> >      External email: Use caution opening links or attachments
> >
> >
> >      Hi flang-dev
> >
> >      I'm looking at porting the Flang regression tests over to using standard
> LLVM utilities like FileCheck.
> >
> >      The modfile tests are a bunch of tests in Semantics that define modules
> and have check lines saying which module files are expected to be created
> and what they are expected to contain. The test_modfile.sh script runs Flang
> on the sources to generate said module files then checks that all expected
> module files have the correct contents and that no additional, unexpected
> module files are created.
> >
> >      The ported tests will exec Flang in the same way to create modfiles. We
> then use FileCheck on each module file in turn by name to ensure the correct
> name is generated and check the contents in the same way using CHECK-
> NEXT to ensure no additional information is added. The ported tests use the -
> module option to put the module files in a temporary location to make sure
> the tests cannot spuriously pass due to a dirty build directory. To check that
> no extra modfiles are generated, the tests use ls on the temporary location
> and ensure the right number of files are present.
> >
> >      Here is an example port of modfile01: https://reviews.llvm.org/D83320
> >
> >      The ported tests are more verbose than the previous approach using
> test_modfile.sh, at the expense of making more clear exactly what the test is
> doing without needing to also understand how test_modfile.sh works. I think
> the ported tests are preferable to the original on this principal of least
> surprise.
> >
> >      If folks are happy with this porting approach then I will do the rest of the
> tests and re-submit as a single patch.
> >
> >      Note: at present, lit has a bug when printing out UTF characters to a non-
> UTF shell if run with python 2. Flang's modfiles contain a short UTF BOM at
> the start of the file so the lit issue would be hit were these tests to fail in such
> an environment with -v or -a enabled. I have submitted a fix for this issue
> here: https://reviews.llvm.org/D82754 and I will not push the ports until that
> issue is fixed.
> >
> >      Thanks
> >      Rich
> >
> >      Richard Barton
> >      Principal Engineer - Arm Compiler for Linux
> >      Arm Manchester
> >
> >      _______________________________________________
> >      flang-dev mailing list
> >      flang-dev at lists.llvm.org
> >      https://lists.llvm.org/cgi-bin/mailman/listinfo/flang-dev
> >
> > _______________________________________________
> > flang-dev mailing list
> > flang-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/flang-dev
> 
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory