[llvm] 1d297f9 - [lit] Sort test start times based on prior test timing data

Tue Mar 16 02:23:16 PDT 2021

Author: David Zarzycki
Date: 2021-03-16T05:23:04-04:00
New Revision: 1d297f90649dd63187590548e20de0eced61750c

URL: https://github.com/llvm/llvm-project/commit/1d297f90649dd63187590548e20de0eced61750c
DIFF: https://github.com/llvm/llvm-project/commit/1d297f90649dd63187590548e20de0eced61750c.diff

LOG: [lit] Sort test start times based on prior test timing data

Lit as it exists today has three hacks that allow users to run tests earlier:

1) An entire test suite can set the `is_early` boolean.
2) A very recently introduced "early_tests" feature.
3) The `--incremental` flag forces failing tests to run first.

All of these approaches have problems.

1) The `is_early` feature was until very recently undocumented. Nevertheless it still lacks testing and is a imprecise way of optimizing test starting times.
2) The `early_tests` feature requires manual updates and doesn't scale.
3) `--incremental` is undocumented, untested, and it requires modifying the *source* file system by "touching" the file. This "touch" based approach is arguably a hack because it confuses editors (because it looks like the test was modified behind the back of the editor) and "touching" the test source file doesn't work if the test suite is read only from the perspective of `lit` (via advanced filesystem/build tricks).

This patch attempts to simplify and address all of the above problems.

This patch formalizes, documents, tests, and defaults lit to recording the execution time of tests and then reordering all tests during the next execution. By reordering the tests, high core count machines run faster, sometimes significantly so.

This patch also always runs failing tests first, which is a positive user experience win for those that didn't know about the hidden `--incremental` flag.

Finally, if users want, they can _optionally_ commit the test timing data (or a subset thereof) back to the repository to accelerate bots and first-time runs of the test suite.

Reviewed By: jhenderson, yln

Differential Revision: https://reviews.llvm.org/D98179

Added: 
    llvm/utils/lit/tests/Inputs/reorder/.lit_test_times.txt
    llvm/utils/lit/tests/Inputs/reorder/aaa.txt
    llvm/utils/lit/tests/Inputs/reorder/bbb.txt
    llvm/utils/lit/tests/Inputs/reorder/lit.cfg
    llvm/utils/lit/tests/Inputs/reorder/subdir/ccc.txt
    llvm/utils/lit/tests/reorder.py

Modified: 
    llvm/docs/CommandGuide/lit.rst
    llvm/test/Unit/lit.cfg.py
    llvm/utils/lit/lit/Test.py
    llvm/utils/lit/lit/TestingConfig.py
    llvm/utils/lit/lit/cl_arguments.py
    llvm/utils/lit/lit/discovery.py
    llvm/utils/lit/lit/main.py
    llvm/utils/lit/tests/ignore-fail.py
    llvm/utils/lit/tests/shtest-shell.py
    mlir/test/Unit/lit.cfg.py

Removed: 
    llvm/utils/lit/tests/Inputs/early-tests/aaa.txt
    llvm/utils/lit/tests/Inputs/early-tests/bbb.txt
    llvm/utils/lit/tests/Inputs/early-tests/lit.cfg
    llvm/utils/lit/tests/Inputs/early-tests/subdir/ccc.txt
    llvm/utils/lit/tests/early-tests.py


################################################################################
diff  --git a/llvm/docs/CommandGuide/lit.rst b/llvm/docs/CommandGuide/lit.rst
index 7e61a276765b..413b64e95007 100644

--- a/llvm/docs/CommandGuide/lit.rst
+++ b/llvm/docs/CommandGuide/lit.rst
@@ -20,7 +20,7 @@ user interface as possible.
 command line.  Tests can be either individual test files or directories to
 search for tests (see :ref:`test-discovery`).
 
-Each specified test will be executed (potentially in parallel) and once all
+Each specified test will be executed (potentially concurrently) and once all
 tests have been run :program:`lit` will print summary information on the number
 of tests which passed or failed (see :ref:`test-status-results`).  The
 :program:`lit` program will execute with a non-zero exit code if any tests
@@ -151,8 +151,7 @@ EXECUTION OPTIONS
 
  Track the wall time individual tests take to execute and includes the results
  in the summary output.  This is useful for determining which tests in a test
- suite take the most time to execute.  Note that this option is most useful
- with ``-j 1``.
+ suite take the most time to execute.
 
 .. option:: --ignore-fail
 
@@ -168,6 +167,17 @@ EXECUTION OPTIONS
 SELECTION OPTIONS
 -----------------
 
+By default, `lit` will run failing tests first, then run tests in descending
+execution time order to optimize concurrency.
+
+The timing data is stored in the `test_exec_root` in a file named
+`.lit_test_times.txt`. If this file does not exist, then `lit` checks the
+`test_source_root` for the file to optionally accelerate clean builds.
+
+.. option:: --shuffle
+
+ Run the tests in a random order, not failing/slowest first.
+
 .. option:: --max-failures N
 
  Stop execution after the given number ``N`` of failures.
@@ -201,10 +211,6 @@ SELECTION OPTIONS
  must be in the range ``1..M``. The environment variable
  ``LIT_RUN_SHARD`` can also be used in place of this option.
 
-.. option:: --shuffle
-
- Run the tests in a random order.
-
 .. option:: --timeout=N
 
  Spend at most ``N`` seconds (approximately) running each individual test.
@@ -416,13 +422,6 @@ executed, two important global variables are predefined:
  **root** The root configuration.  This is the top-most :program:`lit` configuration in
  the project.
 
- **is_early** Whether the test suite as a whole should be given a head start
- before other test suites run.
-
- **early_tests** An explicit set of '/' separated test paths that should be
- given a head start before other tests run. For example, the top five or so
- slowest tests. See also: `--time-tests`
-
  **pipefail** Normally a test using a shell pipe fails if any of the commands
  on the pipe fail. If this is not desired, setting this variable to false
  makes the test fail only if the last command in the pipe fails.

diff  --git a/llvm/test/Unit/lit.cfg.py b/llvm/test/Unit/lit.cfg.py
index 3198ab2c9539..3a5c40dc14da 100644
--- a/llvm/test/Unit/lit.cfg.py
+++ b/llvm/test/Unit/lit.cfg.py
@@ -13,9 +13,6 @@
 # suffixes: A list of file extensions to treat as test files.
 config.suffixes = []
 
-# is_early; Request to run this suite early.
-config.is_early = True
-
 # test_source_root: The root path where tests are located.
 # test_exec_root: The root path where tests should be run.
 config.test_exec_root = os.path.join(config.llvm_obj_root, 'unittests')

diff  --git a/llvm/utils/lit/lit/Test.py b/llvm/utils/lit/lit/Test.py
index ce87cfa8abb5..ad42ef183ede 100644
--- a/llvm/utils/lit/lit/Test.py
+++ b/llvm/utils/lit/lit/Test.py
@@ -207,6 +207,16 @@ def __init__(self, name, source_root, exec_root, config):
         # The test suite configuration.
         self.config = config
 
+        self.test_times = {}
+        test_times_file = os.path.join(exec_root, '.lit_test_times.txt')
+        if not os.path.exists(test_times_file):
+            test_times_file = os.path.join(source_root, '.lit_test_times.txt')
+        if os.path.exists(test_times_file):
+            with open(test_times_file, 'r') as time_file:
+                for line in time_file:
+                    time, path = line.split(maxsplit=1)
+                    self.test_times[path.strip('\n')] = float(time)
+
     def getSourcePath(self, components):
         return os.path.join(self.source_root, *components)
 
@@ -246,6 +256,18 @@ def __init__(self, suite, path_in_suite, config, file_path = None):
         # The test result, once complete.
         self.result = None
 
+        # The previous test failure state, if applicable.
+        self.previous_failure = False
+
+        # The previous test elapsed time, if applicable.
+        self.previous_elapsed = 0.0
+
+        if os.sep.join(path_in_suite) in suite.test_times:
+            time = suite.test_times[os.sep.join(path_in_suite)]
+            self.previous_elapsed = abs(time)
+            self.previous_failure = time < 0
+
+
     def setResult(self, result):
         assert self.result is None, "result already set"
         assert isinstance(result, Result), "unexpected result type"
@@ -395,15 +417,3 @@ def getUsedFeatures(self):
         )
         identifiers = set(filter(BooleanExpression.isIdentifier, tokens))
         return identifiers
-
-    def isEarlyTest(self):
-        """
-        isEarlyTest() -> bool
-
-        Check whether this test should be executed early in a particular run.
-        This can be used for test suites with long running tests to maximize
-        parallelism or where it is desirable to surface their failures early.
-        """
-        if '/'.join(self.path_in_suite) in self.suite.config.early_tests:
-            return True
-        return self.suite.config.is_early

diff  --git a/llvm/utils/lit/lit/TestingConfig.py b/llvm/utils/lit/lit/TestingConfig.py
index fafc754c1bc1..612db574677e 100644
--- a/llvm/utils/lit/lit/TestingConfig.py
+++ b/llvm/utils/lit/lit/TestingConfig.py
@@ -125,10 +125,6 @@ def __init__(self, parent, name, suffixes, test_format,
         # require one of the features in this list if this list is non-empty.
         # Configurations can set this list to restrict the set of tests to run.
         self.limit_to_features = set(limit_to_features)
-        # Whether the suite should be tested early in a given run.
-        self.is_early = bool(is_early)
-        # List of tests to run early.
-        self.early_tests = {}
         self.parallelism_group = parallelism_group
         self._recursiveExpansionLimit = None
 

diff  --git a/llvm/utils/lit/lit/cl_arguments.py b/llvm/utils/lit/lit/cl_arguments.py
index 4d829659ea18..3eb1870bf16d 100644
--- a/llvm/utils/lit/lit/cl_arguments.py
+++ b/llvm/utils/lit/lit/cl_arguments.py
@@ -9,8 +9,7 @@
 
 
 class TestOrder(enum.Enum):
-    EARLY_TESTS_THEN_BY_NAME = enum.auto()
-    FAILING_FIRST = enum.auto()
+    DEFAULT = enum.auto()
     RANDOM = enum.auto()
 
 
@@ -155,7 +154,7 @@ def parse_args():
             help="Run tests in random order",
             action="store_true")
     selection_group.add_argument("-i", "--incremental",
-            help="Run modified and failing tests first (updates mtimes)",
+            help="Run failed tests first (DEPRECATED: now always enabled)",
             action="store_true")
     selection_group.add_argument("--filter",
             metavar="REGEX",
@@ -208,12 +207,13 @@ def parse_args():
     if opts.echoAllCommands:
         opts.showOutput = True
 
+    if opts.incremental:
+        print('WARNING: --incremental is deprecated. Failing tests now always run first.')
+
     if opts.shuffle:
         opts.order = TestOrder.RANDOM
-    elif opts.incremental:
-        opts.order = TestOrder.FAILING_FIRST
     else:
-        opts.order = TestOrder.EARLY_TESTS_THEN_BY_NAME
+        opts.order = TestOrder.DEFAULT
 
     if opts.numShards or opts.runShard:
         if not opts.numShards or not opts.runShard:

diff  --git a/llvm/utils/lit/lit/discovery.py b/llvm/utils/lit/lit/discovery.py
index a185ae676d14..43481d8bd3b3 100644
--- a/llvm/utils/lit/lit/discovery.py
+++ b/llvm/utils/lit/lit/discovery.py
@@ -281,6 +281,11 @@ def find_tests_for_inputs(lit_config, inputs, indirectlyRunCheck):
         if prev == len(tests):
             lit_config.warning('input %r contained no tests' % input)
 
+    # This data is no longer needed but keeping it around causes awful
+    # performance problems while the test suites run.
+    for k, suite in test_suite_cache.items():
+      suite[0].test_times = None
+
     # If there were any errors during test discovery, exit now.
     if lit_config.numErrors:
         sys.stderr.write('%d errors, exiting.\n' % lit_config.numErrors)

diff  --git a/llvm/utils/lit/lit/main.py b/llvm/utils/lit/lit/main.py
index 3f265446be2e..cfc12661a785 100755
--- a/llvm/utils/lit/lit/main.py
+++ b/llvm/utils/lit/lit/main.py
@@ -105,6 +105,8 @@ def main(builtin_params={}):
     run_tests(selected_tests, lit_config, opts, len(discovered_tests))
     elapsed = time.time() - start
 
+    record_test_times(selected_tests, lit_config)
+
     if opts.time_tests:
         print_histogram(discovered_tests)
 
@@ -163,20 +165,12 @@ def print_discovered(tests, show_suites, show_tests):
 
 def determine_order(tests, order):
     from lit.cl_arguments import TestOrder
-    if order == TestOrder.EARLY_TESTS_THEN_BY_NAME:
-        tests.sort(key=lambda t: (not t.isEarlyTest(), t.getFullName()))
-    elif order == TestOrder.FAILING_FIRST:
-        def by_mtime(test):
-            return os.path.getmtime(test.getFilePath())
-        tests.sort(key=by_mtime, reverse=True)
-    elif order == TestOrder.RANDOM:
+    if order == TestOrder.RANDOM:
         import random
         random.shuffle(tests)
-
-
-def touch_file(test):
-    if test.isFailure():
-        os.utime(test.getFilePath(), None)
+    else:
+        assert order == TestOrder.DEFAULT, 'Unknown TestOrder value'
+        tests.sort(key=lambda t: (not t.previous_failure, -t.previous_elapsed, t.getFullName()))
 
 
 def filter_by_shard(tests, run, shards, lit_config):
@@ -213,12 +207,7 @@ def run_tests(tests, lit_config, opts, discovered_tests):
     display = lit.display.create_display(opts, len(tests), discovered_tests,
                                          workers)
 
-    def progress_callback(test):
-        display.update(test)
-        if opts.order == lit.cl_arguments.TestOrder.FAILING_FIRST:
-            touch_file(test)
-
-    run = lit.run.Run(tests, lit_config, workers, progress_callback,
+    run = lit.run.Run(tests, lit_config, workers, display.update,
                       opts.max_failures, opts.timeout)
 
     display.print_header()
@@ -267,6 +256,27 @@ def execute_in_tmp_dir(run, lit_config):
                 lit_config.warning("Failed to delete temp directory '%s', try upgrading your version of Python to fix this" % tmp_dir)
 
 
+def record_test_times(tests, lit_config):
+    times_by_suite = {}
+    for t in tests:
+        if not t.result.elapsed:
+            continue
+        if not t.suite.exec_root in times_by_suite:
+            times_by_suite[t.suite.exec_root] = []
+        time = -t.result.elapsed if t.isFailure() else t.result.elapsed
+        times_by_suite[t.suite.exec_root].append((os.sep.join(t.path_in_suite), t.result.elapsed))
+
+    for s, value in times_by_suite.items():
+        try:
+            path = os.path.join(s, '.lit_test_times.txt')
+            with open(path, 'w') as time_file:
+                for name, time in value:
+                    time_file.write(("%e" % time) + ' ' + name + '\n')
+        except:
+            lit_config.warning('Could not save test time: ' + path)
+            continue
+
+
 def print_histogram(tests):
     test_times = [(t.getFullName(), t.result.elapsed)
                   for t in tests if t.result.elapsed]

diff  --git a/llvm/utils/lit/tests/Inputs/reorder/.lit_test_times.txt b/llvm/utils/lit/tests/Inputs/reorder/.lit_test_times.txt
new file mode 100644
index 000000000000..00aecc968ed3
--- /dev/null
+++ b/llvm/utils/lit/tests/Inputs/reorder/.lit_test_times.txt
@@ -0,0 +1,3 @@
+3.0 subdir/ccc.txt
+2.0 bbb.txt
+0.1 aaa.txt

diff  --git a/llvm/utils/lit/tests/Inputs/early-tests/aaa.txt b/llvm/utils/lit/tests/Inputs/reorder/aaa.txt
similarity index 100%
rename from llvm/utils/lit/tests/Inputs/early-tests/aaa.txt
rename to llvm/utils/lit/tests/Inputs/reorder/aaa.txt

diff  --git a/llvm/utils/lit/tests/Inputs/early-tests/bbb.txt b/llvm/utils/lit/tests/Inputs/reorder/bbb.txt
similarity index 100%
rename from llvm/utils/lit/tests/Inputs/early-tests/bbb.txt
rename to llvm/utils/lit/tests/Inputs/reorder/bbb.txt

diff  --git a/llvm/utils/lit/tests/Inputs/early-tests/lit.cfg b/llvm/utils/lit/tests/Inputs/reorder/lit.cfg
similarity index 67%
rename from llvm/utils/lit/tests/Inputs/early-tests/lit.cfg
rename to llvm/utils/lit/tests/Inputs/reorder/lit.cfg
index db030510c249..6320609a1e6c 100644
--- a/llvm/utils/lit/tests/Inputs/early-tests/lit.cfg
+++ b/llvm/utils/lit/tests/Inputs/reorder/lit.cfg
@@ -1,7 +1,6 @@
 import lit.formats
-config.name = 'early-tests'
+config.name = 'reorder'
 config.suffixes = ['.txt']
 config.test_format = lit.formats.ShTest()
 config.test_source_root = None
 config.test_exec_root = None
-config.early_tests = { "subdir/ccc.txt" }

diff  --git a/llvm/utils/lit/tests/Inputs/early-tests/subdir/ccc.txt b/llvm/utils/lit/tests/Inputs/reorder/subdir/ccc.txt
similarity index 100%
rename from llvm/utils/lit/tests/Inputs/early-tests/subdir/ccc.txt
rename to llvm/utils/lit/tests/Inputs/reorder/subdir/ccc.txt

diff  --git a/llvm/utils/lit/tests/early-tests.py b/llvm/utils/lit/tests/early-tests.py
deleted file mode 100644
index b2ca9ac0a97d..000000000000
--- a/llvm/utils/lit/tests/early-tests.py
+++ /dev/null
@@ -1,9 +0,0 @@
-## Check that we can run tests early.
-
-# RUN: %{lit} -j1 %{inputs}/early-tests | FileCheck %s
-
-# CHECK:     -- Testing: 3 tests, 1 workers --
-# CHECK-NEXT: PASS: early-tests :: subdir/ccc.txt
-# CHECK-NEXT: PASS: early-tests :: aaa.txt
-# CHECK-NEXT: PASS: early-tests :: bbb.txt
-# CHECK:     Passed: 3

diff  --git a/llvm/utils/lit/tests/ignore-fail.py b/llvm/utils/lit/tests/ignore-fail.py
index 135e29baa5a6..63c34516226d 100644
--- a/llvm/utils/lit/tests/ignore-fail.py
+++ b/llvm/utils/lit/tests/ignore-fail.py
@@ -6,10 +6,10 @@
 
 # END.
 
-# CHECK: FAIL: ignore-fail :: fail.txt
-# CHECK: UNRESOLVED: ignore-fail :: unresolved.txt
-# CHECK: XFAIL: ignore-fail :: xfail.txt
-# CHECK: XPASS: ignore-fail :: xpass.txt
+# CHECK-DAG: FAIL: ignore-fail :: fail.txt
+# CHECK-DAG: UNRESOLVED: ignore-fail :: unresolved.txt
+# CHECK-DAG: XFAIL: ignore-fail :: xfail.txt
+# CHECK-DAG: XPASS: ignore-fail :: xpass.txt
 
 #      CHECK: Testing Time:
 # CHECK-NEXT:   Expectedly Failed : 1

diff  --git a/llvm/utils/lit/tests/reorder.py b/llvm/utils/lit/tests/reorder.py
new file mode 100644
index 000000000000..7c9dc8d21fe3
--- /dev/null
+++ b/llvm/utils/lit/tests/reorder.py
@@ -0,0 +1,12 @@
+## Check that we can reorder test runs.
+
+# RUN: cp %{inputs}/reorder/.lit_test_times.txt %{inputs}/reorder/.lit_test_times.txt.orig
+# RUN: %{lit} -j1 %{inputs}/reorder | FileCheck %s
+# RUN: not 
diff  %{inputs}/reorder/.lit_test_times.txt %{inputs}/reorder/.lit_test_times.txt.orig
+# END.
+
+# CHECK:     -- Testing: 3 tests, 1 workers --
+# CHECK-NEXT: PASS: reorder :: subdir/ccc.txt
+# CHECK-NEXT: PASS: reorder :: bbb.txt
+# CHECK-NEXT: PASS: reorder :: aaa.txt
+# CHECK:     Passed: 3

diff  --git a/llvm/utils/lit/tests/shtest-shell.py b/llvm/utils/lit/tests/shtest-shell.py
index 4c247de15ddd..3f1ead3b297a 100644
--- a/llvm/utils/lit/tests/shtest-shell.py
+++ b/llvm/utils/lit/tests/shtest-shell.py
@@ -8,6 +8,8 @@
 #
 # Test again in non-UTF shell to catch potential errors with python 2 seen
 # on stdout-encoding.txt
+# FIXME: lit's testing sets source_root == exec_root which complicates running lit more than once per test.
+# RUN: rm -f %{inputs}/shtest-shell/.lit_test_times.txt
 # RUN: env PYTHONIOENCODING=ascii not %{lit} -j 1 -a %{inputs}/shtest-shell > %t.ascii.out
 # FIXME: Temporarily dump test output so we can debug failing tests on
 # buildbots.

diff  --git a/mlir/test/Unit/lit.cfg.py b/mlir/test/Unit/lit.cfg.py
index ea14853e71d6..d645971074f5 100644
--- a/mlir/test/Unit/lit.cfg.py
+++ b/mlir/test/Unit/lit.cfg.py
@@ -13,9 +13,6 @@
 # suffixes: A list of file extensions to treat as test files.
 config.suffixes = []
 
-# is_early; Request to run this suite early.
-config.is_early = True
-
 # test_source_root: The root path where tests are located.
 # test_exec_root: The root path where tests should be run.
 config.test_exec_root = os.path.join(config.mlir_obj_root, 'unittests')