[llvm-dev] [lit] RFC: Per test timeout
Jonathan Roelofs via llvm-dev
llvm-dev at lists.llvm.org
Sun Nov 15 08:09:22 PST 2015
On 11/14/15 9:37 AM, Jonathan Roelofs via llvm-dev wrote:
>
>
> On 11/14/15 3:10 AM, Dan Liew via llvm-dev wrote:
>> Hi,
>>
>> A feature I've wanted in lit for a while is a having a timeout per
>> test. Attached
>> are patches that implement this idea.
>
> Cool, I hope this succeeds. I tried implementing per-test timeouts
> before, and couldn't get it to work in all cases. The review eventually
> fizzled out, and I abandoned it.
>
> Here's that old review: http://reviews.llvm.org/D6584 Perhaps you can
> cannibalize testcases from it.
>
>>
>> I'm e-mailing llvm-dev rather than llvm-commits
>> because I want to gather more feedback on my initial implementation and
>> hopefully some answers to some unresolved issues with my implementation.
>>
>> Currently in lit you can set a global timeout for
>> all of the tests but not for each individual test.
>>
>> The attached patches add
>>
>> * Support for a new ResultCode called TIMEOUT
>
> When I implemented mine, one of the first revisions had this, but then
> @ddunbar said he'd prefer I didn't add a new ResultCode.
>
>> * A new command line option --max-individual-test-time
>
> I think you should call it `--timeout=`, and then say in the description
> that it's a per-test timeout.
>
>> * Support for running external and internal ShTests with a per test
>> timeout
>> * Support for running GTests with a per test timeout
>>
>> I wanted to get some initial feedback on the implementation.
>>
>> * If a timeout is requested the Python psutil module is required.
>> This module does not ship with Python but is availble via pip
>> (or on Linux your distribution's package manager). How do people feel
>> about this? I don't like adding extra dependencies but this module
>> makes it really easy to kill a process and all its children
>> recursively
>> in a platform neutral way. Note that if a per test timeout is not
>> requested
>> then the psutil module is not imported and lit acts just like it did
>> before my patches.
>
> This must be the missing piece... I couldn't get my implementation to
> work without resorting to Python 3.x features (which is incompatible
> with a 2.x minimum version).
>
>>
>> * If the platform running lit doesn't have psutil installed and a
>> timeout is requested
>> an exception will be thrown. Should we provide a more friendly
>> error message?
>
> Yes.
>
>> If so where should this go in lit's code?
>
> Not sure off the top of my head, but probably somewhere near the rest of
> the argument parsing stuff. I'll have a look later.
>
>>
>> * I've not tested these patches on OSX or Windows. Is anyone on those
>> platforms willing
>> to give them a try?
>
> Yes, I'll give it a whirl on Darwin early next week.
I implemented the fixes I suggested, and my testcases from before. Works
for me on Darwin. See attached.
Cheers,
Jon
>
>
> Mind squashing your two patches, throwing them on Phabricator, and
> cc-ing llvm-commits?
>
>
> Thanks!
>
> Jon
>
>>
>> * The behaviour of --max-individual-test-time is a little at odds with
>> the behaviour of --max-time. --max-time will mark unexecuted tests as
>> UNRESOLVED whereas --max-individual-test-time will mark a test that
>> ran out of time as TIMEOUT. Is this okay?
>>
>> * @Chris Matthews. Does the code that emits xunit xml files need to
>> change in anyway?
>>
>> * @Daniel Dunbar. If these changes (in some form) end up being
>> committed would you be happy
>> to push a new release of lit to PyPy?
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>
--
Jon Roelofs
jonathan at codesourcery.com
CodeSourcery / Mentor Embedded
-------------- next part --------------
Index: utils/lit/lit/LitConfig.py
===================================================================
--- utils/lit/lit/LitConfig.py (revision 253080)
+++ utils/lit/lit/LitConfig.py (working copy)
@@ -21,7 +21,8 @@
def __init__(self, progname, path, quiet,
useValgrind, valgrindLeakCheck, valgrindArgs,
noExecute, debug, isWindows,
- params, config_prefix = None):
+ params, config_prefix = None,
+ timeout = 0):
# The name of the test runner.
self.progname = progname
# The items to add to the PATH environment variable.
@@ -57,6 +58,7 @@
self.valgrindArgs.append('--leak-check=no')
self.valgrindArgs.extend(self.valgrindUserArgs)
+ self.timeout = timeout
def load_config(self, config, path):
"""load_config(config, path) - Load a config object from an alternate
Index: utils/lit/lit/Test.py
===================================================================
--- utils/lit/lit/Test.py (revision 253080)
+++ utils/lit/lit/Test.py (working copy)
@@ -33,6 +33,7 @@
XPASS = ResultCode('XPASS', True)
UNRESOLVED = ResultCode('UNRESOLVED', True)
UNSUPPORTED = ResultCode('UNSUPPORTED', False)
+TIMEOUT = ResultCode('TIMEOUT', True)
# Test metric values.
Index: utils/lit/lit/TestRunner.py
===================================================================
--- utils/lit/lit/TestRunner.py (revision 253080)
+++ utils/lit/lit/TestRunner.py (working copy)
@@ -3,6 +3,7 @@
import re
import platform
import tempfile
+import threading
import lit.ShUtil as ShUtil
import lit.Test as Test
@@ -14,6 +15,11 @@
self.command = command
self.message = message
+class TestTimeoutError(Exception):
+ def __init__(self, command, message):
+ self.command = command
+ self.message = message
+
kIsWindows = platform.system() == 'Windows'
# Don't use close_fds on Windows.
@@ -33,28 +39,144 @@
self.cwd = cwd
self.env = dict(env)
-def executeShCmd(cmd, shenv, results):
+class TimeoutHelper(object):
+ """
+ Object used to helper manage enforcing a timeout in
+ _executeShCmd(). It is passed through recursive calls
+ to collect processes that have been executed so that when
+ the timeout happens they can be killed.
+ """
+ def __init__(self, timeout):
+ self.timeout = timeout
+ self._procs = []
+ self._timeoutReached = False
+ self._doneKillPass = False
+ # This lock will be used to protect concurrent access
+ # to _procs and _doneKillPass
+ self._lock = None
+ self._timer = None
+
+ def cancel(self):
+ if not self.active():
+ return
+ self._timer.cancel()
+
+ def active(self):
+ return self.timeout > 0
+
+ def addProcess(self, proc):
+ if not self.active():
+ return
+ needToRunKill = False
+ try:
+ self._lock.acquire()
+ self._procs.append(proc)
+ needToRunKill = self._doneKillPass
+ finally:
+ self._lock.release()
+
+ # The initial call to _kill() from the timer thread
+ # already happened so we need to call it again from
+ # this thread, otherwise this process will be left to
+ # run even though the timeout was already hit
+ if needToRunKill:
+ assert self.timeoutReached()
+ self._kill()
+
+ def startTimer(self):
+ if not self.active():
+ return
+
+ # Do some late initialisation that's only needed
+ # if there is a timeout set
+ import psutil
+ self._lock = threading.Lock()
+ self._timer = threading.Timer(self.timeout, self._handleTimeoutReached)
+ self._timer.start()
+
+ def _handleTimeoutReached(self):
+ self._timeoutReached = True
+ self._kill()
+
+ def timeoutReached(self):
+ return self._timeoutReached
+
+
+ def _kill(self):
+ """
+ This method may be called multiple times as we might get unlucky
+ and be in the middle of creating a new process in _executeShCmd()
+ which won't yet be in ``self._procs``. By locking here and in
+ addProcess() we should be able to kill processes launched after
+ the initial call to _kill()
+ """
+ # Replace with with statement?
+ try:
+ self._lock.acquire()
+ import psutil
+ for p in self._procs:
+ try:
+ psutilProc = psutil.Process(p.pid)
+ for child in psutilProc.children(recursive=True):
+ try:
+ child.kill()
+ except psutil.NoSuchProcess:
+ pass
+ psutilProc.kill()
+ except psutil.NoSuchProcess:
+ pass
+
+ # Empty the list and note that we've done a pass over the list
+ self._procs = [] # Python2 doesn't have list.clear()
+ self._doneKillPass = True
+ finally:
+ self._lock.release()
+
+def executeShCmd(cmd, shenv, results, timeout=0):
+ """
+ Wrapper around _executeShCmd that handles
+ timeout
+ """
+ # Use the helper even when no timeout is required to make
+ # other code simpler. ``timeoutHelper.timeoutReached()``
+ # will always return False
+ timeoutHelper = TimeoutHelper(timeout)
+ if timeout > 0:
+ timeoutHelper.startTimer()
+ finalExitCode = _executeShCmd(cmd, shenv, results, timeoutHelper)
+ timeoutHelper.cancel()
+ if timeoutHelper.timeoutReached():
+ raise TestTimeoutError(cmd, 'Reached test timeout of {} seconds'.format(timeout))
+ return finalExitCode
+
+
+def _executeShCmd(cmd, shenv, results, timeoutHelper):
+ if timeoutHelper.timeoutReached():
+ # Prevent further recursion if the timeout has been hit
+ # as we should try avoid launching more processes.
+ return None
+
if isinstance(cmd, ShUtil.Seq):
if cmd.op == ';':
- res = executeShCmd(cmd.lhs, shenv, results)
- return executeShCmd(cmd.rhs, shenv, results)
+ res = _executeShCmd(cmd.lhs, shenv, results, timeoutHelper)
+ return _executeShCmd(cmd.rhs, shenv, results, timeoutHelper)
if cmd.op == '&':
raise InternalShellError(cmd,"unsupported shell operator: '&'")
if cmd.op == '||':
- res = executeShCmd(cmd.lhs, shenv, results)
+ res = _executeShCmd(cmd.lhs, shenv, results, timeoutHelper)
if res != 0:
- res = executeShCmd(cmd.rhs, shenv, results)
+ res = _executeShCmd(cmd.rhs, shenv, results, timeoutHelper)
return res
if cmd.op == '&&':
- res = executeShCmd(cmd.lhs, shenv, results)
+ res = _executeShCmd(cmd.lhs, shenv, results, timeoutHelper)
if res is None:
return res
if res == 0:
- res = executeShCmd(cmd.rhs, shenv, results)
+ res = _executeShCmd(cmd.rhs, shenv, results, timeoutHelper)
return res
raise ValueError('Unknown shell command: %r' % cmd.op)
@@ -206,6 +328,8 @@
stderr = stderr,
env = cmd_shenv.env,
close_fds = kUseCloseFDs))
+ # Let the helper know about this process
+ timeoutHelper.addProcess(procs[-1])
except OSError as e:
raise InternalShellError(j, 'Could not create process ({}) due to {}'.format(executable, e))
@@ -311,7 +435,7 @@
results = []
try:
shenv = ShellEnvironment(cwd, test.config.environment)
- exitCode = executeShCmd(cmd, shenv, results)
+ exitCode = executeShCmd(cmd, shenv, results, timeout=litConfig.timeout)
except InternalShellError:
e = sys.exc_info()[1]
exitCode = 127
@@ -359,8 +483,12 @@
# run on clang with no real loss.
command = litConfig.valgrindArgs + command
- return lit.util.executeCommand(command, cwd=cwd,
- env=test.config.environment)
+ try:
+ return lit.util.executeCommand(command, cwd=cwd,
+ env=test.config.environment,
+ timeout=litConfig.timeout)
+ except lit.util.ExecuteCommandTimeoutException:
+ raise TestTimeoutError(command, 'Hit timeout of {} seconds'.format(litConfig.timeout))
def parseIntegratedTestScriptCommands(source_path, keywords):
"""
@@ -564,12 +692,15 @@
lit.util.mkdir_p(os.path.dirname(tmpBase))
execdir = os.path.dirname(test.getExecPath())
- if useExternalSh:
- res = executeScript(test, litConfig, tmpBase, script, execdir)
- else:
- res = executeScriptInternal(test, litConfig, tmpBase, script, execdir)
- if isinstance(res, lit.Test.Result):
- return res
+ try:
+ if useExternalSh:
+ res = executeScript(test, litConfig, tmpBase, script, execdir)
+ else:
+ res = executeScriptInternal(test, litConfig, tmpBase, script, execdir)
+ if isinstance(res, lit.Test.Result):
+ return res
+ except TestTimeoutError as e:
+ return lit.Test.Result(Test.TIMEOUT, e.message)
out,err,exitCode = res
if exitCode == 0:
Index: utils/lit/lit/formats/googletest.py
===================================================================
--- utils/lit/lit/formats/googletest.py (revision 253080)
+++ utils/lit/lit/formats/googletest.py (working copy)
@@ -109,8 +109,15 @@
if litConfig.noExecute:
return lit.Test.PASS, ''
- out, err, exitCode = lit.util.executeCommand(
- cmd, env=test.config.environment)
+ try:
+ out, err, exitCode = lit.util.executeCommand(
+ cmd, env=test.config.environment,
+ timeout=litConfig.timeout)
+ except lit.util.ExecuteCommandTimeoutException:
+ return (lit.Test.TIMEOUT,
+ 'Reached timeout of {} seconds'.format(
+ litConfig.timeout)
+ )
if exitCode:
return lit.Test.FAIL, out + err
Index: utils/lit/lit/main.py
===================================================================
--- utils/lit/lit/main.py (revision 253080)
+++ utils/lit/lit/main.py (working copy)
@@ -205,6 +205,10 @@
group.add_option("", "--xunit-xml-output", dest="xunit_output_file",
help=("Write XUnit-compatible XML test reports to the"
" specified file"), default=None)
+ group.add_option("", "--timeout", dest="timeout",
+ help="Maximum time to spend running a single test (in seconds)."
+ " 0 means no time limit. [Default: %default]",
+ type=int, default=0)
parser.add_option_group(group)
group = OptionGroup(parser, "Test Selection")
@@ -275,6 +279,12 @@
name,val = entry.split('=', 1)
userParams[name] = val
+ if opts.timeout != 0:
+ try:
+ import psutil
+ except ImportError as e:
+ parser.error('psutil must be installed to use --timeout=N for N > 0')
+
# Create the global config object.
litConfig = lit.LitConfig.LitConfig(
progname = os.path.basename(sys.argv[0]),
@@ -287,7 +297,8 @@
debug = opts.debug,
isWindows = isWindows,
params = userParams,
- config_prefix = opts.configPrefix)
+ config_prefix = opts.configPrefix,
+ timeout = opts.timeout)
# Perform test discovery.
run = lit.run.Run(litConfig,
@@ -377,7 +388,7 @@
extra = ' of %d' % numTotalTests
header = '-- Testing: %d%s tests, %d threads --'%(len(run.tests), extra,
opts.numThreads)
-
+ print('Using individual test timeout of {} seconds'.format(opts.timeout))
progressBar = None
if not opts.quiet:
if opts.succinct and opts.useProgressBar:
@@ -447,7 +458,8 @@
('Unsupported Tests ', lit.Test.UNSUPPORTED),
('Unresolved Tests ', lit.Test.UNRESOLVED),
('Unexpected Passes ', lit.Test.XPASS),
- ('Unexpected Failures', lit.Test.FAIL)):
+ ('Unexpected Failures', lit.Test.FAIL),
+ ('Individual Timeouts', lit.Test.TIMEOUT)):
if opts.quiet and not code.isFailure:
continue
N = len(byCode.get(code,[]))
Index: utils/lit/lit/util.py
===================================================================
--- utils/lit/lit/util.py (revision 253080)
+++ utils/lit/lit/util.py (working copy)
@@ -6,6 +6,7 @@
import signal
import subprocess
import sys
+import threading
def to_bytes(str):
# Encode to UTF-8 to get binary data.
@@ -155,18 +156,60 @@
pDigits, pfDigits, i*barH, pDigits, pfDigits, (i+1)*barH,
'*'*w, ' '*(barW-w), cDigits, len(row), cDigits, len(items)))
+class ExecuteCommandTimeoutException(Exception):
+ pass
# Close extra file handles on UNIX (on Windows this cannot be done while
# also redirecting input).
kUseCloseFDs = not (platform.system() == 'Windows')
-def executeCommand(command, cwd=None, env=None, input=None):
+def executeCommand(command, cwd=None, env=None, input=None, timeout=0):
+ if timeout > 0:
+ # We need psutil to help us kill child processes in
+ # a clean manner. Psutil has a wait() with a timeout but
+ # it can't be used because we block with communicate() which
+ # doesn't take a timeout. To use it we would need to replace
+ # the pipes with files (to avoid deadlock) and just use wait()
+ # and read the files afterwards
+ import psutil
p = subprocess.Popen(command, cwd=cwd,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
env=env, close_fds=kUseCloseFDs)
- out,err = p.communicate(input=input)
- exitCode = p.wait()
+ timerObject = None
+ # FIXME: Because of the way nested function scopes work in Python 2.x we
+ # need to use a reference to a mutable object rather than a plain
+ # bool. In Python 3 we could use the "nonlocal" keyword but we need
+ # to support Python 2 as well.
+ hitTimeOut = [False]
+ try:
+ if timeout > 0:
+ def killProcess():
+ # We may be invoking a shell so we need to kill the
+ # process and all its children. Using psutil seems
+ # like the easiest way to do this.
+ hitTimeOut[0] = True
+ try:
+ psutilProc = psutil.Process(p.pid)
+ for child in psutilProc.children(recursive=True):
+ try:
+ child.kill()
+ except psutil.NoSuchProcess:
+ pass
+ psutilProc.kill()
+ except psutil.NoSuchProcess:
+ pass
+ timerObject = threading.Timer(timeout, killProcess);
+ timerObject.start()
+ out,err = p.communicate(input=input)
+ exitCode = p.wait()
+ finally:
+ if timerObject != None:
+ timerObject.cancel()
+
+ if hitTimeOut[0]:
+ raise ExecuteCommandTimeoutException()
+
# Detect Ctrl-C in subprocess.
if exitCode == -signal.SIGINT:
raise KeyboardInterrupt
Index: utils/lit/tests/Inputs/timeout/infloop.py
===================================================================
--- utils/lit/tests/Inputs/timeout/infloop.py (revision 0)
+++ utils/lit/tests/Inputs/timeout/infloop.py (working copy)
@@ -0,0 +1,8 @@
+# RUN: %{python} %s
+# XFAIL: *
+
+import sys
+
+print "infinite loop"
+while True:
+ pass
Index: utils/lit/tests/Inputs/timeout/lit.cfg
===================================================================
--- utils/lit/tests/Inputs/timeout/lit.cfg (revision 0)
+++ utils/lit/tests/Inputs/timeout/lit.cfg (working copy)
@@ -0,0 +1,16 @@
+# -*- Python -*-
+
+import os
+import sys
+
+import lit.formats
+
+config.name = 'timeout'
+config.test_format = lit.formats.ShTest(execute_external=False)
+config.suffixes = ['.py']
+config.test_source_root = os.path.dirname(__file__)
+config.test_exec_root = config.test_source_root
+config.target_triple = '(unused)'
+src_root = os.path.join(config.test_source_root, '..')
+config.environment['PYTHONPATH'] = src_root
+config.substitutions.append(('%{python}', sys.executable))
Index: utils/lit/tests/Inputs/timeout/short.py
===================================================================
--- utils/lit/tests/Inputs/timeout/short.py (revision 0)
+++ utils/lit/tests/Inputs/timeout/short.py (working copy)
@@ -0,0 +1,5 @@
+# RUN: %{python} %s
+
+import sys
+
+print "short program"
Index: utils/lit/tests/timeout.py
===================================================================
--- utils/lit/tests/timeout.py (revision 0)
+++ utils/lit/tests/timeout.py (working copy)
@@ -0,0 +1,11 @@
+# RUN: not %{lit} \
+# RUN: %{inputs}/timeout/infloop.py \
+# RUN: %{inputs}/timeout/short.py \
+# RUN: -j 1 -v --debug --timeout 1 > %t.out 2> %t.err
+# RUN: FileCheck --check-prefix=CHECK-OUT < %t.out %s
+#
+
+# CHECK-OUT: TIMEOUT: timeout :: infloop.py
+# CHECK-OUT: PASS: timeout :: short.py
+# CHECK-OUT: Expected Passes{{ *}}: 1
+# CHECK-OUT: Individual Timeouts{{ *}}: 1
More information about the llvm-dev
mailing list