[llvm] [lit] Fix lit hang on pool join when exception is thrown (PR #131881)
David Garcia Orozco via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 18 11:22:05 PDT 2025
https://github.com/ayylol created https://github.com/llvm/llvm-project/pull/131881
In certain environments, if an exception is not caught immediately, lit hangs on the following call (the pool join in `run.py`): https://github.com/llvm/llvm-project/blob/19970535f92c0f2dcda01b7fc60f95945166e424/llvm/utils/lit/lit/run.py#L93
This can happen when lit's internal shell is asked to run a program that does not exist. In that case `_executeShCmd` raises an `InternalShellError`. Its direct caller, `executeShCmd`, does not catch it. Instead, the exception is caught one level higher in the call stack, in `executeScriptInternal`. Because the exception propagates up the stack instead of being caught immediately, lit hangs in certain environments. This patch moves the `except InternalShellError` handler into `executeShCmd` to avoid the hang.
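A minimal sketch of the pattern the patch applies: the function that directly calls the inner runner handles the error itself and records it as a result with exit status 127, the status POSIX shells use for "command not found". `InternalShellError` and `ShellCommandResult` here are simplified stand-ins, not lit's actual classes. `_run_pipeline` and `run_command` are hypothetical names playing the roles of `_executeShCmd` and `executeShCmd`. The sketch uses `except ... as e`, the idiomatic equivalent of the patch's `e = sys.exc_info()[1]`.

```python
from dataclasses import dataclass


class InternalShellError(Exception):
    """Simplified stand-in for lit's InternalShellError (command + message)."""

    def __init__(self, command, message):
        super().__init__(command, message)
        self.command = command
        self.message = message


@dataclass
class ShellCommandResult:
    """Simplified stand-in; lit's real class carries more fields."""

    command: str
    stdout: str
    stderr: str
    exitCode: int
    timeoutReached: bool


def _run_pipeline(cmd):
    """Hypothetical inner runner (role of _executeShCmd): raises the way the
    internal shell does when the executable cannot be found."""
    if cmd.startswith("does-not-exist"):
        raise InternalShellError(cmd, "'does-not-exist': command not found")
    return 0


def run_command(cmd, results):
    """Role of the patched executeShCmd: catch the error right here, in the
    direct caller, and turn it into a recorded result instead of re-raising."""
    try:
        exit_code = _run_pipeline(cmd)
    except InternalShellError as e:
        exit_code = 127  # conventional "command not found" status
        results.append(ShellCommandResult(e.command, "", e.message, exit_code, False))
    return exit_code


results = []
code = run_command("does-not-exist --version", results)
print(code, results[0].stderr)  # 127 'does-not-exist': command not found
```

The caller one level up (`executeScriptInternal` in lit) then only sees a normal return value and never needs its own `try`/`except`.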
For more background on what causes this hang, see:
https://stackoverflow.com/questions/15314189/python-multiprocessing-pool-hangs-at-join
https://bugs.python.org/issue9400
https://github.com/python/cpython/issues/53646
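As we read the linked reports, one known way a `multiprocessing` pool can hang is an exception raised in a worker that cannot be unpickled in the parent, typically because the exception class's `__init__` signature does not match the `args` it passes to `Exception.__init__`. The sketch below illustrates only that pickling round trip, using hypothetical classes `BrokenShellError` and `FixedShellError`; it does not start a pool, and it is not established here that this is what triggers the hang in lit.

```python
import pickle


class BrokenShellError(Exception):
    """Hypothetical: __init__ takes two arguments but forwards only one to
    Exception, so self.args == (message,). Unpickling calls
    BrokenShellError(*self.args), which is missing an argument."""

    def __init__(self, command, message):
        super().__init__(message)
        self.command = command
        self.message = message


class FixedShellError(Exception):
    """Hypothetical: same interface, but forwards both arguments, so
    FixedShellError(*self.args) succeeds when unpickling."""

    def __init__(self, command, message):
        super().__init__(command, message)
        self.command = command
        self.message = message


def survives_pickling(exc):
    """Return True if exc can be pickled and then unpickled, which is what
    must happen for a worker's exception to reach the parent process."""
    try:
        pickle.loads(pickle.dumps(exc))
    except Exception:
        return False
    return True


print(survives_pickling(BrokenShellError("foo", "foo: command not found")))  # False
print(survives_pickling(FixedShellError("foo", "foo: command not found")))  # True
```

Catching the exception inside the worker and returning a plain value, as the patch does, sidesteps the question of whether the exception object can make that trip at all.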
From 7851d95c356be62edd3b1286c78f3ad2efc24ecc Mon Sep 17 00:00:00 2001
From: "Garcia Orozco, David" <david.garcia.orozco at intel.com>
Date: Tue, 18 Mar 2025 07:12:19 -0700
Subject: [PATCH] Fix lit hang
---
llvm/utils/lit/lit/TestRunner.py | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/llvm/utils/lit/lit/TestRunner.py b/llvm/utils/lit/lit/TestRunner.py
index 00432b8d31778..471c76b732e62 100644
--- a/llvm/utils/lit/lit/TestRunner.py
+++ b/llvm/utils/lit/lit/TestRunner.py
@@ -201,7 +201,12 @@ def executeShCmd(cmd, shenv, results, timeout=0):
     timeoutHelper = TimeoutHelper(timeout)
     if timeout > 0:
         timeoutHelper.startTimer()
-    finalExitCode = _executeShCmd(cmd, shenv, results, timeoutHelper)
+    try:
+        finalExitCode = _executeShCmd(cmd, shenv, results, timeoutHelper)
+    except InternalShellError:
+        e = sys.exc_info()[1]
+        finalExitCode = 127
+        results.append(ShellCommandResult(e.command, "", e.message, finalExitCode, False))
     timeoutHelper.cancel()
     timeoutInfo = None
     if timeoutHelper.timeoutReached():
@@ -1105,15 +1110,10 @@ def executeScriptInternal(

     results = []
     timeoutInfo = None
-    try:
-        shenv = ShellEnvironment(cwd, test.config.environment)
-        exitCode, timeoutInfo = executeShCmd(
+    shenv = ShellEnvironment(cwd, test.config.environment)
+    exitCode, timeoutInfo = executeShCmd(
             cmd, shenv, results, timeout=litConfig.maxIndividualTestTime
-        )
-    except InternalShellError:
-        e = sys.exc_info()[1]
-        exitCode = 127
-        results.append(ShellCommandResult(e.command, "", e.message, exitCode, False))
+    )

     out = err = ""
     for i, result in enumerate(results):