[llvm-branch-commits] [llvm] release/22.x: [lit] Explicitly unset timer to free thread stack (#188717) (PR #188938)

Mon Mar 30 08:07:27 PDT 2026

https://github.com/c-rhodes updated https://github.com/llvm/llvm-project/pull/188938

>From 719a040bf13fc854bde9e35a95520a78cf021c37 Mon Sep 17 00:00:00 2001
From: Nick Begg <nick at stunttruck.net>
Date: Mon, 16 Mar 2026 15:25:16 +0100
Subject: [PATCH 1/3] [lit] Stop holding subprocess objects open in
 TimeoutHelper (#186712)

Tweak TestRunner's TimeoutHelper storage to hold only PIDs rather
than the whole process object. Holding the object causes many pipes to
stay open, when all we need is the pid.

Addresses #185941

(cherry picked from commit 202ef22faeb1c2a7b5846a446e8c8dfe579d7c29)
---
 llvm/utils/lit/lit/TestRunner.py | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
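
Reviewer note (illustration only, not part of the patch): a minimal sketch of why storing the pid is enough. `PidTracker` and `run_and_track` are hypothetical names, not lit APIs. The sketch assumes CPython reference counting and only shows that the `Popen` object can be released while the tracker still knows the pid.

```python
import gc
import subprocess
import sys
import weakref


class PidTracker:
    """Hypothetical stand-in for the helper's process list."""

    def __init__(self):
        self._pids = []

    def add_process(self, proc):
        # Keep only the integer pid; the Popen object (and its pipe file
        # objects) is not referenced from here.
        self._pids.append(proc.pid)

    def pids(self):
        return list(self._pids)


def run_and_track(tracker):
    # Spawn a short-lived child with a stdout pipe attached.
    proc = subprocess.Popen(
        [sys.executable, "-c", "print('hello')"],
        stdout=subprocess.PIPE,
    )
    tracker.add_process(proc)
    proc.communicate()  # wait for exit and close the pipe
    # Return a weak reference so the caller can observe the object's lifetime.
    return proc.pid, weakref.ref(proc)


tracker = PidTracker()
pid, proc_ref = run_and_track(tracker)
gc.collect()
# The tracker still has the pid, but nothing keeps the Popen object alive.
print(tracker.pids() == [pid], proc_ref() is None)
```

Had `add_process` appended `proc` itself, the weak reference would stay live for as long as the tracker did.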

diff --git a/llvm/utils/lit/lit/TestRunner.py b/llvm/utils/lit/lit/TestRunner.py
index e9d73ade9827e..0d151ff96583e 100644
--- a/llvm/utils/lit/lit/TestRunner.py
+++ b/llvm/utils/lit/lit/TestRunner.py
@@ -136,7 +136,9 @@ def addProcess(self, proc):
             return
         needToRunKill = False
         with self._lock:
-            self._procs.append(proc)
+            # Just store the pid rather than the whole proc object.
+            # Holding the proc object keeps resources (e.g. pipes) open unnecessarily.
+            self._procs.append(proc.pid)
             # Avoid re-entering the lock by finding out if kill needs to be run
             # again here but call it if necessary once we have left the lock.
             # We could use a reentrant lock here instead but this code seems
@@ -176,8 +178,8 @@ def _kill(self):
         the initial call to _kill()
         """
         with self._lock:
-            for p in self._procs:
-                lit.util.killProcessAndChildren(p.pid)
+            for pid in self._procs:
+                lit.util.killProcessAndChildren(pid)
             # Empty the list and note that we've done a pass over the list
             self._procs = []  # Python2 doesn't have list.clear()
             self._doneKillPass = True

>From 58468775a5e2c488b1e42f489bb3237ca377a5af Mon Sep 17 00:00:00 2001
From: Nick Begg <nick at stunttruck.net>
Date: Thu, 26 Mar 2026 01:07:34 +0100
Subject: [PATCH 2/3] [lit] dealloc ApplyResult objects as they're waited on
 (#188642)

In _wait_for(), all async tasks are waited for. However, the objects
are held in the async_results list until the function calls complete.
This leads to about 3.6 GB of memory usage on my system when running
check-llvm, even though these objects aren't needed after the ar.get()
call.

Dealloc them as we go instead.

Addresses #188641

(cherry picked from commit b7d8831f8c432db97e5fcd5acdc470e7a82c92b2)
---
 llvm/utils/lit/lit/run.py | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
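
Reviewer note (illustration only, not part of the patch): a small sketch of the pop-as-you-wait pattern. It uses `multiprocessing.pool.ThreadPool` to stay self-contained; `square`, `wait_for` and `demo` are hypothetical names, and lit's real `_wait_for` does more (failure counting, test updates).

```python
from multiprocessing.pool import ThreadPool


def square(x):
    return x * x


def wait_for(async_results, timeout):
    """Collect results in submission order, dropping each handle once used."""
    values = []
    while async_results:
        # Remove the handle from the list before waiting on it, so the
        # list no longer keeps it (or its stored result) alive afterwards.
        ar = async_results.pop(0)
        values.append(ar.get(timeout))
    return values


def demo():
    with ThreadPool(2) as pool:
        pending = [pool.apply_async(square, (i,)) for i in range(5)]
        values = wait_for(pending, timeout=10)
    return values, pending


values, pending = demo()
print(values, pending)
```

`list.pop(0)` shifts the remaining elements on every call; a `collections.deque` with `popleft()` would avoid that, but the patch keeps the plain list.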

diff --git a/llvm/utils/lit/lit/run.py b/llvm/utils/lit/lit/run.py
index 9c54511bfd625..fb18a60db1b39 100644
--- a/llvm/utils/lit/lit/run.py
+++ b/llvm/utils/lit/lit/run.py
@@ -142,8 +142,10 @@ def _execute(self, deadline):
 
     def _wait_for(self, async_results, deadline):
         timeout = deadline - time.time()
-        for idx, ar in enumerate(async_results):
+        idx = 0
+        while len(async_results) > 0:
             try:
+                ar = async_results.pop(0)
                 test = ar.get(timeout)
             except multiprocessing.TimeoutError:
                 raise TimeoutError()
@@ -153,6 +155,7 @@ def _wait_for(self, async_results, deadline):
                     self.failures += 1
                     if self.failures == self.max_failures:
                         raise MaxFailuresError()
+            idx += 1
 
     # Update local test object "in place" from remote test object.  This
     # ensures that the original test object which is used for printing test

>From 561944ebcbd164bc7fc5716c641476b14085fa13 Mon Sep 17 00:00:00 2001
From: Nikita Popov <npopov at redhat.com>
Date: Thu, 26 Mar 2026 16:57:41 +0100
Subject: [PATCH 3/3] [lit] Explicitly unset timer to free thread stack
 (#188717)

Currently the virtual address space usage of lit fluctuates wildly, with
peak usage exceeding 4GB, which results in subsequent thread spawning
errors on 32-bit systems.

The cause of this is a circular reference in TimeoutHelper._timer (via the
callback), which causes the 8MB thread stack to not be immediately
reclaimed when the timer is cancelled.

We can avoid this by explicitly unsetting the timer.

(cherry picked from commit dfefc03769f58d8982202276cd3381356da12dfe)
---
 llvm/utils/lit/lit/TestRunner.py | 2 ++
 1 file changed, 2 insertions(+)
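
Reviewer note (illustration only, not part of the patch): a sketch of the reference cycle described above, assuming CPython reference counting. `Helper` and `timer_freed_immediately` are hypothetical names. The timer is deliberately never started, so this only shows object lifetime, not thread stack reclamation.

```python
import gc
import threading
import weakref


class Helper:
    """Hypothetical stand-in: owns a Timer whose callback is a bound method."""

    def __init__(self, timeout, unset_on_cancel):
        self._unset_on_cancel = unset_on_cancel
        # Cycle: self -> self._timer -> callback (bound method) -> self
        self._timer = threading.Timer(timeout, self._on_timeout)

    def _on_timeout(self):
        pass

    def cancel(self):
        self._timer.cancel()
        if self._unset_on_cancel:
            # Break the cycle so reference counting alone frees the timer.
            self._timer = None


def timer_freed_immediately(unset_on_cancel):
    was_enabled = gc.isenabled()
    gc.disable()  # rule out the cyclic collector; only refcounting remains
    try:
        helper = Helper(60, unset_on_cancel)
        timer_ref = weakref.ref(helper._timer)
        helper.cancel()
        del helper
        return timer_ref() is None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # clean up any cycle left behind by the demo


print(timer_freed_immediately(True), timer_freed_immediately(False))
```

Without the explicit unset, the timer object survives until the cyclic garbage collector next runs.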

diff --git a/llvm/utils/lit/lit/TestRunner.py b/llvm/utils/lit/lit/TestRunner.py
index 0d151ff96583e..93f03624f9984 100644
--- a/llvm/utils/lit/lit/TestRunner.py
+++ b/llvm/utils/lit/lit/TestRunner.py
@@ -127,6 +127,8 @@ def cancel(self):
         if not self.active():
             return
         self._timer.cancel()
+        # Break reference cycle so that thread stack is freed immediately.
+        self._timer = None
 
     def active(self):
         return self.timeout > 0