[llvm] [llvm-exegesis] Kill processes that receive a signal (PR #86069)

Aiden Grossman via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 20 20:15:34 PDT 2024


https://github.com/boomanaiden154 created https://github.com/llvm/llvm-project/pull/86069

Before this patch, llvm-exegesis would leave processes lingering that experienced signals like segmentation faults. They would end up in a signal-delivery-stop state under ptrace and never exit. This does not cause problems (or at least not many) in llvm-exegesis as they are cleaned up after the main process exits, which usually happens quickly. However, in downstream use, when many blocks are being executed (many of which run into signals) within a single process, these processes stay around and can easily exhaust the process limit on some systems.

This patch cleans them up by sending SIGKILL after information about the signal that was sent has been gathered.
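The kill-then-reap step described above can be sketched in isolation. This is a minimal illustration, not the code from the patch: it uses a child that stops itself with SIGSTOP (observed via `WUNTRACED`) as a stand-in for a child parked in a ptrace signal-delivery-stop, and the helper names `spawnStoppedChild` and `killAndReap` are hypothetical.

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that stops itself. The stopped child
// stands in for a benchmark process stuck in a signal-delivery-stop.
// Returns the child's PID, or -1 on failure.
pid_t spawnStoppedChild() {
  pid_t Pid = fork();
  if (Pid == 0) {
    raise(SIGSTOP); // The child stops here and never resumes.
    _exit(0);
  }
  if (Pid == -1)
    return -1;
  int Status;
  // WUNTRACED makes waitpid report the stop instead of blocking until exit.
  if (waitpid(Pid, &Status, WUNTRACED) == -1 || !WIFSTOPPED(Status))
    return -1;
  return Pid;
}

// Hypothetical helper: sends SIGKILL to the stopped child and then reaps it
// so that no zombie is left behind. Returns the signal that terminated the
// child, or -1 on failure.
int killAndReap(pid_t Pid) {
  assert(Pid > 0 && "expected a valid child PID");
  if (kill(Pid, SIGKILL) == -1)
    return -1;
  int Status;
  if (waitpid(Pid, &Status, 0) == -1)
    return -1;
  return WIFSIGNALED(Status) ? WTERMSIG(Status) : -1;
}
```

Calling `killAndReap(spawnStoppedChild())` should return `SIGKILL`, after which the child no longer counts against the process limit. SIGKILL cannot be caught or blocked, which is why it also terminates a process that is currently stopped.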

From 7c8a31cb7f6595f59d46d4717b9579bfececbe7a Mon Sep 17 00:00:00 2001
From: Aiden Grossman <agrossman154 at yahoo.com>
Date: Wed, 20 Mar 2024 20:12:08 -0700
Subject: [PATCH] [llvm-exegesis] Kill processes that receive a signal

Before this patch, llvm-exegesis would leave processes lingering that
experienced signals like segmentation faults. They would end up in a
signal-delivery-stop state under ptrace and never exit. This does not
cause problems (or at least not many) in llvm-exegesis as they are
cleaned up after the main process exits, which usually happens quickly.
However, in downstream use, when many blocks are being executed (many of
which run into signals) within a single process, these processes stay
around and can easily exhaust the process limit on some systems.

This patch cleans them up by sending SIGKILL after information about the
signal that was sent has been gathered.
---
 .../llvm-exegesis/lib/BenchmarkRunner.cpp      | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
index 5c9848f3c68885..f0452605eb24bf 100644
--- a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
+++ b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
@@ -342,7 +342,7 @@ class SubProcessFunctionExecutorImpl
       return make_error<Failure>("Failed to attach to the child process: " +
                                  Twine(strerror(errno)));
 
-    if (wait(NULL) == -1) {
+    if (waitpid(ParentOrChildPID, NULL, 0) == -1) {
       return make_error<Failure>(
           "Failed to wait for child process to stop after attaching: " +
           Twine(strerror(errno)));
@@ -361,7 +361,7 @@ class SubProcessFunctionExecutorImpl
       return SendError;
 
     int ChildStatus;
-    if (wait(&ChildStatus) == -1) {
+    if (waitpid(ParentOrChildPID, &ChildStatus, 0) == -1) {
       return make_error<Failure>(
           "Waiting for the child process to complete failed: " +
           Twine(strerror(errno)));
@@ -401,6 +401,20 @@ class SubProcessFunctionExecutorImpl
                                  Twine(strerror(errno)));
     }
 
+    // Send SIGKILL rather than SIGTERM as the child process has no SIGTERM
+    // handlers to run, and sending SIGTERM would mean that ptrace will force
+    // it to block in the signal-delivery-stop for the SIGSEGV/other signals,
+    // and upon exit.
+    if (kill(ParentOrChildPID, SIGKILL) == -1)
+      return make_error<Failure>("Failed to kill child benchmarking process: " +
+                                 Twine(strerror(errno)));
+
+    // Wait for the process to exit so that there are no zombie processes left
+    // around.
+    if (waitpid(ParentOrChildPID, NULL, 0) == -1)
+      return make_error<Failure>("Failed to wait for process to die: " +
+                                 Twine(strerror(errno)));
+
     if (ChildSignalInfo.si_signo == SIGSEGV)
       return make_error<SnippetSegmentationFault>(
           reinterpret_cast<intptr_t>(ChildSignalInfo.si_addr));
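The diff above also replaces `wait` with `waitpid(ParentOrChildPID, ...)`. The patch description does not say why; one plausible reason, given the downstream scenario of many children within a single process, is that `wait` returns for whichever child changes state first, while `waitpid` with an explicit PID targets one child. The sketch below illustrates that difference under this assumption; `spawnChild` and `exitCodeOf` are hypothetical helpers, not part of llvm-exegesis.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that sleeps for DelayMicros
// microseconds and then exits with ExitCode. Returns the child's PID,
// or -1 if fork fails.
pid_t spawnChild(int ExitCode, unsigned DelayMicros) {
  pid_t Pid = fork();
  if (Pid == 0) {
    usleep(DelayMicros);
    _exit(ExitCode);
  }
  return Pid;
}

// Hypothetical helper: waits for exactly the child identified by Pid and
// returns its exit code, or -1 if waiting fails or the child did not exit
// normally. Other children that finish earlier are left untouched.
int exitCodeOf(pid_t Pid) {
  int Status;
  if (waitpid(Pid, &Status, 0) == -1)
    return -1;
  return WIFEXITED(Status) ? WEXITSTATUS(Status) : -1;
}
```

With a slow child and a fast child running at the same time, `exitCodeOf` on the slow child's PID still returns the slow child's exit code, whereas a bare `wait` would most likely have returned the fast child's status first.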


