[llvm] Upstream libc++ buildbot restarter. (PR #93582)

via llvm-commits llvm-commits at lists.llvm.org
Tue May 28 11:15:02 PDT 2024


https://github.com/EricWF updated https://github.com/llvm/llvm-project/pull/93582

From 13bb4f8268900aeea37df8236133476de3ffb86b Mon Sep 17 00:00:00 2001
From: Eric Fiselier <eric at efcs.ca>
Date: Tue, 28 May 2024 13:12:31 -0400
Subject: [PATCH 1/2] Upstream libc++ buildbot restarter.

I've been running a cronjob on my local machine to restart preempted
libc++ CI runs. This is bad and brittle. This upstreams a much better
version of the restarter.

It works by matching on check run annotations, looking for a mention
of the machine being shut down.

If there are both preempted jobs and failing jobs, we don't restart
the workflow. Maybe we should change that?
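
As a rough illustration of the matching rule described above, here is a
minimal standalone sketch. The helper name shouldRestart and the sample
annotation objects are hypothetical and not part of the patch; the two
message patterns and the annotation_level / message fields are taken from
the script in the diff. It assumes the annotations have already been
fetched.

```javascript
// Hypothetical helper for illustration only; the workflow inlines this logic.
// Patterns mirror the ones in the patch (with the trailing dot escaped).
const PREEMPTION_REGEX = /The runner has received a shutdown signal/;
const FAILURE_REGEX = /Process completed with exit code 1\./;

// Returns true only if at least one failure-level annotation mentions a
// preemption and no failure-level annotation reports an ordinary failure.
function shouldRestart(annotations) {
  let sawPreemption = false;
  for (const annotation of annotations) {
    // Only failure-level annotations are considered.
    if (annotation.annotation_level !== 'failure') {
      continue;
    }
    // Any non-preemption failure vetoes the restart.
    if (FAILURE_REGEX.test(annotation.message)) {
      return false;
    }
    if (PREEMPTION_REGEX.test(annotation.message)) {
      sawPreemption = true;
    }
  }
  return sawPreemption;
}
```

Under these assumptions, a run whose only failure annotation is the
shutdown message would be restarted, while a run that also has an
"exit code 1" annotation would be left alone.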
---
 .../restart-preempted-libcxx-jobs.yaml        | 108 ++++++++++++++++++
 1 file changed, 108 insertions(+)
 create mode 100644 .github/workflows/restart-preempted-libcxx-jobs.yaml

diff --git a/.github/workflows/restart-preempted-libcxx-jobs.yaml b/.github/workflows/restart-preempted-libcxx-jobs.yaml
new file mode 100644
index 0000000000000..3da17b9f85544
--- /dev/null
+++ b/.github/workflows/restart-preempted-libcxx-jobs.yaml
@@ -0,0 +1,108 @@
+name: Restart Preempted Libc++ Workflow
+
+# The libc++ builders run on preemptable VMs, which can be shut down at any time.
+# This workflow identifies when a workflow run was canceled due to the VM being preempted,
+# and restarts the workflow run.
+
+# We identify a canceled workflow run by checking the annotations of the check runs in the check suite,
+# which should contain the message "The runner has received a shutdown signal."
+
+# Note: If a job is both preempted and also contains a non-preemption failure, we do not restart the workflow.
+
+on:
+  workflow_run:
+    workflows:
+      - 'Build and Test libc\+\+'
+    types:
+      - failure
+      - canceled
+
+permissions:
+  contents: read
+
+jobs:
+  restart:
+    name: "Restart Job"
+    permissions:
+      statuses: read
+      checks: read
+      actions: write
+    runs-on: ubuntu-latest
+    steps:
+      - name: "Restart Job"
+        uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea #v7.0.1
+        with:
+          script: |
+            const failure_regex = /Process completed with exit code 1\./
+            const preemption_regex = /The runner has received a shutdown signal/
+            
+            console.log('Listing check runs for suite')
+            const check_suites = await github.rest.checks.listForSuite({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              check_suite_id: context.payload.workflow_run.check_suite_id
+            })
+
+            const check_run_ids = [];
+            for (const check_run of check_suites.data.check_runs) {
+              console.log('Checking check run: ' + check_run.id);
+              console.log(check_run);
+              if (check_run.status != 'completed') {
+                console.log('Check run was not completed. Skipping.');
+                continue;
+              }
+              if (check_run.conclusion != 'failure' && check_run.conclusion != 'cancelled') {
+                console.log('Check run had conclusion: ' + check_run.conclusion + '. Skipping.');
+                continue;
+              }
+              check_run_ids.push(check_run.id);
+            }
+            
+            let has_preempted_job = false;
+
+            for (const check_run_id of check_run_ids) {
+              console.log('Listing annotations for check run: ' + check_run_id);
+                 
+              const annotations = await github.rest.checks.listAnnotations({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                check_run_id: check_run_id
+              })
+              
+              console.log(annotations);
+              for (const annotation of annotations.data) {
+                if (annotation.annotation_level != 'failure') {
+                  continue;
+                }
+                
+                const preemption_match = annotation.message.match(preemption_regex);
+              
+                if (preemption_match != null) {
+                  console.log('Found preemption message: ' + annotation.message);
+                  has_preempted_job = true;
+                }
+                
+                const failure_match = annotation.message.match(failure_regex);
+                if (failure_match != null) {
+                  // We only want to restart the workflow if all of the failures were due to preemption.
+                  // We don't want to restart the workflow if there were other failures.
+                  console.log('Choosing not to rerun workflow because we found a non-preemption failure');
+                  console.log('Failure message: ' + annotation.message);
+                  return;
+                }
+              }
+            } 
+             
+            if (!has_preempted_job) {
+              console.log('No preempted jobs found. Not restarting workflow.');
+              return;
+            }
+            
+            console.log("Restarting workflow: " + context.payload.workflow_run.id);
+            await github.rest.actions.reRunWorkflowFailedJobs({
+                owner: context.repo.owner,
+                repo: context.repo.repo,
+                run_id: context.payload.workflow_run.id
+              })
+            
+        

From a1b745c1f979909d4e44cde55151ee6f7f9f82e7 Mon Sep 17 00:00:00 2001
From: Eric Fiselier <eric at efcs.ca>
Date: Tue, 28 May 2024 14:14:44 -0400
Subject: [PATCH 2/2] Disable workflow on forks

---
 .github/workflows/restart-preempted-libcxx-jobs.yaml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/workflows/restart-preempted-libcxx-jobs.yaml b/.github/workflows/restart-preempted-libcxx-jobs.yaml
index 3da17b9f85544..a71f2084182e5 100644
--- a/.github/workflows/restart-preempted-libcxx-jobs.yaml
+++ b/.github/workflows/restart-preempted-libcxx-jobs.yaml
@@ -22,6 +22,7 @@ permissions:
 
 jobs:
   restart:
+    if: github.repository_owner == 'llvm'
     name: "Restart Job"
     permissions:
       statuses: read


