[llvm] [libc++] Restart all preempted jobs (PR #95565)

via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 14 13:18:05 PDT 2024


EricWF wrote:

> Instead of only restarting preempted jobs when there was no other failure in the workflow, always restart preempted jobs. Doing otherwise leads to really confusing CI output since most of the failures are due to preempted workflows. One has to basically search for the job that failed "for real".

But there _are_ failed jobs, and they will restart and waste resources. If the majority of the jobs are failing, we're just wasting resources to solve a UI issue.

The current situation is unfortunate, but this change leaves us re-running failed jobs over and over because a single job was preempted. In theory this could continue in perpetuity, since one or more jobs in the failing group might get preempted every time. That could cripple the Windows bots, which are part of a very large group of other jobs and are already overloaded.

Additionally, it seems from experience that the more bots we have running in any one cluster, the more likely _each one_ of them is to be selected for preemption.

What we really want is the ability to restart a single job, rather than all failed jobs. Alternatively, we could add GitHub annotations to the failing jobs that contain the error message.
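
A rough sketch of the annotation idea, assuming the failing job runs under GitHub Actions, where a line of the form `::error title=...::message` printed to stdout is surfaced as an annotation on the run. The helper names and the job name below are hypothetical and are not existing LLVM CI code. The escaping follows the rules I understand workflow commands to use.

```python
def escape_data(text: str) -> str:
    """Escape the message part of a workflow command."""
    return text.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def escape_property(text: str) -> str:
    """Escape a property value such as the annotation title."""
    return escape_data(text).replace(":", "%3A").replace(",", "%2C")


def error_annotation(job_name: str, message: str) -> str:
    """Build an `::error` workflow command for a job that failed for real."""
    return f"::error title={escape_property(job_name)}::{escape_data(message)}"


if __name__ == "__main__":
    # Hypothetical job name and error text, for illustration only.
    print(error_annotation("example-job", "1 test failed\nsee log for details"))
```

A job that failed for real would print one such line, so the error shows up in the run summary instead of having to be found among the preempted jobs.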

Give me a few days to work on a more direct solution to this.

https://github.com/llvm/llvm-project/pull/95565

