[llvm] [CI] Extend metrics container to log BuildKite metrics (PR #129699)

Aiden Grossman via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 5 09:31:08 PST 2025


Nathan Gauër <brioche at google.com>
In-Reply-To: <llvm.org/llvm/llvm-project/pull/129699 at github.com>


================
@@ -35,6 +48,145 @@ class GaugeMetric:
     time_ns: int
 
 
+# Fetches a page of the build list using the GraphQL BuildKite API.
+# Returns the BUILDKITE_GRAPHQL_BUILDS_PER_PAGE last **finished** builds by
+# default, or the BUILDKITE_GRAPHQL_BUILDS_PER_PAGE **finished** builds older
+# than the one pointed to by
+# |after_cursor| if provided.
+# The |after_cursor| value is taken from the previous page returned by the API.
+# The returned data has the following format:
+# [
+#   {
+#       "cursor": <value>,
+#       "number": <build-number>,
+#   }
+# ]
+def buildkite_fetch_page_build_list(buildkite_token, after_cursor=None):
+    BUILDKITE_GRAPHQL_QUERY = """
+  query OrganizationShowQuery {{
+    organization(slug: "llvm-project") {{
+      pipelines(search: "Github pull requests", first: 1) {{
+        edges {{
+          node {{
+            builds (state: [FAILED, PASSED], first: {PAGE_SIZE}, after: {AFTER}) {{
+              edges {{
+                cursor
+                node {{
+                  number
+                }}
+              }}
+            }}
+          }}
+        }}
+      }}
+    }}
+  }}
+  """
+    data = BUILDKITE_GRAPHQL_QUERY.format(
+        PAGE_SIZE=BUILDKITE_GRAPHQL_BUILDS_PER_PAGE,
+        AFTER="null" if after_cursor is None else '"{}"'.format(after_cursor),
+    )
+    data = data.replace("\n", "").replace('"', '\\"')
+    data = '{ "query": "' + data + '" }'
+    url = "https://graphql.buildkite.com/v1"
+    headers = {
+        "Authorization": "Bearer " + buildkite_token,
+        "Content-Type": "application/json",
+    }
+    r = requests.post(url, data=data, headers=headers)
+    data = r.json()
+    # De-nest the build list.
+    builds = data["data"]["organization"]["pipelines"]["edges"][0]["node"]["builds"][
+        "edges"
+    ]
+    # Fold cursor info into the node dictionary.
+    return [{**x["node"], "cursor": x["cursor"]} for x in builds]
+
+
+# Returns all the info associated with the provided |build_number|.
+# Note: for unknown reasons, GraphQL returns no jobs for a given build, while
+# this endpoint does, which is why this function uses it instead of GraphQL.
+def buildkite_get_build_info(build_number):
+    URL = "https://buildkite.com/llvm-project/github-pull-requests/builds/{}.json"
+    return requests.get(URL.format(build_number)).json()
+
+
+# Returns the last BUILDKITE_GRAPHQL_BUILDS_PER_PAGE builds by default, or all
+# builds newer than the one pointed to by |last_cursor| if it is provided.
+def buildkite_get_builds_up_to(buildkite_token, last_cursor=None):
+    output = []
+    cursor = None
+
+    while True:
+        page = buildkite_fetch_page_build_list(buildkite_token, cursor)
+        # No cursor provided, return the first page.
+        if last_cursor is None:
+            return page
+
+        # Cursor has been provided, check if present in this page.
+        match_index = next(
+            (i for i, x in enumerate(page) if x["cursor"] == last_cursor), None
+        )
+        # Not present, continue loading more pages.
+        if match_index is None:
+            output += page
+            cursor = page[-1]["cursor"]
+            continue
+        # Cursor found, keep the results newer than the cursor.
+        output += page[:match_index]
+        return output
+
+
+# Returns a (metrics, cursor) tuple.
+# Returns the BuildKite workflow metrics up to the build pointed to by |last_cursor|.
+# If |last_cursor| is None, no metrics are returned.
+# The returned cursor is either:
+#  - the last processed build.
+#  - the last build if no initial cursor was provided.
+def buildkite_get_metrics(buildkite_token, last_cursor=None):
+    builds = buildkite_get_builds_up_to(buildkite_token, last_cursor)
+    # Don't return any metrics if last_cursor is None.
+    # This happens when the program starts.
----------------
boomanaiden154 wrote:

That makes sense.

I'm not sure recovery matters that much. If we miss some metrics in the time the container is down, I don't think it's a big deal as long as everything runs well in steady state.
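
To make the trade-off concrete, here is a rough sketch of the cursor behavior discussed above, run against an in-memory build list instead of the BuildKite API. The names `fetch_page`, `get_builds_up_to`, `poll`, `make_builds`, and `PAGE_SIZE` are hypothetical stand-ins, not the functions from the patch, and the empty-page guard is my own addition. It assumes builds are listed newest first and that the cursor only lives in memory, so a restart starts again from `None`.

```python
PAGE_SIZE = 2  # hypothetical; stands in for BUILDKITE_GRAPHQL_BUILDS_PER_PAGE


def make_builds(numbers):
    """Build a fake newest-first build list with one cursor per build."""
    return [{"number": n, "cursor": f"c{n}"} for n in numbers]


def fetch_page(builds, after_cursor=None):
    """Return up to PAGE_SIZE builds older than |after_cursor|."""
    start = 0
    if after_cursor is not None:
        start = next(i for i, b in enumerate(builds) if b["cursor"] == after_cursor) + 1
    return builds[start : start + PAGE_SIZE]


def get_builds_up_to(builds, last_cursor=None):
    """Return the first page, or every build newer than |last_cursor|."""
    output = []
    cursor = None
    while True:
        page = fetch_page(builds, cursor)
        if last_cursor is None:
            return page
        if not page:
            # Assumption: stop when history runs out instead of indexing page[-1].
            return output
        match_index = next(
            (i for i, b in enumerate(page) if b["cursor"] == last_cursor), None
        )
        if match_index is None:
            output += page
            cursor = page[-1]["cursor"]
            continue
        output += page[:match_index]
        return output


def poll(builds, last_cursor=None):
    """Return (build numbers to report, cursor to remember for the next poll)."""
    new_builds = get_builds_up_to(builds, last_cursor)
    if last_cursor is None:
        # Startup: report nothing, only remember the newest build.
        return [], (new_builds[0]["cursor"] if new_builds else None)
    if not new_builds:
        return [], last_cursor
    return [b["number"] for b in new_builds], new_builds[0]["cursor"]


# Startup: nothing is reported, the cursor lands on build 3.
metrics, cursor = poll(make_builds([3, 2, 1]))
# Steady state: builds 4 to 6 finished since the last poll and are all reported.
metrics, cursor = poll(make_builds([6, 5, 4, 3, 2, 1]), cursor)
# Restart: the cursor is lost, so builds 7 and 8 are skipped rather than reported.
metrics_after_restart, cursor = poll(make_builds([8, 7, 6, 5, 4, 3, 2, 1]), None)
```

In steady state nothing is dropped even when more than one page of builds finishes between polls. The only gap is the window between the last poll before a restart and the first poll after it.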

https://github.com/llvm/llvm-project/pull/129699

