[llvm] [llvm]Add a simple Telemetry framework (PR #102323)

James Henderson via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 24 02:11:31 PDT 2024


================
@@ -0,0 +1,136 @@
+//===- llvm/Telemetry/Telemetry.h - Telemetry -------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file provides the basic framework for Telemetry
+///
+/// It comprises of three important structs/classes:
+///
+/// - Telemeter: The class responsible for collecting and forwarding
+///              telemery data.
+/// - TelemetryInfo: data courier
+/// - TelemetryConfig: this stores configurations on Telemeter.
+///
+/// Refer to its documentation at llvm/docs/Telemetry.rst for more details.
+//===---------------------------------------------------------------------===//
+
+#ifndef LLVM_TELEMETRY_TELEMETRY_H
+#define LLVM_TELEMETRY_TELEMETRY_H
+
+#include <chrono>
+#include <ctime>
+#include <memory>
+#include <optional>
+#include <string>
+
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/JSON.h"
+
+namespace llvm {
+namespace telemetry {
+
+/// Configuration for the Telemeter class.
+/// This stores configurations from both users and vendors and is passed
+/// to the Telemeter upon construction. (Any changes to the config after
+/// the Telemeter's construction will not have effect on it).
+///
+/// This struct can be extended as needed to add additional configuration
+/// points specific to a vendor's implementation.
+struct Config {
+  // If true, telemetry will be enabled.
+  bool EnableTelemetry;
+
+  // Implementation-defined names of additional destinations to send
+  // telemetry data (Could be stdout, stderr, or some local paths, etc).
+  //
+  // These strings will be interpreted by the vendor's code.
+  // So the users must pick the from their vendor's pre-defined
+  // set of Destinations.
+  std::vector<std::string> AdditionalDestinations;
+};
+
+/// For isa, dyn_cast, etc operations on TelemetryInfo.
+typedef unsigned KindType;
+/// This struct is used by TelemetryInfo to support isa<>, dyn_cast<>
+/// operations.
+/// It is defined as a struct (rather than an enum) because it is
+/// expectend to be extended by subclasses which may have
+/// additional TelemetryInfo types defined to describe different events.
+struct EntryKind {
+  static const KindType Base = 0;
+};
+
+/// TelemetryInfo is the data courier, used to move instrumented data
+/// from the tool being monitored to the Telemery framework.
+///
+/// This base class contains only the basic set of telemetry data.
+/// Downstream implementations can add more fields as needed to describe
+/// additional events.
+///
+/// For example, The LLDB debugger can define a DebugCommandInfo subclass
+/// which has additional fields about the debug-command being instrumented,
+/// such as `CommandArguments` or `CommandName`.
+struct TelemetryInfo {
+  // This represents a unique-id, conventionally corresponding to
+  // a tool's session - i.e., every time the tool starts until it exits.
+  //
+  // Note: a tool could have multiple sessions running at once, in which
+  // case, these shall be multiple sets of TelemetryInfo with multiple unique
+  // ids.
+  //
+  // Different usages can assign different types of IDs to this field.
+  std::string SessionId;
+
+  TelemetryInfo() = default;
+  virtual ~TelemetryInfo() = default;
+
+  virtual json::Object serializeToJson() const;
----------------
jh7370 wrote:

I'm wondering if we've got slightly different visions of things and it's driving the way we're looking at the design differently. Reading between the lines of some of the comments and responses you've made, I get the impression you expect either all concrete `TelemetryInfo` subclasses to be defined by vendors, or at least the all downstream (i.e. all) `Destination` classes to know how to convert them to whatever the Destination wants. In my view, I'd prefer it if a) there was the option to have some (but certainly not all) `TelemetryInfo` subclasses in the upstream LLVM repository, and b) a way to avoid repeating more than is fundamentally necessary to do the serialization. This is a strategy we use successfully in our internal telemetry project.

In my mind, unless we have concrete instances of at least `TelemetryInfo` in the upstream codebase, and therefore the ability for upstream tools to dispatch telemetry with no more than a vendor-specified Destination (specified in just one place), the whole upstream framework becomes pointless, because downstream vendors still end up with the same level of patches needed to actually make use of the framework.

Regarding the details, I'd expect the theoretical `JsonSerializer` class to make use of the `llvm::json` interface, so converting the input types passed by the TelemetryInfo subclass `serialize` method into the appropriate Json types (aside: this is probably generic enough that it could live in the core LLVM code). Similarly, a `ProtobufSerializer` would do something equivalent. The keys are intended to be string literals typically that the `TelemetryInfo` subclasses specify in their `serialize` method - I'm not quite sure what you mean by the "right key string" in this context, as I'm not familiar with protobuf; it would help if you could elaborate some more what the issue here is.

https://github.com/llvm/llvm-project/pull/102323


More information about the llvm-commits mailing list