[llvm] [docs][mlgo] Document `MLModelRunner` (PR #139205)

Kristof Beyls via llvm-commits llvm-commits at lists.llvm.org
Wed May 14 07:42:47 PDT 2025


================
@@ -1,28 +1,176 @@
-====
-MLGO
-====
+=============================================
+Machine Learning - Guided Optimization (MLGO)
+=============================================
 
 Introduction
 ============
 
-MLGO is a framework for integrating ML techniques systematically in LLVM. It is
-designed primarily to replace heuristics within LLVM with machine learned
-models. Currently there is upstream infrastructure for the following
-heuristics:
+MLGO refers to integrating ML techniques, primarily to replace heuristics
+within LLVM with machine learned models.
+
+Currently the following heuristics feature such integration:
 
 * Inlining for size
 * Register allocation (LLVM greedy eviction heuristic) for performance
 
-This document is an outline of the tooling that composes MLGO.
+This document is an outline of the tooling and APIs facilitating MLGO.
+
+Note that tools for orchestrating ML training are not part of LLVM, as they are
+dependency-heavy - both on the choice of ML infrastructure and on the choice of
+distributed computing. For the training scenario, LLVM only contains facilities
+that enable it, such as corpus extraction, training data extraction, and
+evaluation of models during training.
+
+
+.. contents::
 
 Corpus Tooling
 ==============
 
 ..
     TODO(boomanaiden154): Write this section.
 
-Model Runner Interfaces
-=======================
+Interacting with ML models
+==========================
+
+We interact with ML models in two primary scenarios. One is training such a
+model. The other, inference, is using a model during compilation to make
+optimization decisions.
+
+For a specific optimization problem - e.g. inlining, or regalloc eviction - we
+first separate correctness-preserving decisions from optimization decisions.
+Not inlining functions marked "no inline" is an example of the former; so is
+not evicting an unevictable live range. An example of the latter is deciding to
+inline a function that will bloat the caller size, because we have reason to
+believe that later the effect will be constant propagation that will actually
+reduce the size (or dynamic instruction count).
+
+ML models can be understood as functions. Their inputs are tensors - buffers of
+scalars. The output (in our case, a single one) is a scalar. For inlining, for
+example, the inputs are properties of the caller, the callee, and the callsite
+being analyzed; the output is a boolean.
+
+Inputs and outputs are named, have a scalar type (e.g. int32_t) and a shape
+(e.g. 3x4). These are the elements that we use to bind to an ML model.
+
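+In LLVM, this (name, scalar type, shape) triple is captured by the
+``TensorSpec`` class (described below). A minimal sketch of constructing such
+specs - the feature names here are purely illustrative, not features used by
+the current advisors:
+
+.. code-block:: c++
+
+  #include "llvm/Analysis/TensorSpec.h"
+
+  using namespace llvm;
+
+  // A hypothetical scalar int64 feature (shape {1}).
+  TensorSpec CalleeUsers =
+      TensorSpec::createSpec<int64_t>("callee_users", {1});
+
+  // A hypothetical 3x4 feature: 12 int32 scalars.
+  TensorSpec SomeMatrix =
+      TensorSpec::createSpec<int32_t>("some_matrix", {3, 4});
+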
+In both training and inference, we want to expose to the ML side (training
+algorithms or the trained model, respectively) the features we want to make
+optimization decisions on. In that regard, the interface from the compiler side
+to the ML side is the same: pass features, and get a decision. It's essentially
+a function call, where the parameters and the result are bound by name and
+described by (name, scalar type, shape) tuples.
+
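+This "function call" view is what the ``MLModelRunner`` abstraction (described
+below) exposes: the compiler writes feature values into named, typed input
+buffers, then asks for a decision. A minimal sketch, assuming a runner has
+already been constructed for two illustrative int64 features and an int64
+decision - the feature names and indices are hypothetical:
+
+.. code-block:: c++
+
+  #include "llvm/Analysis/MLModelRunner.h"
+
+  using namespace llvm;
+
+  // Hypothetical feature indices. Real advisors define an enum matching the
+  // order of the TensorSpecs the runner was constructed with.
+  enum HypotheticalFeatureIndex : size_t {
+    CalleeUsers = 0,
+    CallSiteHeight = 1,
+  };
+
+  int64_t queryModel(MLModelRunner &Runner) {
+    // Populate the input buffers, bound by index (and, ultimately, by name).
+    *Runner.getTensor<int64_t>(CalleeUsers) = 42;
+    *Runner.getTensor<int64_t>(CallSiteHeight) = 3;
+    // Run the model and read back the (scalar) decision.
+    return Runner.evaluate<int64_t>();
+  }
+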
+The main types in LLVM are:
+- ``MLModelRunner`` - an abstraction for the decision making mechanism
+- ``TensorSpec`` which describes a tensor.
----------------
kbeyls wrote:

Apologies for the late comment. I just noticed that this "bullet list" doesn't get formatted as a bullet list at https://llvm.org/docs/MLGO.html. Maybe this needs a newline before the start of the bullet point list?

https://github.com/llvm/llvm-project/pull/139205

