[llvm] Adding IR2Vec as an analysis pass (PR #134004)

Wed May 14 23:51:59 PDT 2025

================
@@ -174,3 +174,147 @@ clang.
     TODO(mtrofin): 
         - logging, and the use in interactive mode.
         - discuss an example (like the inliner)
+
+IR2Vec Embeddings
+=================
+
+IR2Vec is a program embedding approach designed specifically for LLVM IR. It
+is implemented as a function analysis pass in LLVM. The IR2Vec embeddings
+capture syntactic, semantic, and structural properties of the IR through
+learned representations. These representations are supplied as a JSON
+vocabulary that maps the entities of the IR (opcodes, types, operands) to
+n-dimensional floating-point vectors (embeddings).
+
+With IR2Vec, representations can be obtained at different granularities of the
+IR: instructions, basic blocks, and functions. Representations of loops and
+regions can be derived from these. The representations are useful for various
+downstream tasks, including ML-guided compiler optimizations.
+
+Currently, to use IR2Vec embeddings, the JSON vocabulary must first be read to
+obtain the vocabulary mapping. This mapping is then used to derive the
+representations. In LLVM, this process is implemented using two
+independent passes: ``IR2VecVocabAnalysis`` and ``IR2VecAnalysis``. The former
+reads the JSON vocabulary and populates ``IR2VecVocabResult``, which is then used
+by ``IR2VecAnalysis``.
+
+``IR2VecVocabAnalysis`` is immutable and is intended to
+be run once before ``IR2VecAnalysis`` is run. In the future, we plan
+to relax this requirement by automatically generating the default vocabulary mappings
+at build time, eliminating the need for a separate file read.
+
+IR2VecAnalysis Usage
+--------------------
+
+To use IR2Vec in an LLVM-based tool or pass, interact with the analysis
+results through the following APIs:
+
+1. **Accessing the Analysis Results:**
+
+   To access the IR2Vec embeddings, obtain the ``IR2VecAnalysis``
+   result from the Function Analysis Manager (FAM).
+
+   .. code-block:: c++
+
+      #include "llvm/Analysis/IR2VecAnalysis.h"
+
+      // ... other includes and code ...
+
+      llvm::FunctionAnalysisManager &FAM = ...; // The FAM instance
+      llvm::Function &F = ...; // The function to analyze
+      auto &IR2VecResult = FAM.getResult<llvm::IR2VecAnalysis>(F);
+
+2. **Checking for Valid Results:**
+
+   Ensure that the analysis result is valid before accessing the embeddings:
+
+   .. code-block:: c++
+
+      if (IR2VecResult.isValid()) {
+        // Proceed to access embeddings
+      }
+
+3. **Retrieving Embeddings:**
+
+   The ``IR2VecResult`` currently provides access to embeddings at three levels:
+
+   - **Instruction Embeddings:**
+
+     .. code-block:: c++
+
+        const auto &instVecMap = IR2VecResult.getInstVecMap();
----------------
svkeerthy wrote:

Fixed it

https://github.com/llvm/llvm-project/pull/134004