[llvm] Adding IR2Vec as an analysis pass (PR #134004)
S. VenkataKeerthy via llvm-commits
llvm-commits at lists.llvm.org
Wed May 14 23:51:59 PDT 2025
================
@@ -174,3 +174,147 @@ clang.
TODO(mtrofin):
- logging, and the use in interactive mode.
- discuss an example (like the inliner)
+
+IR2Vec Embeddings
+=================
+
+IR2Vec is a program embedding approach designed specifically for LLVM IR. It
+is implemented as a function analysis pass in LLVM. The IR2Vec embeddings
+capture syntactic, semantic, and structural properties of the IR through
+learned representations. These learned representations are provided as a JSON
+vocabulary that maps IR entities (opcodes, types, and operands) to
+n-dimensional floating-point vectors (embeddings).
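To make the vocabulary concrete, here is a minimal standalone sketch in plain C++ with no LLVM dependencies. It models the vocabulary as a map from entity names to vectors and composes an instruction's embedding from the embeddings of its opcode, type, and operands. It assumes a weighted-sum composition. The names ``Vocabulary``, ``lookup``, and ``instructionEmbedding``, the entity keys, the zero-vector fallback for unknown entities, and the default weights are all hypothetical and are not the LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: an embedding is a dense vector of doubles, and the
// vocabulary maps entity names (opcodes, types, operand kinds) to embeddings.
using Embedding = std::vector<double>;
using Vocabulary = std::unordered_map<std::string, Embedding>;

// Look up an entity; unknown entities map to a zero vector (an assumption).
Embedding lookup(const Vocabulary &Vocab, const std::string &Key,
                 std::size_t Dim) {
  auto It = Vocab.find(Key);
  return It != Vocab.end() ? It->second : Embedding(Dim, 0.0);
}

// Accumulate Scale * Src into Dst, element by element.
void addScaled(Embedding &Dst, const Embedding &Src, double Scale) {
  assert(Dst.size() == Src.size() && "embedding dimension mismatch");
  for (std::size_t I = 0; I < Dst.size(); ++I)
    Dst[I] += Scale * Src[I];
}

// Instruction embedding = WOpc * opcode + WTy * type + WArg * sum(operands).
// The default weights are illustrative assumptions, not values taken from LLVM.
Embedding instructionEmbedding(const Vocabulary &Vocab, std::size_t Dim,
                               const std::string &Opcode,
                               const std::string &Type,
                               const std::vector<std::string> &Operands,
                               double WOpc = 1.0, double WTy = 0.5,
                               double WArg = 0.2) {
  Embedding Result(Dim, 0.0);
  addScaled(Result, lookup(Vocab, Opcode, Dim), WOpc);
  addScaled(Result, lookup(Vocab, Type, Dim), WTy);
  for (const std::string &Op : Operands)
    addScaled(Result, lookup(Vocab, Op, Dim), WArg);
  return Result;
}
```

Consider a two-dimensional vocabulary that maps ``"Add"`` to ``{1, 0}``, ``"IntegerTy"`` to ``{0, 1}``, and ``"Variable"`` to ``{1, 1}``. Under it, an ``Add`` with two ``Variable`` operands of type ``IntegerTy`` yields approximately ``{1.4, 0.9}``.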
+
+With IR2Vec, representations can be obtained at different granularities of the
+IR, such as instructions, basic blocks, and functions. Representations of
+larger units, such as loops and regions, can be derived from these. The
+representations can serve various downstream tasks, including ML-guided
+compiler optimizations.
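Another standalone sketch can illustrate the hierarchy of granularities. It assumes that a basic block's embedding is the element-wise sum of its instruction embeddings and that a function's embedding is the sum of its basic-block embeddings. Under that assumption, the same helper yields a loop or region embedding when it is given only the blocks that belong to that loop or region. All names here (``aggregate``, ``basicBlockEmbedding``, ``blocksEmbedding``) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Embedding = std::vector<double>;

// Element-wise sum of embeddings that all have dimension Dim.
// Summation is the assumed aggregation; another reduction could slot in here.
Embedding aggregate(const std::vector<Embedding> &Parts, std::size_t Dim) {
  Embedding Sum(Dim, 0.0);
  for (const Embedding &P : Parts) {
    assert(P.size() == Dim && "embedding dimension mismatch");
    for (std::size_t I = 0; I < Dim; ++I)
      Sum[I] += P[I];
  }
  return Sum;
}

// A basic block is modeled as the list of its instruction embeddings.
using BasicBlockInsts = std::vector<Embedding>;

Embedding basicBlockEmbedding(const BasicBlockInsts &BB, std::size_t Dim) {
  return aggregate(BB, Dim);
}

// A function, or a loop/region given only its own blocks, is modeled as a
// list of basic blocks; its embedding sums the per-block embeddings.
Embedding blocksEmbedding(const std::vector<BasicBlockInsts> &Blocks,
                          std::size_t Dim) {
  std::vector<Embedding> BBVecs;
  BBVecs.reserve(Blocks.size());
  for (const BasicBlockInsts &BB : Blocks)
    BBVecs.push_back(basicBlockEmbedding(BB, Dim));
  return aggregate(BBVecs, Dim);
}
```

To get the loop-level vector described above, pass only the blocks of a loop body instead of every block in the function.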
+
+Currently, using IR2Vec embeddings takes two steps: the JSON vocabulary is
+first read to obtain the vocabulary mapping, and this mapping is then used to
+derive the representations. In LLVM, the two steps are implemented as
+independent passes: ``IR2VecVocabAnalysis`` and ``IR2VecAnalysis``. The former
+reads the JSON vocabulary and populates ``IR2VecVocabResult``, which the latter
+then uses.
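The two-step flow (build the vocabulary mapping once, then read from it) can be sketched without LLVM as follows. JSON parsing is omitted, so ``buildVocab`` takes already-parsed (entity, vector) pairs. ``VocabResult``, ``buildVocab``, and ``entityEmbedding`` are hypothetical stand-ins that are only loosely modeled on the roles described above. They are not the interface of ``IR2VecVocabResult``.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Embedding = std::vector<double>;

// Hypothetical vocabulary result: the entity-to-vector mapping plus a
// validity flag and the common dimension of all vectors. Read-only once built.
class VocabResult {
public:
  VocabResult() = default; // An invalid (empty) result.
  VocabResult(std::unordered_map<std::string, Embedding> M, std::size_t D)
      : Map(std::move(M)), Dim(D), Valid(true) {}

  bool isValid() const { return Valid; }
  std::size_t getDimension() const { return Dim; }
  const std::unordered_map<std::string, Embedding> &getMap() const {
    return Map;
  }

private:
  std::unordered_map<std::string, Embedding> Map;
  std::size_t Dim = 0;
  bool Valid = false;
};

// Step 1: build the vocabulary from already-parsed entries. The result is
// invalid if there are no entries, the vectors are empty or disagree on their
// dimension, or an entity appears twice.
VocabResult
buildVocab(const std::vector<std::pair<std::string, Embedding>> &Entries) {
  if (Entries.empty() || Entries.front().second.empty())
    return VocabResult();
  const std::size_t Dim = Entries.front().second.size();
  std::unordered_map<std::string, Embedding> Map;
  for (const auto &[Key, Vec] : Entries)
    if (Vec.size() != Dim || !Map.emplace(Key, Vec).second)
      return VocabResult();
  return VocabResult(std::move(Map), Dim);
}

// Step 2: consumers read embeddings only from a valid result.
// std::unordered_map::at throws std::out_of_range for unknown entities.
const Embedding &entityEmbedding(const VocabResult &Vocab,
                                 const std::string &Key) {
  assert(Vocab.isValid() && "vocabulary must be built first");
  return Vocab.getMap().at(Key);
}
```

The result is never modified after construction. This mirrors the intent that the vocabulary is computed once and then shared read-only by every per-function computation.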
+
+``IR2VecVocabAnalysis`` is immutable and is intended to be run once before
+``IR2VecAnalysis`` is run. In the future, we plan to relax this requirement by
+automatically generating the default vocabulary mappings at build time,
+eliminating the need for a separate file read.
+
+IR2VecAnalysis Usage
+--------------------
+
+To use IR2Vec in an LLVM-based tool or pass, interact with the analysis
+results through the following APIs:
+
+1. **Accessing the Analysis Results:**
+
+   To access the IR2Vec embeddings, obtain the ``IR2VecAnalysis``
+   result from the Function Analysis Manager (FAM).
+
+   .. code-block:: c++
+
+      #include "llvm/Analysis/IR2VecAnalysis.h"
+
+      // ... other includes and code ...
+
+      llvm::FunctionAnalysisManager &FAM = ...; // The FAM instance
+      llvm::Function &F = ...;                  // The function to analyze
+      auto &IR2VecResult = FAM.getResult<llvm::IR2VecAnalysis>(F);
+
+2. **Checking for Valid Results:**
+
+   Ensure that the analysis result is valid before accessing the embeddings:
+
+   .. code-block:: c++
+
+      if (IR2VecResult.isValid()) {
+        // Proceed to access embeddings
+      }
+
+3. **Retrieving Embeddings:**
+
+   ``IR2VecResult`` currently provides access to embeddings at three levels:
+
+   - **Instruction Embeddings:**
+
+     .. code-block:: c++
+
+        const auto &instVecMap = IR2VecResult.getInstVecMap();
----------------
svkeerthy wrote:
Fixed it
https://github.com/llvm/llvm-project/pull/134004