[llvm] Adding IR2Vec as an analysis pass (PR #134004)
Mircea Trofin via llvm-commits
llvm-commits at lists.llvm.org
Mon May 12 11:17:43 PDT 2025
================
@@ -174,3 +174,151 @@ clang.
TODO(mtrofin):
- logging, and the use in interactive mode.
- discuss an example (like the inliner)
+
+IR2Vec Embeddings
+=================
+
+IR2Vec is a program embedding approach designed specifically for LLVM IR. It
+is implemented as a function analysis pass in LLVM. The IR2Vec embeddings
+capture syntactic, semantic, and structural properties of the IR through
+learned representations. These representations are supplied as a JSON
+vocabulary that maps the entities of the IR (opcodes, types, operands) to
+n-dimensional floating-point vectors (embeddings).
+
+With IR2Vec, representations can be obtained at different granularities of
+the IR, such as instructions, basic blocks, and functions. Representations of
+loops and regions can be derived from these. The representations are useful
+for various downstream tasks, including ML-guided compiler optimizations.
+
+To use IR2Vec embeddings, the JSON vocabulary must first be read to obtain
+the vocabulary mapping. This mapping is then used to derive the
+representations. In LLVM, this process is implemented using two
+independent passes: ``IR2VecVocabAnalysis`` and ``IR2VecAnalysis``. The former
+reads the JSON vocabulary and populates ``IR2VecVocabResult``, which is then used
+by ``IR2VecAnalysis``.
+
+It is recommended to run ``IR2VecVocabAnalysis`` once, as the
----------------
mtrofin wrote:
nit: If you say IR2VecVocabAnalysis is immutable, that should take care of it, I think
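
To illustrate the idea behind this nit, here is a minimal, self-contained C++ sketch. It is not LLVM code: `ToyVocabResult` and `ToyAnalysisCache` are hypothetical stand-ins, the vocabulary values are placeholders, and the `invalidate()` signature is simplified. In LLVM's new pass manager, an analysis result can opt out of invalidation by defining an `invalidate` method that returns false; the sketch assumes that this is the kind of "immutable" meant here, in which case the vocabulary is loaded once without the docs having to recommend it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a vocabulary result: IR entity name -> embedding.
struct ToyVocabResult {
  std::map<std::string, std::vector<double>> Vocab;

  // Simplified analogue of an analysis result's invalidate hook.
  // Returning false means "IR changes never invalidate this result".
  bool invalidate() const { return false; }
};

// Hypothetical stand-in for an analysis manager's per-result cache.
class ToyAnalysisCache {
  bool HasResult = false;
  ToyVocabResult Result;
  int LoadCount = 0;

  // Placeholder loader; a real implementation would parse the JSON vocabulary.
  static ToyVocabResult loadVocabulary() {
    ToyVocabResult R;
    R.Vocab["add"] = {1.0, 2.0};
    R.Vocab["i32"] = {0.5, 0.5};
    return R;
  }

public:
  // Computes the result on first use and serves the cached copy afterwards.
  const ToyVocabResult &getResult() {
    if (!HasResult) {
      Result = loadVocabulary();
      HasResult = true;
      ++LoadCount;
    }
    assert(!Result.Vocab.empty() && "vocabulary should be populated");
    return Result;
  }

  // Called after a transformation changed the IR: the cached result is
  // dropped only if it reports itself as invalidated.
  void invalidateAfterTransform() {
    if (HasResult && Result.invalidate())
      HasResult = false;
  }

  // Number of times the vocabulary was actually loaded.
  int loadCount() const { return LoadCount; }
};
```

Because `invalidate()` always returns false, calling `invalidateAfterTransform()` between two `getResult()` calls leaves the cached vocabulary in place, so the loader runs only once.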
https://github.com/llvm/llvm-project/pull/134004