[llvm] Adding IR2Vec as an analysis pass (PR #134004)

S. VenkataKeerthy via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 9 14:22:01 PDT 2025


================
@@ -0,0 +1,131 @@
+//===- IR2VecAnalysis.h - IR2Vec Analysis Implementation -------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
+// Exceptions. See the LICENSE file for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file contains the declaration of IR2VecAnalysis that computes
+/// IR2Vec Embeddings of the program.
+///
+/// Program embeddings are typically a learned representation of the program,
+/// or are derived from one. Such embeddings are used to represent programs
+/// as input to machine learning algorithms. IR2Vec represents the LLVM IR as
+/// embeddings.
+///
+/// The IR2Vec algorithm is described in the following paper:
+///
+///   IR2Vec: LLVM IR Based Scalable Program Embeddings, S. VenkataKeerthy,
+///   Rohit Aggarwal, Shalini Jain, Maunendra Sankar Desarkar, Ramakrishna
+///   Upadrasta, and Y. N. Srikant, ACM Transactions on Architecture and
+///   Code Optimization (TACO), 2020. https://doi.org/10.1145/3418463.
+///   https://arxiv.org/abs/1909.06228
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_ANALYSIS_IR2VECANALYSIS_H
+#define LLVM_ANALYSIS_IR2VECANALYSIS_H
+
+#include "llvm/ADT/MapVector.h"
+#include "llvm/IR/PassManager.h"
+#include <map>
+
+namespace llvm {
+
+class Module;
+class BasicBlock;
+class Instruction;
+class Function;
+
+namespace ir2vec {
+using Embedding = std::vector<double>;
+// FIXME: Currently the keys are strings. This can be changed to
+// use integers for cheaper lookups.
+using Vocab = std::map<std::string, Embedding>;
+} // namespace ir2vec
+
+class IR2VecVocabResult;
+class IR2VecResult;
+
+/// This analysis provides the vocabulary for IR2Vec. The vocabulary provides a
+/// mapping between an entity of the IR (like opcode, type, argument, etc.) and
+/// its corresponding embedding.
+class IR2VecVocabAnalysis : public AnalysisInfoMixin<IR2VecVocabAnalysis> {
+  unsigned DIM = 0;
+  ir2vec::Vocab Vocabulary;
+  Error readVocabulary();
+
+public:
+  static AnalysisKey Key;
+  IR2VecVocabAnalysis() = default;
+  using Result = IR2VecVocabResult;
+  Result run(Module &M, ModuleAnalysisManager &MAM);
+};
+
+class IR2VecVocabResult {
+  ir2vec::Vocab Vocabulary;
+  bool Valid = false;
+  unsigned DIM = 0;
+
+public:
+  IR2VecVocabResult() = default;
+  IR2VecVocabResult(ir2vec::Vocab &&Vocabulary, unsigned Dim);
+
+  // Helper functions
+  bool isValid() const { return Valid; }
+  const ir2vec::Vocab &getVocabulary() const;
+  unsigned getDimension() const { return DIM; }
+  bool invalidate(Module &M, const PreservedAnalyses &PA,
+                  ModuleAnalysisManager::Invalidator &Inv);
+};
+
+class IR2VecResult {
+  SmallMapVector<const Instruction *, ir2vec::Embedding, 128> InstVecMap;
+  SmallMapVector<const BasicBlock *, ir2vec::Embedding, 16> BBVecMap;
+  ir2vec::Embedding FuncVector;
+  unsigned DIM = 0;
+  bool Valid = false;
+
+public:
+  IR2VecResult() = default;
+  IR2VecResult(
+      SmallMapVector<const Instruction *, ir2vec::Embedding, 128> &&InstMap,
+      SmallMapVector<const BasicBlock *, ir2vec::Embedding, 16> &&BBMap,
+      ir2vec::Embedding &&FuncVector, unsigned Dim);
+  bool isValid() const { return Valid; }
+
+  const SmallMapVector<const Instruction *, ir2vec::Embedding, 128> &
----------------
svkeerthy wrote:

Exposing these might have value even without delta patching. For instance, one could build loop-level representations using these APIs.
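
A minimal sketch of the loop-level idea, not part of the patch. It assumes a loop embedding is simply the element-wise sum of the embeddings of the loop's basic blocks, which is one possible aggregation and not something the PR specifies. `Block`, `BlockEmbeddingMap`, and `computeLoopEmbedding` are hypothetical names; `Block` stands in for `llvm::BasicBlock` and a `std::map` stands in for the `SmallMapVector` in the patch so the example compiles without LLVM.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Stand-in for llvm::BasicBlock so the sketch compiles without LLVM.
struct Block {};

using Embedding = std::vector<double>;
// Hypothetical shape of the per-block map; the patch uses a SmallMapVector.
using BlockEmbeddingMap = std::map<const Block *, Embedding>;

// Assumption: a loop embedding is the element-wise sum of the embeddings of
// the loop's blocks. Blocks that have no entry in the map are skipped.
Embedding computeLoopEmbedding(const std::vector<const Block *> &LoopBlocks,
                               const BlockEmbeddingMap &BBVecMap,
                               std::size_t Dim) {
  Embedding LoopVec(Dim, 0.0);
  for (const Block *BB : LoopBlocks) {
    auto It = BBVecMap.find(BB);
    if (It == BBVecMap.end())
      continue;
    assert(It->second.size() == Dim && "embedding dimension mismatch");
    for (std::size_t I = 0; I < Dim; ++I)
      LoopVec[I] += It->second[I];
  }
  return LoopVec;
}
```

In a real pass the block list would presumably come from `LoopInfo` (for example the blocks of an `llvm::Loop`), and the map from the analysis result; both are left out here to keep the sketch standalone.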

https://github.com/llvm/llvm-project/pull/134004

