[llvm] Adding IR2Vec as an analysis pass (PR #134004)
S. VenkataKeerthy via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 9 14:22:01 PDT 2025
================
@@ -0,0 +1,131 @@
+//===- IR2VecAnalysis.h - IR2Vec Analysis Implementation -------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
+// Exceptions. See the LICENSE file for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file contains the declaration of IR2VecAnalysis that computes
+/// IR2Vec Embeddings of the program.
+///
+/// Program embeddings are typically a learned representation of the program,
+/// or are derived from one. Such embeddings are used to represent the
+/// programs as input to machine learning algorithms. IR2Vec represents the
+/// LLVM IR as embeddings.
+///
+/// The IR2Vec algorithm is described in the following paper:
+///
+/// IR2Vec: LLVM IR Based Scalable Program Embeddings, S. VenkataKeerthy,
+/// Rohit Aggarwal, Shalini Jain, Maunendra Sankar Desarkar, Ramakrishna
+/// Upadrasta, and Y. N. Srikant, ACM Transactions on Architecture and
+/// Code Optimization (TACO), 2020. https://doi.org/10.1145/3418463.
+/// https://arxiv.org/abs/1909.06228
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_ANALYSIS_IR2VECANALYSIS_H
+#define LLVM_ANALYSIS_IR2VECANALYSIS_H
+
+#include "llvm/ADT/MapVector.h"
+#include "llvm/IR/PassManager.h"
+#include <map>
+
+namespace llvm {
+
+class Module;
+class BasicBlock;
+class Instruction;
+class Function;
+
+namespace ir2vec {
+using Embedding = std::vector<double>;
+// FIXME: Currently the keys are strings. This can be changed to
+// use integers for cheaper lookups.
+using Vocab = std::map<std::string, Embedding>;
+} // namespace ir2vec
+
+class IR2VecVocabResult;
+class IR2VecResult;
+
+/// This analysis provides the vocabulary for IR2Vec. The vocabulary provides a
+/// mapping between an entity of the IR (like opcode, type, argument, etc.) and
+/// its corresponding embedding.
+class IR2VecVocabAnalysis : public AnalysisInfoMixin<IR2VecVocabAnalysis> {
+  unsigned DIM = 0;
+  ir2vec::Vocab Vocabulary;
+  Error readVocabulary();
+
+public:
+  static AnalysisKey Key;
+  IR2VecVocabAnalysis() = default;
+  using Result = IR2VecVocabResult;
+  Result run(Module &M, ModuleAnalysisManager &MAM);
+};
+
+class IR2VecVocabResult {
+  ir2vec::Vocab Vocabulary;
+  bool Valid = false;
+  unsigned DIM = 0;
+
+public:
+  IR2VecVocabResult() = default;
+  IR2VecVocabResult(ir2vec::Vocab &&Vocabulary, unsigned Dim);
+
+  // Helper functions
+  bool isValid() const { return Valid; }
+  const ir2vec::Vocab &getVocabulary() const;
+  unsigned getDimension() const { return DIM; }
+  bool invalidate(Module &M, const PreservedAnalyses &PA,
+                  ModuleAnalysisManager::Invalidator &Inv);
+};
+
+class IR2VecResult {
+  SmallMapVector<const Instruction *, ir2vec::Embedding, 128> InstVecMap;
+  SmallMapVector<const BasicBlock *, ir2vec::Embedding, 16> BBVecMap;
+  ir2vec::Embedding FuncVector;
+  unsigned DIM = 0;
+  bool Valid = false;
+
+public:
+  IR2VecResult() = default;
+  IR2VecResult(
+      SmallMapVector<const Instruction *, ir2vec::Embedding, 128> &&InstMap,
+      SmallMapVector<const BasicBlock *, ir2vec::Embedding, 16> &&BBMap,
+      ir2vec::Embedding &&FuncVector, unsigned Dim);
+  bool isValid() const { return Valid; }
+
+  const SmallMapVector<const Instruction *, ir2vec::Embedding, 128> &
----------------
svkeerthy wrote:
Exposing these might have value even without delta patching. For instance, one could build loop-level representations using these APIs.
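
As a rough illustration of the loop-level idea (a sketch, not code from this PR): assuming a loop's representation can be taken as the element-wise sum of the embeddings of its basic blocks, a consumer of the per-block map could aggregate it as below. The helper name `sumLoopEmbedding` is hypothetical, and plain integer block IDs with a `std::map` stand in for the `const BasicBlock *` keys and `SmallMapVector` used in the patch so that the snippet compiles without LLVM headers.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Stand-in for ir2vec::Embedding from the patch.
using Embedding = std::vector<double>;

// Hypothetical helper, not part of the PR: sums the embeddings of the blocks
// that belong to one loop. Integer IDs stand in for `const BasicBlock *`.
Embedding sumLoopEmbedding(const std::map<int, Embedding> &BBVecMap,
                           const std::vector<int> &LoopBlocks,
                           std::size_t Dim) {
  Embedding LoopVec(Dim, 0.0);
  for (int BB : LoopBlocks) {
    auto It = BBVecMap.find(BB);
    // Skip blocks that have no embedding or one of the wrong dimension.
    if (It == BBVecMap.end() || It->second.size() != Dim)
      continue;
    for (std::size_t I = 0; I < Dim; ++I)
      LoopVec[I] += It->second[I];
  }
  return LoopVec;
}
```

In an actual pass, the block list would presumably come from `LoopInfo` (for example by iterating `Loop::blocks()`), and whether a sum, a mean, or some weighted scheme is the right aggregation is left open here.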
https://github.com/llvm/llvm-project/pull/134004