[llvm] Adding IR2Vec as an analysis pass (PR #134004)
S. VenkataKeerthy via llvm-commits
llvm-commits at lists.llvm.org
Fri May 16 16:16:13 PDT 2025
================
@@ -0,0 +1,202 @@
+//===- IR2Vec.h - Implementation of IR2Vec ----------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
+// Exceptions. See the LICENSE file for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file defines the IR2Vec vocabulary analysis (IR2VecVocabAnalysis),
+/// the core ir2vec::Embedder interface for generating IR embeddings,
+/// and related utilities like the IR2VecPrinterPass.
+///
+/// Program embeddings are typically derived from, or are themselves, learned
+/// representations of the program. Such embeddings are used to represent
+/// programs as input to machine learning algorithms. IR2Vec represents
+/// LLVM IR as embeddings.
+///
+/// The IR2Vec algorithm is described in the following paper:
+///
+/// IR2Vec: LLVM IR Based Scalable Program Embeddings, S. VenkataKeerthy,
+/// Rohit Aggarwal, Shalini Jain, Maunendra Sankar Desarkar, Ramakrishna
+/// Upadrasta, and Y. N. Srikant, ACM Transactions on Architecture and
+/// Code Optimization (TACO), 2020. https://doi.org/10.1145/3418463.
+/// https://arxiv.org/abs/1909.06228
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_ANALYSIS_IR2VEC_H
+#define LLVM_ANALYSIS_IR2VEC_H
+
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/IR/PassManager.h"
+#include "llvm/Support/ErrorOr.h"
+#include <map>
+#include <string>
+#include <vector>
+
+namespace llvm {
+
+class Module;
+class BasicBlock;
+class Instruction;
+class Function;
+class Type;
+class Value;
+class raw_ostream;
+
+/// IR2Vec computes two kinds of embeddings: Symbolic and Flow-aware.
+/// Symbolic embeddings capture the syntactic and statistical correlations
+/// of the IR entities. Flow-aware embeddings build on top of symbolic
+/// embeddings and additionally capture the flow information in the IR.
+/// IR2VecKind is used to specify the type of embeddings to generate.
+/// Currently, only Symbolic embeddings are supported.
+enum class IR2VecKind { Symbolic };
+
+namespace ir2vec {
+using Embedding = std::vector<double>;
+using InstEmbeddingsMap = DenseMap<const Instruction *, Embedding>;
+using BBEmbeddingsMap = DenseMap<const BasicBlock *, Embedding>;
+// FIXME: Currently the keys are strings. This can be changed to
+// use integers for cheaper lookups.
+using Vocab = std::map<std::string, Embedding>;
+
+/// Embedder provides the interface to generate embeddings (vector
+/// representations) for instructions, basic blocks, and functions. The vector
+/// representations are generated using IR2Vec algorithms.
+///
+/// Embedder is an abstract class intended to be subclassed for different
+/// IR2Vec algorithms, such as Symbolic and Flow-aware.
+class Embedder {
+protected:
+ const Function &F;
+ const Vocab &Vocabulary;
+
+ /// Weights for different entities (like opcode, arguments, types)
+ /// in the IR instructions to generate the vector representation.
+ const float OpcWeight, TypeWeight, ArgWeight;
+
+ /// Dimension of the vector representation, taken from the input vocabulary.
+ const unsigned Dimension;
+
+ // Utility maps - these are used to store the vector representations of
+ // instructions, basic blocks and functions.
+ Embedding FuncVector;
+ BBEmbeddingsMap BBVecMap;
+ InstEmbeddingsMap InstVecMap;
+
+ Embedder(const Function &F, const Vocab &Vocabulary, unsigned Dimension);
+
+ /// Looks up the embedding of Key in the vocabulary. If the key is not
+ /// found, it returns a zero vector.
+ Embedding lookupVocab(const std::string &Key);
+
+ /// Adds two vectors: Dst += Src
+ void addVectors(Embedding &Dst, const Embedding &Src);
+
+ /// Adds Src vector scaled by Factor to Dst vector: Dst += Src * Factor
+ void addScaledVector(Embedding &Dst, const Embedding &Src, float Factor);
+
+public:
+ virtual ~Embedder() = default;
+
+ /// Top level function to compute embeddings. Given a function, it
----------------
svkeerthy wrote:
Fixed it
https://github.com/llvm/llvm-project/pull/134004
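To illustrate how the protected helpers declared in the header might compose, here is a minimal standalone sketch with no LLVM dependency. The free functions lookupVocab, addVectors and addScaledVector only mirror the documented behaviour of the Embedder members (zero vector on a vocabulary miss, Dst += Src, Dst += Src * Factor); they are not the patch's implementation. symbolicInstEmbedding and sumEmbeddings are hypothetical helpers that assume the weighted-sum formulation from the IR2Vec paper: OpcWeight * vocab[opcode] + TypeWeight * vocab[type] + ArgWeight * (sum of vocab[operand]), with blocks and functions aggregated by summation. The vocabulary keys used below are made up; the real key format is not shown in this header.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Same aliases as in the header above.
using Embedding = std::vector<double>;
using Vocab = std::map<std::string, Embedding>;

// Mirrors the documented behaviour of Embedder::lookupVocab: return the
// embedding stored for Key, or a zero vector of length Dimension on a miss.
Embedding lookupVocab(const Vocab &Vocabulary, const std::string &Key,
                      unsigned Dimension) {
  auto It = Vocabulary.find(Key);
  if (It != Vocabulary.end())
    return It->second;
  return Embedding(Dimension, 0.0);
}

// Dst += Src (element-wise).
void addVectors(Embedding &Dst, const Embedding &Src) {
  assert(Dst.size() == Src.size() && "dimension mismatch");
  for (std::size_t I = 0; I < Dst.size(); ++I)
    Dst[I] += Src[I];
}

// Dst += Src * Factor (element-wise).
void addScaledVector(Embedding &Dst, const Embedding &Src, float Factor) {
  assert(Dst.size() == Src.size() && "dimension mismatch");
  for (std::size_t I = 0; I < Dst.size(); ++I)
    Dst[I] += Src[I] * Factor;
}

// Hypothetical helper: symbolic embedding of one instruction, given the
// vocabulary keys of its opcode, its type and its operands. Assumes the
// weighted-sum formulation from the IR2Vec paper.
Embedding symbolicInstEmbedding(const Vocab &V, unsigned Dim,
                                const std::string &OpcodeKey,
                                const std::string &TypeKey,
                                const std::vector<std::string> &ArgKeys,
                                float OpcWeight, float TypeWeight,
                                float ArgWeight) {
  Embedding Result(Dim, 0.0);
  addScaledVector(Result, lookupVocab(V, OpcodeKey, Dim), OpcWeight);
  addScaledVector(Result, lookupVocab(V, TypeKey, Dim), TypeWeight);
  for (const std::string &Arg : ArgKeys)
    addScaledVector(Result, lookupVocab(V, Arg, Dim), ArgWeight);
  return Result;
}

// Hypothetical helper: aggregate embeddings by summation, e.g. instruction
// embeddings into a basic-block embedding, or block embeddings into a
// function embedding (assumed from the paper, not stated in this header).
Embedding sumEmbeddings(const std::vector<Embedding> &Parts, unsigned Dim) {
  Embedding Sum(Dim, 0.0);
  for (const Embedding &P : Parts)
    addVectors(Sum, P);
  return Sum;
}
```

With a toy vocabulary {add: (1, 0), i32: (0, 1), var: (1, 1)} and weights 1.0, 0.5 and 0.25, an instruction keyed as add/i32 with operands "var" and an unknown key evaluates to (1.25, 0.75); the unknown operand contributes nothing because the lookup falls back to a zero vector. Every key goes through a std::map<std::string, Embedding> lookup, which is the cost the FIXME about switching to integer keys is aimed at.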