[llvm] [IR2Vec] Initial infrastructure for MIR2Vec (PR #161463)

S. VenkataKeerthy via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 6 14:42:42 PDT 2025


================
@@ -0,0 +1,325 @@
+//===- MIR2Vec.cpp - Implementation of MIR2Vec ---------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM
+// Exceptions. See the LICENSE file for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file implements the MIR2Vec algorithm for Machine IR embeddings.
+///
+//===----------------------------------------------------------------------===//
+
+#include "llvm/CodeGen/MIR2Vec.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
+#include "llvm/IR/Module.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/Support/Errc.h"
+#include "llvm/Support/MemoryBuffer.h"
+#include "llvm/Support/Regex.h"
+
+using namespace llvm;
+using namespace mir2vec;
+
+#define DEBUG_TYPE "mir2vec"
+
+STATISTIC(MIRVocabMissCounter,
+          "Number of lookups to MIR entities not present in the vocabulary");
+
+namespace llvm {
+namespace mir2vec {
+cl::OptionCategory MIR2VecCategory("MIR2Vec Options");
+
+// FIXME: Use a default vocab when not specified
+static cl::opt<std::string>
+    VocabFile("mir2vec-vocab-path", cl::Optional,
+              cl::desc("Path to the vocabulary file for MIR2Vec"), cl::init(""),
+              cl::cat(MIR2VecCategory));
+cl::opt<float> OpcWeight("mir2vec-opc-weight", cl::Optional, cl::init(1.0),
+                         cl::desc("Weight for machine opcode embeddings"),
+                         cl::cat(MIR2VecCategory));
+} // namespace mir2vec
+} // namespace llvm
+
+//===----------------------------------------------------------------------===//
+// Vocabulary Implementation
+//===----------------------------------------------------------------------===//
+
+Vocabulary::Vocabulary(VocabMap &&OpcodeEntries, const TargetInstrInfo *TII) {
+  // Early return for invalid inputs - creates empty/invalid vocabulary
----------------
svkeerthy wrote:

Right now we follow the existing pattern in IR2Vec: we use a static `create` method for the embedder, but not for the vocabulary. Refactoring the vocabulary in a similar way is a good idea; I will do it as a separate patch for both IR2Vec and MIR2Vec.
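For illustration, the static-factory shape discussed above might look like the minimal sketch below. Instead of a constructor that "early returns" and leaves an empty or invalid object, a private constructor is paired with a static `create` that validates its input first and reports failure to the caller. It is written in plain standard C++ so it stays self-contained. `VocabMap`, `create`, `getDimension`, and the validation rule (non-empty, uniform non-zero dimension) are hypothetical stand-ins, not the actual MIR2Vec API. An LLVM version would more naturally return `llvm::Expected<Vocabulary>` rather than `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the opcode-name -> embedding table.
using VocabMap = std::map<std::string, std::vector<double>>;

class Vocabulary {
public:
  // Factory: validates inputs up front and returns std::nullopt on failure,
  // so callers never hold a half-constructed, invalid vocabulary.
  static std::optional<Vocabulary> create(VocabMap &&Entries) {
    if (Entries.empty())
      return std::nullopt;
    // Assumed rule: every embedding shares one non-zero dimension.
    const std::size_t Dimension = Entries.begin()->second.size();
    if (Dimension == 0)
      return std::nullopt;
    for (const auto &KV : Entries)
      if (KV.second.size() != Dimension)
        return std::nullopt;
    return Vocabulary(std::move(Entries), Dimension);
  }

  std::size_t getDimension() const { return Dim; }
  std::size_t size() const { return Table.size(); }

private:
  // Private constructor: only create() can build a Vocabulary,
  // and only after the entries have been validated.
  Vocabulary(VocabMap &&Entries, std::size_t D)
      : Table(std::move(Entries)), Dim(D) {}

  VocabMap Table;
  std::size_t Dim;
};
```

Callers then branch once on the result of `create` instead of querying a separate validity flag on every use, which is the main benefit of moving the checks out of the constructor.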

https://github.com/llvm/llvm-project/pull/161463

