[llvm] Refactoring llvm-ir2vec.cpp for better separation of concerns in the Tooling classes (PR #170078)

Nishant Sachdeva via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 10 00:52:57 PST 2025


================
@@ -366,40 +337,62 @@ class IR2VecTool {
       return;
     }
 
-    for (const Function &F : M)
+    for (const Function &F : M.getFunctionDefs())
       generateEmbeddings(F, OS);
   }
 
   /// Generate embeddings for a single function
   void generateEmbeddings(const Function &F, raw_ostream &OS) const {
-    assert(Vocab && Vocab->isValid() && "Vocabulary not initialized");
+    if (!Vocab || !Vocab->isValid()) {
+      WithColor::error(errs(), ToolName)
+          << "Vocabulary is not valid. IR2VecTool not initialized.\n";
+      return;
+    }
+
     if (F.isDeclaration()) {
       OS << "Function " << F.getName() << " is a declaration, skipping.\n";
       return;
     }
 
+    // Create embedder once for the function
+    auto Emb = Embedder::create(IR2VecEmbeddingKind, F, *Vocab);
+    if (!Emb) {
+      WithColor::error(errs(), ToolName)
+          << "Failed to create embedder for " << F.getName() << "\n";
+      return;
+    }
+
     OS << "Function: " << F.getName() << "\n";
 
     switch (Level) {
     case EmbeddingLevel::FunctionLevel:
-      getFunctionEmbedding(F).print(OS);
+      getFunctionEmbedding(*Emb).print(OS);
----------------
nishant-sachdeva wrote:

Yes.

I think it's better to have these getter functions. Without them, we would have to duplicate all this code in the Python bindings module to fetch the data structures from the IR2Vec API.

These getters will have to be refined further to accommodate function names, etc., when we get to the Python bindings, but IMO it's better to put the skeleton in place now.
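
To illustrate the getter/printer split described above, here is a minimal, self-contained sketch. It does not use the real IR2Vec API: `Embedding`, `FakeEmbedder`, `ToolSketch`, and `printFunctionEmbedding` are hypothetical stand-ins, and the summation is only an illustrative aggregation. The point is that the getter returns data without doing any I/O, so a bindings layer could call it directly while the command-line path wraps it in a printer.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an embedding vector; not the real LLVM type.
using Embedding = std::vector<double>;

// Hypothetical stand-in for an embedder that is created once per function.
struct FakeEmbedder {
  std::vector<Embedding> InstVecs; // one vector per instruction, equal sizes

  // Sum the per-instruction vectors (illustrative aggregation only).
  Embedding getFunctionVector() const {
    assert(!InstVecs.empty() && "expected at least one instruction");
    Embedding Sum(InstVecs.front().size(), 0.0);
    for (const Embedding &V : InstVecs)
      for (std::size_t I = 0; I < V.size(); ++I)
        Sum[I] += V[I];
    return Sum;
  }
};

class ToolSketch {
public:
  // Getter: returns data and performs no I/O, so a bindings layer
  // can reuse it without duplicating the logic.
  Embedding getFunctionEmbedding(const FakeEmbedder &Emb) const {
    return Emb.getFunctionVector();
  }

  // Printer: a thin wrapper over the getter for the command-line path.
  void printFunctionEmbedding(const FakeEmbedder &Emb,
                              std::ostream &OS) const {
    for (double X : getFunctionEmbedding(Emb))
      OS << X << ' ';
    OS << '\n';
  }
};
```

With this shape, any later refinement (for example, returning the function name alongside the vector) only needs to change the getter's return type; the printer and any bindings would pick it up from one place.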

https://github.com/llvm/llvm-project/pull/170078

