[llvm] Refactoring llvm-ir2vec.cpp for better separation of concerns in the Tooling classes (PR #170078)
Nishant Sachdeva via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 10 00:52:57 PST 2025
================
@@ -366,40 +337,62 @@ class IR2VecTool {
return;
}
- for (const Function &F : M)
+ for (const Function &F : M.getFunctionDefs())
generateEmbeddings(F, OS);
}
/// Generate embeddings for a single function
void generateEmbeddings(const Function &F, raw_ostream &OS) const {
- assert(Vocab && Vocab->isValid() && "Vocabulary not initialized");
+ if (!Vocab || !Vocab->isValid()) {
+ WithColor::error(errs(), ToolName)
+ << "Vocabulary is not valid. IR2VecTool not initialized.\n";
+ return;
+ }
+
if (F.isDeclaration()) {
OS << "Function " << F.getName() << " is a declaration, skipping.\n";
return;
}
+ // Create embedder once for the function
+ auto Emb = Embedder::create(IR2VecEmbeddingKind, F, *Vocab);
+ if (!Emb) {
+ WithColor::error(errs(), ToolName)
+ << "Failed to create embedder for " << F.getName() << "\n";
+ return;
+ }
+
OS << "Function: " << F.getName() << "\n";
switch (Level) {
case EmbeddingLevel::FunctionLevel:
- getFunctionEmbedding(F).print(OS);
+ getFunctionEmbedding(*Emb).print(OS);
----------------
nishant-sachdeva wrote:
Yes.
I think it's better to have these getter functions. Without them, we would have to duplicate all of this code in the Python bindings module to fetch the data structures from the IR2Vec API.
These getters will need further refinement (to accommodate function names, etc.) when we get to the Python bindings, but IMO it's better to put the skeleton in place now.
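
To illustrate the getter/printer split argued for above, here is a minimal standalone sketch. It does not use the real IR2Vec API: `ToyFunction`, `ToyEmbedding`, `ToyTool`, and `renderFunctionEmbedding` are hypothetical names invented for this example, and the "embedding" is just a sum of made-up weights. The point is only that the getter returns data while the printer is a thin wrapper over it, so a bindings layer could call the getter directly.

```cpp
#include <cassert>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these names come from the IR2Vec API.
using ToyEmbedding = std::vector<double>;

struct ToyFunction {
  std::string Name;
  std::vector<double> InstructionWeights; // stand-in for per-instruction data
  bool IsDeclaration = false;
};

class ToyTool {
public:
  // Getter: computes and returns data only, with no printing. A bindings
  // layer could call this directly instead of re-implementing the logic.
  std::optional<ToyEmbedding> getFunctionEmbedding(const ToyFunction &F) const {
    assert(!F.Name.empty() && "toy precondition: function must be named");
    if (F.IsDeclaration)
      return std::nullopt; // nothing to embed for a declaration
    double Sum = 0.0;
    for (double W : F.InstructionWeights)
      Sum += W;
    return ToyEmbedding{Sum};
  }

  // Printer: formats whatever the getter returns; holds no logic of its own.
  void printFunctionEmbedding(const ToyFunction &F, std::ostream &OS) const {
    std::optional<ToyEmbedding> Emb = getFunctionEmbedding(F);
    if (!Emb) {
      OS << "Function " << F.Name << " is a declaration, skipping.\n";
      return;
    }
    OS << "Function: " << F.Name << "\n";
    for (double V : *Emb)
      OS << V << "\n";
  }
};

// Convenience helper for callers that want the printed form as a string.
std::string renderFunctionEmbedding(const ToyTool &Tool, const ToyFunction &F) {
  std::ostringstream OS;
  Tool.printFunctionEmbedding(F, OS);
  return OS.str();
}
```

With this shape, the command-line path keeps calling the printer, while a second consumer (such as a bindings module) reuses the getter and decides for itself how to present the result.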
https://github.com/llvm/llvm-project/pull/170078