[llvm-branch-commits] [llvm] [IR2Vec] Refactor vocabulary to use section-based storage (PR #158376)

S. VenkataKeerthy via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Mon Sep 22 10:55:06 PDT 2025


================
@@ -261,55 +262,106 @@ void FlowAwareEmbedder::computeEmbeddings(const BasicBlock &BB) const {
   BBVecMap[&BB] = BBVector;
 }
 
+//===----------------------------------------------------------------------===//
+// VocabStorage
+//===----------------------------------------------------------------------===//
+
+VocabStorage::VocabStorage(std::vector<std::vector<Embedding>> &&SectionData)
+    : Sections(std::move(SectionData)), TotalSize([&] {
+        assert(!Sections.empty() && "Vocabulary has no sections");
+        assert(!Sections[0].empty() && "First section of vocabulary is empty");
+        // Compute total size across all sections
+        size_t Size = 0;
+        for (const auto &Section : Sections)
+          Size += Section.size();
+        return Size;
+      }()),
+      Dimension([&] {
+        // Get dimension from the first embedding in the first section - all
+        // embeddings must have the same dimension
+        assert(!Sections.empty() && "Vocabulary has no sections");
+        assert(!Sections[0].empty() && "First section of vocabulary is empty");
+        return static_cast<unsigned>(Sections[0][0].size());
+      }()) {}
+
+const Embedding &VocabStorage::const_iterator::operator*() const {
+  assert(SectionId < Storage->Sections.size() && "Invalid section ID");
+  assert(LocalIndex < Storage->Sections[SectionId].size() &&
+         "Local index out of range");
+  return Storage->Sections[SectionId][LocalIndex];
+}
+
+VocabStorage::const_iterator &VocabStorage::const_iterator::operator++() {
+  ++LocalIndex;
+  // Check if we need to move to the next section
+  while (SectionId < Storage->getNumSections() &&
+         LocalIndex >= Storage->Sections[SectionId].size()) {
+    LocalIndex = 0;
----------------
svkeerthy wrote:

It seems my changes were missing here. It makes sense to explicitly avoid empty sections, so I made these changes and added assertions.
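The discussion is about keeping a flattened iterator over sectioned storage from ever resting on an empty section. Below is a minimal, self-contained sketch of that idea. It is not the code from this PR: `SectionedStorage`, `skipExhaustedSections`, and the `Embedding` alias (here just `std::vector<double>`) are hypothetical names chosen for illustration. The real `VocabStorage` also tracks the embedding dimension, which this sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real Embedding type; assumed here to be a
// plain vector of doubles.
using Embedding = std::vector<double>;

// Sketch of section-based storage: embeddings are grouped into sections, but
// the const_iterator walks them as one flat sequence.
class SectionedStorage {
public:
  explicit SectionedStorage(std::vector<std::vector<Embedding>> &&SectionData)
      : Sections(std::move(SectionData)) {
    assert(!Sections.empty() && "storage has no sections");
    // Total number of embeddings across all sections.
    for (const auto &Section : Sections)
      TotalSize += Section.size();
  }

  size_t size() const { return TotalSize; }
  size_t getNumSections() const { return Sections.size(); }

  class const_iterator {
  public:
    const_iterator(const SectionedStorage *S, size_t Sec, size_t Idx)
        : Storage(S), SectionId(Sec), LocalIndex(Idx) {
      // Normalize on construction so begin() never points into an empty
      // leading section.
      skipExhaustedSections();
    }

    const Embedding &operator*() const {
      assert(SectionId < Storage->getNumSections() && "invalid section ID");
      assert(LocalIndex < Storage->Sections[SectionId].size() &&
             "local index out of range");
      return Storage->Sections[SectionId][LocalIndex];
    }

    const_iterator &operator++() {
      ++LocalIndex;
      skipExhaustedSections();
      return *this;
    }

    bool operator==(const const_iterator &Other) const {
      return Storage == Other.Storage && SectionId == Other.SectionId &&
             LocalIndex == Other.LocalIndex;
    }
    bool operator!=(const const_iterator &Other) const {
      return !(*this == Other);
    }

  private:
    // Step past the end of the current section and over any empty sections,
    // so the iterator always refers to a real element or equals end().
    void skipExhaustedSections() {
      while (SectionId < Storage->getNumSections() &&
             LocalIndex >= Storage->Sections[SectionId].size()) {
        ++SectionId;
        LocalIndex = 0;
      }
    }

    const SectionedStorage *Storage;
    size_t SectionId;
    size_t LocalIndex;
  };

  const_iterator begin() const { return const_iterator(this, 0, 0); }
  const_iterator end() const {
    return const_iterator(this, getNumSections(), 0);
  }

private:
  std::vector<std::vector<Embedding>> Sections;
  size_t TotalSize = 0;
};
```

Normalizing in both the constructor and `operator++` maintains a single invariant: an iterator is either dereferenceable or equal to `end()`. This is what lets a plain `It != end()` loop work even when sections are empty. The alternative suggested in the reply, rejecting empty sections up front with assertions, makes the same guarantee by construction.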

https://github.com/llvm/llvm-project/pull/158376

