[llvm-branch-commits] [llvm] [IR2Vec] Refactor vocabulary to use section-based storage (PR #158376)
S. VenkataKeerthy via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Mon Sep 22 10:55:06 PDT 2025
================
@@ -261,55 +262,106 @@ void FlowAwareEmbedder::computeEmbeddings(const BasicBlock &BB) const {
BBVecMap[&BB] = BBVector;
}
+//===----------------------------------------------------------------------===//
+// VocabStorage
+//===----------------------------------------------------------------------===//
+
+VocabStorage::VocabStorage(std::vector<std::vector<Embedding>> &&SectionData)
+ : Sections(std::move(SectionData)), TotalSize([&] {
+ assert(!Sections.empty() && "Vocabulary has no sections");
+ assert(!Sections[0].empty() && "First section of vocabulary is empty");
+ // Compute total size across all sections
+ size_t Size = 0;
+ for (const auto &Section : Sections)
+ Size += Section.size();
+ return Size;
+ }()),
+ Dimension([&] {
+ // Get dimension from the first embedding in the first section - all
+ // embeddings must have the same dimension
+ assert(!Sections.empty() && "Vocabulary has no sections");
+ assert(!Sections[0].empty() && "First section of vocabulary is empty");
+ return static_cast<unsigned>(Sections[0][0].size());
+ }()) {}
+
+const Embedding &VocabStorage::const_iterator::operator*() const {
+ assert(SectionId < Storage->Sections.size() && "Invalid section ID");
+ assert(LocalIndex < Storage->Sections[SectionId].size() &&
+ "Local index out of range");
+ return Storage->Sections[SectionId][LocalIndex];
+}
+
+VocabStorage::const_iterator &VocabStorage::const_iterator::operator++() {
+ ++LocalIndex;
+ // Check if we need to move to the next section
+ while (SectionId < Storage->getNumSections() &&
+ LocalIndex >= Storage->Sections[SectionId].size()) {
+ LocalIndex = 0;
----------------
svkeerthy wrote:
Seems like my changes are missing here. Makes sense to explicitly avoid empty sections. Made these changes and added assertions.
https://github.com/llvm/llvm-project/pull/158376
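
For readers following the thread, the section-skipping behaviour under discussion can be illustrated with a minimal, self-contained sketch. It is not the code from the PR: `SectionedStorage`, `skipExhaustedSections`, and `flatten` are hypothetical names, plain `int` values stand in for `Embedding`, and the `++SectionId` step is an assumption about the part of the hunk that is cut off above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for VocabStorage: ints replace Embedding values.
class SectionedStorage {
  std::vector<std::vector<int>> Sections;

public:
  explicit SectionedStorage(std::vector<std::vector<int>> Data)
      : Sections(std::move(Data)) {}

  class const_iterator {
    const SectionedStorage *Storage;
    std::size_t SectionId;
    std::size_t LocalIndex;

    // Move past exhausted or empty sections so the iterator either points at
    // a valid element or compares equal to end(). The ++SectionId step is an
    // assumption; the quoted hunk is truncated before it.
    void skipExhaustedSections() {
      while (SectionId < Storage->Sections.size() &&
             LocalIndex >= Storage->Sections[SectionId].size()) {
        LocalIndex = 0;
        ++SectionId;
      }
    }

  public:
    const_iterator(const SectionedStorage *S, std::size_t Section,
                   std::size_t Index)
        : Storage(S), SectionId(Section), LocalIndex(Index) {
      // Normalize on construction so begin() also skips leading empty sections.
      skipExhaustedSections();
    }

    const int &operator*() const {
      assert(SectionId < Storage->Sections.size() && "Invalid section ID");
      assert(LocalIndex < Storage->Sections[SectionId].size() &&
             "Local index out of range");
      return Storage->Sections[SectionId][LocalIndex];
    }

    const_iterator &operator++() {
      ++LocalIndex;
      skipExhaustedSections();
      return *this;
    }

    bool operator!=(const const_iterator &Other) const {
      return SectionId != Other.SectionId || LocalIndex != Other.LocalIndex;
    }
  };

  const_iterator begin() const { return const_iterator(this, 0, 0); }
  // end() is represented as (number of sections, 0).
  const_iterator end() const {
    return const_iterator(this, Sections.size(), 0);
  }
};

// Collect every element in iteration order.
std::vector<int> flatten(const SectionedStorage &S) {
  std::vector<int> Out;
  for (int V : S)
    Out.push_back(V);
  return Out;
}
```

With sections `{1, 2}`, `{}`, `{3}`, calling `flatten` visits 1, 2, 3 and never dereferences inside the empty section. The alternative mentioned in the comment, asserting up front that no section is empty, would let `operator++` use a single `if` instead of a loop; which of the two the PR finally adopts is not visible in this excerpt.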