[llvm] r307970 - [PDB] Fix quadratic behavior when writing a BinaryItemStream

Reid Kleckner via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 13 15:02:23 PDT 2017


Author: rnk
Date: Thu Jul 13 15:02:23 2017
New Revision: 307970

URL: http://llvm.org/viewvc/llvm-project?rev=307970&view=rev
Log:
[PDB] Fix quadratic behavior when writing a BinaryItemStream

Binary streams are an abstraction over a discontiguous buffer. To write
a discontiguous buffer, we want to copy each contiguous chunk
individually. Currently BinaryStreams do not expose a way to iterate
over the chunks, so the code repeatedly calls
readLongestContiguousChunk() with an increasing offset. In order to
lookup the chunk by offset, we would iterate the items list to figure
out which chunk the offset is within. This is obviously O(n^2).

Instead, pre-compute a table of offsets and do a binary search to figure
out which chunk to use. This is still only an O(n^2) to O(n log n)
improvement, but it's a very local fix that seems worth doing.

This improves self-linking lld.exe with PDBs from 90s to 10s.

Modified:
    llvm/trunk/include/llvm/Support/BinaryItemStream.h

Modified: llvm/trunk/include/llvm/Support/BinaryItemStream.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Support/BinaryItemStream.h?rev=307970&r1=307969&r2=307970&view=diff
==============================================================================
--- llvm/trunk/include/llvm/Support/BinaryItemStream.h (original)
+++ llvm/trunk/include/llvm/Support/BinaryItemStream.h Thu Jul 13 15:02:23 2017
@@ -62,32 +62,45 @@ public:
     return Error::success();
   }
 
-  void setItems(ArrayRef<T> ItemArray) { Items = ItemArray; }
+  void setItems(ArrayRef<T> ItemArray) {
+    Items = ItemArray;
+    computeItemOffsets();
+  }
 
   uint32_t getLength() override {
-    uint32_t Size = 0;
-    for (const auto &Item : Items)
-      Size += Traits::length(Item);
-    return Size;
+    return ItemEndOffsets.empty() ? 0 : ItemEndOffsets.back();
   }
 
 private:
-  Expected<uint32_t> translateOffsetIndex(uint32_t Offset) const {
+  void computeItemOffsets() {
+    ItemEndOffsets.clear();
+    ItemEndOffsets.reserve(Items.size());
     uint32_t CurrentOffset = 0;
-    uint32_t CurrentIndex = 0;
     for (const auto &Item : Items) {
-      if (CurrentOffset >= Offset)
-        break;
-      CurrentOffset += Traits::length(Item);
-      ++CurrentIndex;
+      uint32_t Len = Traits::length(Item);
+      assert(Len > 0 && "no empty items");
+      CurrentOffset += Len;
+      ItemEndOffsets.push_back(CurrentOffset);
     }
-    if (CurrentOffset != Offset)
+  }
+
+  Expected<uint32_t> translateOffsetIndex(uint32_t Offset) const {
+    // Make sure the offset is somewhere in our items array.
+    if (Offset >= getLength())
       return make_error<BinaryStreamError>(stream_error_code::stream_too_short);
-    return CurrentIndex;
+    auto Iter = std::lower_bound(
+        ItemEndOffsets.begin(), ItemEndOffsets.end(), Offset,
+        [](const uint32_t &A, const uint32_t &B) { return A <= B; });
+    size_t Idx = std::distance(ItemEndOffsets.begin(), Iter);
+    assert(Idx < Items.size() && "binary search for offset failed");
+    return Idx;
   }
 
   llvm::support::endianness Endian;
   ArrayRef<T> Items;
+
+  // Sorted vector of offsets to accelerate lookup.
+  std::vector<uint32_t> ItemEndOffsets;
 };
 
 } // end namespace llvm




More information about the llvm-commits mailing list