[llvm] 0a79809 - [AppleAccelTable][NFC] Refactor equal_range code

Thu Jun 8 07:27:17 PDT 2023

Author: Felipe de Azevedo Piovezan
Date: 2023-06-08T10:26:58-04:00
New Revision: 0a7980988e4bdb7c97adf8cafad2e67a59b349d7

URL: https://github.com/llvm/llvm-project/commit/0a7980988e4bdb7c97adf8cafad2e67a59b349d7
DIFF: https://github.com/llvm/llvm-project/commit/0a7980988e4bdb7c97adf8cafad2e67a59b349d7.diff

LOG: [AppleAccelTable][NFC] Refactor equal_range code

The current implementation of AppleAcceleratorTable::equal_range has a couple of
drawbacks:

1. Assumptions about how the hash table is structured are hard-coded throughout
the code. Unless you are familiar with the data layout of the table, it becomes
tricky to follow the code.
2. We currently load strings from the string table even for hashes that don't
match our current search hash. This is wasteful.
3. There is no error checking in most DataExtractor calls that can fail.

This patch cleans up (1) by making helper methods that hide the details of the
data layout from the algorithms relying on them. This should also help us in
future patches, where we want to expand the interface to allow iterating over
_all_ entries in the table, and potentially clean up the existing Iterator
class.

The changes above also fix (2), as the problem "just vanishes" when you have a
function called "idxOfHashInBucket(SearchHash)".

The changes above also fix (3), as having individual functions allow us to
expose the points in which reading data can fail. This is particularly important
as we would like to share this implementation with LLDB, which requires robust
error handling.

The changes above are also a step towards addressing a comment left by the
original author:
```
  /// TODO: Generalize the rest of the AppleAcceleratorTable interface and move it
  /// to this class.
```
I suspect a lot of these helper functions created also apply to DWARF 5's
accelerator table, so they could be moved to the base class.

The changes above also expose a bug in this implementation: the previous
algorithm looks at _one_ string inside the bucket, but never bothers checking
for collisions. When the main search loop is written as it is with this patch,
the problem becomes evident. We do not fix the issue in this patch, as it is
intended to be NFC.

Differential Revision: https://reviews.llvm.org/D151989

Added: 
    

Modified: 
    llvm/include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
    llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h b/llvm/include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
index 77d33b10abe5a..cc54e6d93cc4c 100644

--- a/llvm/include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
+++ b/llvm/include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
@@ -112,6 +112,78 @@ class AppleAcceleratorTable : public DWARFAcceleratorTable {
   bool dumpName(ScopedPrinter &W, SmallVectorImpl<DWARFFormValue> &AtomForms,
                 uint64_t *DataOffset) const;
 
+  /// Reads an uint32_t from the accelerator table at Offset, which is
+  /// incremented by the number of bytes read.
+  std::optional<uint32_t> readU32FromAccel(uint64_t &Offset,
+                                           bool UseRelocation = false) const;
+
+  /// Reads a StringRef from the string table at Offset.
+  std::optional<StringRef>
+  readStringFromStrSection(uint64_t StringSectionOffset) const;
+
+  /// Return the offset into the section where the Buckets begin.
+  uint64_t getBucketBase() const { return sizeof(Hdr) + Hdr.HeaderDataLength; }
+
+  /// Return the offset into the section where the I-th bucket is.
+  uint64_t getIthBucketBase(uint32_t I) const {
+    return getBucketBase() + I * 4;
+  }
+
+  /// Return the offset into the section where the hash list begins.
+  uint64_t getHashBase() const { return getBucketBase() + getNumBuckets() * 4; }
+
+  /// Return the offset into the section where the I-th hash is.
+  uint64_t getIthHashBase(uint32_t I) const { return getHashBase() + I * 4; }
+
+  /// Return the offset into the section where the offset list begins.
+  uint64_t getOffsetBase() const { return getHashBase() + getNumHashes() * 4; }
+
+  /// Return the offset into the section where the I-th offset is.
+  uint64_t getIthOffsetBase(uint32_t I) const {
+    return getOffsetBase() + I * 4;
+  }
+
+  /// Returns the index of the bucket where a hypothetical Hash would be.
+  uint32_t hashToBucketIdx(uint32_t Hash) const {
+    return Hash % getNumBuckets();
+  }
+
+  /// Returns true iff a hypothetical Hash would be assigned to the BucketIdx-th
+  /// bucket.
+  bool wouldHashBeInBucket(uint32_t Hash, uint32_t BucketIdx) const {
+    return hashToBucketIdx(Hash) == BucketIdx;
+  }
+
+  /// Reads the contents of the I-th bucket, that is, the index in the hash list
+  /// where the hashes corresponding to this bucket begin.
+  std::optional<uint32_t> readIthBucket(uint32_t I) const {
+    uint64_t Offset = getIthBucketBase(I);
+    return readU32FromAccel(Offset);
+  }
+
+  /// Reads the I-th hash in the hash list.
+  std::optional<uint32_t> readIthHash(uint32_t I) const {
+    uint64_t Offset = getIthHashBase(I);
+    return readU32FromAccel(Offset);
+  }
+
+  /// Reads the I-th offset in the offset list.
+  std::optional<uint32_t> readIthOffset(uint32_t I) const {
+    uint64_t Offset = getIthOffsetBase(I);
+    return readU32FromAccel(Offset);
+  }
+
+  /// Reads a string offset from the accelerator table at Offset, which is
+  /// incremented by the number of bytes read.
+  std::optional<uint32_t> readStringOffsetAt(uint64_t &Offset) const {
+    return readU32FromAccel(Offset, /*UseRelocation*/ true);
+  }
+
+  /// Scans through all Hashes in the BucketIdx-th bucket, attempting to find
+  /// HashToFind. If it is found, its index in the list of hashes is returned.
+  std::optional<uint32_t> idxOfHashInBucket(uint32_t HashToFind,
+                                            uint32_t BucketIdx) const;
+
 public:
   /// Apple-specific implementation of an Accelerator Entry.
   class Entry final : public DWARFAcceleratorTable::Entry {
@@ -183,10 +255,10 @@ class AppleAcceleratorTable : public DWARFAcceleratorTable {
       : DWARFAcceleratorTable(AccelSection, StringSection) {}
 
   Error extract() override;
-  uint32_t getNumBuckets();
-  uint32_t getNumHashes();
-  uint32_t getSizeHdr();
-  uint32_t getHeaderDataLength();
+  uint32_t getNumBuckets() const;
+  uint32_t getNumHashes() const;
+  uint32_t getSizeHdr() const;
+  uint32_t getHeaderDataLength() const;
 
   /// Return the Atom description, which can be used to interpret the raw values
   /// of the Accelerator Entries in this table.

diff  --git a/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp b/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
index 889d3f0915b04..5b6e21b091231 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
@@ -57,10 +57,7 @@ Error AppleAcceleratorTable::extract() {
 
   // Check that we can read all the hashes and offsets from the
   // section (see SourceLevelDebugging.rst for the structure of the index).
-  // We need to substract one because we're checking for an *offset* which is
-  // equal to the size for an empty table and hence pointer after the section.
-  if (!AccelSection.isValidOffset(sizeof(Hdr) + Hdr.HeaderDataLength +
-                                  Hdr.BucketCount * 4 + Hdr.HashCount * 8 - 1))
+  if (!AccelSection.isValidOffset(getIthBucketBase(Hdr.BucketCount - 1)))
     return createStringError(
         errc::illegal_byte_sequence,
         "Section too small: cannot read buckets and hashes.");
@@ -78,10 +75,12 @@ Error AppleAcceleratorTable::extract() {
   return Error::success();
 }
 
-uint32_t AppleAcceleratorTable::getNumBuckets() { return Hdr.BucketCount; }
-uint32_t AppleAcceleratorTable::getNumHashes() { return Hdr.HashCount; }
-uint32_t AppleAcceleratorTable::getSizeHdr() { return sizeof(Hdr); }
-uint32_t AppleAcceleratorTable::getHeaderDataLength() {
+uint32_t AppleAcceleratorTable::getNumBuckets() const {
+  return Hdr.BucketCount;
+}
+uint32_t AppleAcceleratorTable::getNumHashes() const { return Hdr.HashCount; }
+uint32_t AppleAcceleratorTable::getSizeHdr() const { return sizeof(Hdr); }
+uint32_t AppleAcceleratorTable::getHeaderDataLength() const {
   return Hdr.HeaderDataLength;
 }
 
@@ -327,39 +326,82 @@ void AppleAcceleratorTable::ValueIterator::Next() {
 
 iterator_range<AppleAcceleratorTable::ValueIterator>
 AppleAcceleratorTable::equal_range(StringRef Key) const {
+  const auto EmptyRange = make_range(ValueIterator(), ValueIterator());
   if (!IsValid)
-    return make_range(ValueIterator(), ValueIterator());
+    return EmptyRange;
 
   // Find the bucket.
-  unsigned HashValue = djbHash(Key);
-  unsigned Bucket = HashValue % Hdr.BucketCount;
-  uint64_t BucketBase = sizeof(Hdr) + Hdr.HeaderDataLength;
-  uint64_t HashesBase = BucketBase + Hdr.BucketCount * 4;
-  uint64_t OffsetsBase = HashesBase + Hdr.HashCount * 4;
-
-  uint64_t BucketOffset = BucketBase + Bucket * 4;
-  unsigned Index = AccelSection.getU32(&BucketOffset);
-
-  // Search through all hashes in the bucket.
-  for (unsigned HashIdx = Index; HashIdx < Hdr.HashCount; ++HashIdx) {
-    uint64_t HashOffset = HashesBase + HashIdx * 4;
-    uint64_t OffsetsOffset = OffsetsBase + HashIdx * 4;
-    uint32_t Hash = AccelSection.getU32(&HashOffset);
-
-    if (Hash % Hdr.BucketCount != Bucket)
-      // We are already in the next bucket.
-      break;
+  uint32_t SearchHash = djbHash(Key);
+  uint32_t BucketIdx = hashToBucketIdx(SearchHash);
+  std::optional<uint32_t> HashIdx = idxOfHashInBucket(SearchHash, BucketIdx);
+  if (!HashIdx)
+    return EmptyRange;
+
+  std::optional<uint64_t> MaybeDataOffset = readIthOffset(*HashIdx);
+  if (!MaybeDataOffset)
+    return EmptyRange;
+
+  uint64_t DataOffset = *MaybeDataOffset;
+  if (DataOffset >= AccelSection.size())
+    return EmptyRange;
+
+  std::optional<uint32_t> StrOffset = readStringOffsetAt(DataOffset);
+
+  // Invalid input or no more strings in this hash.
+  if (!StrOffset || *StrOffset == 0)
+    return EmptyRange;
+
+  std::optional<StringRef> MaybeStr = readStringFromStrSection(*StrOffset);
+  if (!MaybeStr)
+    return EmptyRange;
+  if (Key == *MaybeStr)
+    return make_range({*this, DataOffset}, ValueIterator());
+
+  // FIXME: this shouldn't return, we haven't checked all the colliding strings
+  // in the bucket!
+  return EmptyRange;
+}
+
+std::optional<uint32_t>
+AppleAcceleratorTable::idxOfHashInBucket(uint32_t HashToFind,
+                                         uint32_t BucketIdx) const {
+  std::optional<uint32_t> HashStartIdx = readIthBucket(BucketIdx);
+  if (!HashStartIdx)
+    return std::nullopt;
 
-    uint64_t DataOffset = AccelSection.getU32(&OffsetsOffset);
-    uint64_t StringOffset = AccelSection.getRelocatedValue(4, &DataOffset);
-    if (!StringOffset)
+  for (uint32_t HashIdx = *HashStartIdx; HashIdx < getNumHashes(); HashIdx++) {
+    std::optional<uint32_t> MaybeHash = readIthHash(HashIdx);
+    if (!MaybeHash || !wouldHashBeInBucket(*MaybeHash, BucketIdx))
       break;
+    if (*MaybeHash == HashToFind)
+      return HashIdx;
+  }
+  return std::nullopt;
+}
 
-    // Finally, compare the key.
-    if (Key == StringSection.getCStr(&StringOffset))
-      return make_range({*this, DataOffset}, ValueIterator());
+std::optional<StringRef> AppleAcceleratorTable::readStringFromStrSection(
+    uint64_t StringSectionOffset) const {
+  Error E = Error::success();
+  StringRef Str = StringSection.getCStrRef(&StringSectionOffset, &E);
+  if (E) {
+    consumeError(std::move(E));
+    return std::nullopt;
+  }
+  return Str;
+}
+
+std::optional<uint32_t>
+AppleAcceleratorTable::readU32FromAccel(uint64_t &Offset,
+                                        bool UseRelocation) const {
+  Error E = Error::success();
+  uint32_t Data = UseRelocation
+                      ? AccelSection.getRelocatedValue(4, &Offset, nullptr, &E)
+                      : AccelSection.getU32(&Offset, &E);
+  if (E) {
+    consumeError(std::move(E));
+    return std::nullopt;
   }
-  return make_range(ValueIterator(), ValueIterator());
+  return Data;
 }
 
 void DWARFDebugNames::Header::dump(ScopedPrinter &W) const {