[PATCH] D63713: WIP: DataExtractor error handling

Pavel Labath via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 24 06:49:40 PDT 2019


labath marked 8 inline comments as done.
labath added a comment.

Besides prototyping the "DataStream" class, this also converts debug_loc and accel table (the two pieces of code I'm familiar with) parsers to use it in order to demonstrate how would this look in action.

I have also highlighted some of the open questions in inline comments.



================
Comment at: include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h:409-413
+    AttributeEncoding extractAttributeEncoding(DataStream &DS);
 
-    Expected<std::vector<AttributeEncoding>>
-    extractAttributeEncodings(uint32_t *Offset);
+    std::vector<AttributeEncoding> extractAttributeEncodings(DataStream &DS);
 
+    Abbrev extractAbbrev(DataStream &DS);
----------------
This shows that if reaching the end of the stream is the only kind of error one may encounter, then he doesn't need an `Expected<T>` return type, as the error flag is implicitly returned through the stream.

That may be viewed as a good thing, or as a bad one... :)

Since these are just private methods, I would say that's a good thing here.


================
Comment at: include/llvm/Support/DataExtractor.h:434
+  explicit operator bool() const { return !Error; }
+  bool eof() const { return Offset == DE.getData().size(); }
+  void skip(uint32_t Value) {
----------------
This isn't consistent with iostreams (which detect eof only after one attempts to read past it), but it seems to me that this makes using the class much simpler.


================
Comment at: include/llvm/Support/DataExtractor.h:442
+
+  uint8_t getU8() { return wrap<uint8_t>(&DataExtractor::getU8); }
+  uint8_t *getU8(uint8_t *dst, uint32_t count) {
----------------
maybe `readUXX` would be better instead of `getUXX`?


================
Comment at: include/llvm/Support/DataExtractor.h:445
+    // If count == 0, Offset is validly not updated.
+    return count ? wrap<uint8_t *>(&DataExtractor::getU8, dst, count) : dst;
+  }
----------------
As can be seen here, checking whether the offset is updated (which is currently the only way of checking for errors) is a very tricky thing, since some functions can also "succeed" while not updating the offset.


================
Comment at: include/llvm/Support/DataExtractor.h:470-474
+    uint32_t SavedOffset = Offset;
+    Ret Result = (DE.*Fun)(&Offset, Vals...);
+    if (SavedOffset == Offset)
+      Error = true;
+    return Result;
----------------
To be consistent with iostreams we should stop attempting to read data once we encounter the first error. Doing that would add another branch to the code. Not doing that opens up the possibility for some weird behavior, where we first try to read a uint64_t and fail, but then try to read a uint8_t and succeed because the stream happened to have one more byte left. This could be mitigated by setting Offset to UINT32_MAX on failure.


================
Comment at: lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp:72
-
-  for (unsigned i = 0; i < NumAtoms; ++i) {
-    uint16_t AtomType = AccelSection.getU16(&Offset);
----------------
This code had a bug where it did not check the value of the NumAtoms field. So it could end up returning millions of empty/invalid Atoms while still claiming the accelerator table is valid.


================
Comment at: lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp:441-444
-  if (*Offset >= EntriesBase) {
-    return createStringError(errc::illegal_byte_sequence,
-                             "Incorrectly terminated abbreviation table.");
-  }
----------------
This check wasn't fully correct, because we could still cross the `EntriesBase` boundary after reading the two ULEB fields.


================
Comment at: lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp:485-488
+  DataExtractor AbbrevExtractor(
+      DS.getExtractor().getData().substr(AbbrevBase, Hdr.AbbrevTableSize),
+      DS.getExtractor().isLittleEndian(), DS.getExtractor().getAddressSize());
+  DataStream AbbrevStream(AbbrevExtractor, 0);
----------------
Some nice way of slicing the DataExtractor/DataStream would also be handy. That would also be useful when parsing DIEs from a DWARF unit (to make sure we don't cross into the neighbouring unit).


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63713/new/

https://reviews.llvm.org/D63713





More information about the llvm-commits mailing list