[PATCH] D63713: WIP: DataExtractor error handling

Tue Jun 25 06:49:29 PDT 2019

labath added a comment.

In D63713#1556126 <https://reviews.llvm.org/D63713#1556126>, @dblaikie wrote:

> I'm kind of thinking a change to the cursor as Greg suggested might be better than wrapping the stream further - and probably using llvm::Error.
>
> & yeah, I think using Error and saying "parsing failures are problems that must be handled" rather than allowing for accidentally ignoring failures,
>
> Amounting to something like this:
>
>   Cursor C;
>   DataExtractor D = ...;
>   D.getU32(C);
>   D.getU64(C);
>   ...
>   if (Error E = C.getError())
>     ...;
>   
>
> That way early exits (if your record type has optional fields, etc) you can't accidentally forget your error checking (because if there's an Error inside the Cursor it'd have to be checked otherwise it'll assert that it's unchecked, etc)

Yeah, using a cursor API seems fine. I've updated the patch to use something like you desribed above. Let me know what you think.

> But yeah, that's a huge amount of refactoring to add all the error handling into existing callers.

I think this thing can be done incrementally. The first step could be to add a `Error *Err = nullptr` argument to all existing DataExtractor functions. This way, anyone who wishes it, can already detect errors by passing an error object. Then, the cursor api becomes just syntactic sugar on top of that, and can be introduced gradually. Once all users have been ported, the non-cursor API can be removed (or more likely replaced by something returning Expected<T>, since the non-cursor api is still handier to use for one-off parses like those in debug_str and similar..

================
Comment at: lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp:60
+    auto AtomForm = static_cast<dwarf::Form>(DS.getU16());
+    HdrData.Atoms.push_back(std::make_pair(AtomType, AtomForm));
+  }
----------------
probinson wrote:
> NumAtoms still isn't validated before being used, and could consume all available memory before you get to the error check.
Indeed. Thanks for catching that. I'll have to remember to go through the comments on this patch and create regression tests once I'm done prototyping...

================
Comment at: lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp:389
+  AugmentationStringSize = alignTo(DS.getU32(), 4);
   AugmentationString.resize(AugmentationStringSize);
+  DS.getU8(reinterpret_cast<uint8_t *>(AugmentationString.data()),
----------------
probinson wrote:
> You're now resizing the string without validating the new size.
Good catch.

This is a pattern that has been present a couple of times in this diff alone, so I've created a special DataExtractor function for safely reading a sequence of integers into a vector.

================
Comment at: lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp:441

-    Result.emplace_back(*AttrEncOr);
+    Result.emplace_back(AttrEnc);
   }
----------------
probinson wrote:
> labath wrote:
> > This check wasn't fully correct, because we could still cross the `EntriesBase` boundary after reading the two ULEB fields.
> This depends on `extractAttributeEncoding()`returning a sentinel if you run off the end of the stream.  That depends on `sentinalAttrEnc()` using the same values that `getULEB128()` returns when it runs off the end.  These dependencies need to be documented.
Documenting them is easy. However, I would say that the more interesting question at this stage is "do they make the code more readable?". I would say "yes", because without this, correctly checking for EOF is nearly impossible. However, there may still be room for improvement here...

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63713/new/

https://reviews.llvm.org/D63713