[Lldb-commits] [PATCH] D23545: Minidump parsing

Tue Aug 16 10:01:16 PDT 2016

zturner added a comment.

LLVM does have something similar to DataExtractor, but unfortunately it lives in a library that you can't easily reuse from here.  I actually own the code in question, so I'm willing to fix that for you if you're interested, but the code is also very simple.  LLVM has some endian-aware data types that are very convenient to work with and mostly eliminate the need to worry about endianness while parsing, which is most of what DataExtractor does for you.  To use LLVM's types, for example, you would change this:

  struct MinidumpHeader
  {
      uint32_t signature;
      uint32_t version;
      uint32_t streams_count;
      RVA stream_directory_rva; // offset of the stream directory
      uint32_t checksum;
      uint32_t time_date_stamp; // time_t format
      uint64_t flags;

      static bool
      SignatureMatchAndSetByteOrder(DataExtractor &data, lldb::offset_t *offset);

      static llvm::Optional<MinidumpHeader>
      Parse(const DataExtractor &data, lldb::offset_t *offset);
  };

to this:

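  #include "llvm/Support/Endian.h"
  using namespace llvm;

  struct MinidumpHeader
  {
      support::ulittle32_t signature;
      support::ulittle32_t version;
      support::ulittle32_t streams_count;
      support::ulittle32_t stream_directory_rva; // offset of the stream directory
      support::ulittle32_t checksum;
      support::ulittle32_t time_date_stamp; // time_t format
      support::ulittle64_t flags;
  };

  // The support:: types have alignment 1, so the struct has no padding.
  static_assert(sizeof(MinidumpHeader) == 32, "must match the on-disk header");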
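If you haven't used these types before, the idea is roughly this: a type like `support::ulittle32_t` stores the raw bytes and decodes them when read, so the same code is correct on big- and little-endian hosts, and because its alignment is 1 a struct built from such fields has no padding. Below is a minimal standard-C++ sketch of that idea. It is not LLVM's implementation, and the names `ULittle`, `ULittle32`, `ULittle64` and `MinidumpHeaderSketch` are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an endian-aware integer such as
// llvm::support::ulittle32_t (NOT LLVM's actual implementation).
// It holds the raw little-endian bytes, so sizeof == sizeof(IntT)
// and alignof == 1.
template <typename IntT>
struct ULittle {
  unsigned char bytes[sizeof(IntT)];

  // Assemble the value from the most significant byte down; this gives
  // the same result regardless of the host's byte order.
  IntT value() const {
    IntT result = 0;
    for (std::size_t i = sizeof(IntT); i-- > 0;)
      result = static_cast<IntT>((result << 8) | bytes[i]);
    return result;
  }

  // Allow use as a plain integer, like the LLVM types.
  operator IntT() const { return value(); }
};

using ULittle32 = ULittle<std::uint32_t>;
using ULittle64 = ULittle<std::uint64_t>;

// Same field layout as MinidumpHeader, built from the hypothetical types.
struct MinidumpHeaderSketch {
  ULittle32 signature;
  ULittle32 version;
  ULittle32 streams_count;
  ULittle32 stream_directory_rva;
  ULittle32 checksum;
  ULittle32 time_date_stamp;
  ULittle64 flags;
};

// Six 4-byte fields plus one 8-byte field, with no padding anywhere.
static_assert(sizeof(MinidumpHeaderSketch) == 32, "unexpected padding");
static_assert(alignof(MinidumpHeaderSketch) == 1, "unexpected alignment");
```

For example, the four on-disk bytes `M`, `D`, `M`, `P` decode to `0x504D444D` on any host, because `value()` never depends on how the host itself stores integers.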

All you have to do now is check that the buffer holds at least `sizeof(MinidumpHeader)` bytes, `reinterpret_cast` it to a `MinidumpHeader*`, and you're good to go.  So pretty much everything you need `DataExtractor` for here boils down to a single template function:

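  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/Support/Error.h"
  using namespace llvm;

  // Point Object at the next sizeof(T) bytes of Buffer and advance past them.
  // Nothing is copied; Object aliases the underlying storage.
  template<typename T>
  Error consumeObject(ArrayRef<uint8_t> &Buffer, const T *&Object) {
    if (Buffer.size() < sizeof(T))
      return make_error<StringError>("Insufficient buffer!",
                                     inconvertibleErrorCode());
    Object = reinterpret_cast<const T*>(Buffer.data());
    Buffer = Buffer.drop_front(sizeof(T));
    return Error::success();
  }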

For starters, this is nice because it means you're not copying memory around unnecessarily.  You're just pointing to the memory that's already there.  With DataExtractor, each `GetU32`-style call copies the bytes out into a separate value.  Dump files can be large (even minidumps!), and copying all that memory around is inefficient.
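To make the zero-copy point concrete without depending on LLVM, here is a minimal C++17 sketch of the same `consumeObject` idea. It assumes a hypothetical `ByteRange` in place of `ArrayRef<uint8_t>`, returns `bool` instead of `llvm::Error`, and uses a made-up `Tag` record. Like the LLVM version, it relies on `T` consisting only of byte-sized fields (alignment 1, no padding), which is what the endian-aware types provide.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical minimal stand-in for llvm::ArrayRef<uint8_t>.
struct ByteRange {
  const std::uint8_t *data;
  std::size_t size;

  // Return a view of the remaining bytes after skipping the first n.
  ByteRange drop_front(std::size_t n) const { return {data + n, size - n}; }
};

// Point Object at the next sizeof(T) bytes and advance Buffer past them.
// No bytes are copied: Object aliases the caller's storage, which must
// outlive it.
template <typename T>
bool consumeObject(ByteRange &Buffer, const T *&Object) {
  if (Buffer.size < sizeof(T))
    return false; // insufficient buffer; Buffer is left untouched
  Object = reinterpret_cast<const T *>(Buffer.data);
  Buffer = Buffer.drop_front(sizeof(T));
  return true;
}

// Example record: a 4-byte tag made of raw bytes (alignment 1).
struct Tag {
  unsigned char chars[4];
};
```

After a successful call, the returned `Tag*` compares equal to the start of the original buffer, which is the whole point: parsing only moves pointers, and the cost does not grow with the size of the record.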

It also makes the syntax cleaner.  With DataExtractor you have to call a different function depending on what you want to extract, such as `GetU8` or `GetU16`, whereas this one function works with almost everything.  A few simple template specializations and overloads can make it even more powerful.  For example:

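  #include <algorithm> // std::find
  Error consumeObject(ArrayRef<uint8_t> &Buffer, StringRef &ZeroString) {
    // Bound the terminator search by the buffer instead of trusting strlen.
    auto Nul = std::find(Buffer.begin(), Buffer.end(), 0);
    if (Nul == Buffer.end())
      return make_error<StringError>("Unterminated string!", inconvertibleErrorCode());
    ZeroString = StringRef(reinterpret_cast<const char *>(Buffer.data()), Nul - Buffer.begin());
    Buffer = Buffer.drop_front(ZeroString.size() + 1);
    return Error::success();
  }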
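A NUL-terminated string should only be accepted if its terminator lies inside the buffer; otherwise a strlen-style scan can run past the end of a truncated or corrupt dump. The following is a minimal C++17 sketch of such an overload that can be run without LLVM. It assumes a hypothetical `ByteRange` in place of `ArrayRef<uint8_t>`, `std::string_view` in place of `StringRef`, and a `bool` result in place of `llvm::Error`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical minimal stand-in for llvm::ArrayRef<uint8_t>.
struct ByteRange {
  const std::uint8_t *data;
  std::size_t size;
};

// Consume a NUL-terminated string without copying it. The search for the
// terminator is bounded by the buffer, so a missing terminator is reported
// instead of reading past the end.
bool consumeObject(ByteRange &Buffer, std::string_view &ZeroString) {
  const std::uint8_t *end = Buffer.data + Buffer.size;
  const std::uint8_t *nul = std::find(Buffer.data, end, std::uint8_t{0});
  if (nul == end)
    return false; // no terminator inside the buffer; Buffer is untouched
  std::size_t len = static_cast<std::size_t>(nul - Buffer.data);
  ZeroString = std::string_view(reinterpret_cast<const char *>(Buffer.data), len);
  Buffer.data += len + 1; // skip the string and its terminator
  Buffer.size -= len + 1;
  return true;
}
```

The resulting `string_view` points into the original buffer, so, as with the struct case, nothing is copied. On failure the buffer is left unchanged, which lets the caller report where parsing stopped.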

I have some more comments on the CL, but I have to run to a meeting, so I will be back later.

https://reviews.llvm.org/D23545