[Lldb-commits] [PATCH] D23545: Minidump parsing

Tue Aug 16 11:13:14 PDT 2016

zturner added inline comments.

================
Comment at: source/Plugins/Process/minidump/MinidumpParser.cpp:29
@@ +28,3 @@
+
+    lldb::ByteOrder byteorder = m_data.GetByteOrder();
+    lldb::offset_t directory_list_offset = m_header->stream_directory_rva;
----------------
I asked some people familiar with breakpad and windows history, and the answer I got was that while minidump may have support big endian prior to windows 2000 when it supported PPC and MIPS, but probably not anymore.

At this point we have full control over the minidump pipeline.  If we want to generate them from within LLDB, we can write them using little endian.  And if we want to read them, we can assume they are not from a file prior to windows 2000.  I think it's safe to assume little endian unless we have strong evidence that we need to parse a big endian minidump.

================
Comment at: source/Plugins/Process/minidump/MinidumpParser.cpp:35
@@ +34,3 @@
+    {
+        m_data.ExtractBytes(directory_list_offset, sizeof(MinidumpDirectory), byteorder, &directory);
+        directory_list_offset += sizeof(MinidumpDirectory);
----------------
Note that this will make a copy of the data.  Based on my previous email, this could be written as follows:

const MiniDumpDirectory *directory = nullptr;

```
for (uint32_t i=0; i < m_header->streams_count; ++i) {
  if (auto EC = consumeObject(data, directory))
    return EC;
  m_directory_map[directory.stream_type = directory->location;
}
```

No copying is involved here.  You would need to make a few changes to do this but I think it is worth it.  Same concept applies throughout the rest of this file, so I won't mention it again.

================
Comment at: source/Plugins/Process/minidump/MinidumpParser.h:13-25
@@ +12,15 @@
+
+// C includes
+
+// C++ includes
+#include <cstring>
+#include <unordered_map>
+
+// Other libraries and framework includes
+#include "lldb/Core/ArchSpec.h"
+#include "lldb/Core/DataExtractor.h"
+#include "llvm/ADT/Optional.h"
+
+// Project includes
+#include "MinidumpTypes.h"
+
----------------
Please include files in the following order:

1. Files in the same directory

2. LLDB includes

3. LLVM includes

4. C and C++ system includes

Although this isn't consistent with the rest of the codebase, it was recently agreed that this is what we want to move towards, so all new code should follow this pattern.

================
Comment at: source/Plugins/Process/minidump/MinidumpParser.h:61
@@ +60,3 @@
+    llvm::Optional<MinidumpHeader> m_header;
+    std::unordered_map<int, MinidumpLocationDescriptor> m_directory_map;
+};
----------------
Please use `llvm::DenseMap`.  `std::unordered_map` should probably not be used for anything ever.

================
Comment at: source/Plugins/Process/minidump/MinidumpParser.h:67
@@ +66,1 @@
+#endif // liblldb_MinidumpParser_h_
\ No newline at end of file

----------------
Newline here.

================
Comment at: source/Plugins/Process/minidump/MinidumpTypes.h:40
@@ +39,3 @@
+
+enum MinidumpStreamType
+{
----------------
amccarth wrote:
>     // Reference:  https://msdn.microsoft.com/en-us/library/windows/desktop/ms680394.aspx
In LLVM where we parse PDB and CodeView debug info types, we have adopted the convention that these are all `enum classes`, and we continue to use LLVM naming style instead of using Microsoft all caps style.  So I would write this as:

```
enum class MiniDumpStreamType : uint32_t {
  Unused = 0,
  Reserved0 = 1,
  Reserved1 = 2,
  ThreadList = 3,
  ...
};
```

The `: uint32_t` is important since enums are signed by default on Microsoft compilers.

================
Comment at: source/Plugins/Process/minidump/MinidumpTypes.h:85
@@ +84,3 @@
+{
+    MINIDUMP_CPU_ARCHITECTURE_X86 = 0,         /* PROCESSOR_ARCHITECTURE_INTEL */
+    MINIDUMP_CPU_ARCHITECTURE_MIPS = 1,        /* PROCESSOR_ARCHITECTURE_MIPS */
----------------
Same goes for these, as well as a few more enums later on.  I would make these all enum classes and name them using the LLVM naming conventions.  

================
Comment at: source/Plugins/Process/minidump/MinidumpTypes.h:147
@@ +146,3 @@
+    MINIDUMP_CPU_ARM_ELF_HWCAP_IDIVT = (1 << 18),
+};
+
----------------
Enum classes don't play nicely with flags, but LLVM has a file called `llvm/ADT/BitmaskEnum.h` which improves this significantly.  So these can also be an enum class, just use the proper macro from that file so that bitwise operations become possible.

================
Comment at: source/Plugins/Process/minidump/MinidumpTypes.h:151
@@ +150,3 @@
+{
+    uint32_t signature;
+    uint32_t version;
----------------
As mentioned in my previous response, all these could be come LLVM endian aware types from `llvm/Support/Endian.h' which would allow you to `reinterpret_cast` straight from the underlying buffer.

================
Comment at: source/Plugins/Process/minidump/MinidumpTypes.h:165-166
@@ +164,4 @@
+};
+const int MINIDUMP_HEADER_SIZE = 3 * 4 + sizeof(RVA) + 2 * 4 + 8;
+static_assert(sizeof(MinidumpHeader) == MINIDUMP_HEADER_SIZE, "sizeof MinidumpHeader is not correct!");
+
----------------
The computation here seems unnecessary to me.  How about just `static_assert(sizeof(MinidumpHeader) == 32)`;

https://reviews.llvm.org/D23545