[PATCH] Add llvm-pdbdump to tools

Fri Jan 23 14:39:48 PST 2015

Some additional thoughts and comments on top of what Chandler posted.

> Index: tools/CMakeLists.txt
> ===================================================================
> --- tools/CMakeLists.txt
> +++ tools/CMakeLists.txt
> @@ -61,6 +61,10 @@
>
>  add_llvm_tool_subdirectory(llvm-go)
>
> +if(MSVC)
> +  add_llvm_tool_subdirectory(llvm-pdbdump)
> +endif()
> +
>  if(NOT CYGWIN AND LLVM_ENABLE_PIC)
>    add_llvm_tool_subdirectory(lto)
>    add_llvm_tool_subdirectory(llvm-lto)
> Index: tools/LLVMBuild.txt
> ===================================================================
> --- tools/LLVMBuild.txt
> +++ tools/LLVMBuild.txt
> @@ -16,7 +16,7 @@
>  ;===------------------------------------------------------------------------===;
>
>  [common]
> -subdirectories = bugpoint llc lli llvm-ar llvm-as llvm-bcanalyzer llvm-cov llvm-diff llvm-dis llvm-dwarfdump llvm-extract llvm-jitlistener llvm-link llvm-lto llvm-mc llvm-nm llvm-objdump llvm-profdata llvm-rtdyld llvm-size macho-dump opt llvm-mcmarkup verify-uselistorder dsymutil
> +subdirectories = bugpoint llc lli llvm-ar llvm-as llvm-bcanalyzer llvm-cov llvm-diff llvm-dis llvm-dwarfdump llvm-extract llvm-jitlistener llvm-link llvm-lto llvm-mc llvm-nm llvm-objdump llvm-pdbdump llvm-profdata llvm-rtdyld llvm-size macho-dump opt llvm-mcmarkup verify-uselistorder dsymutil
>
>  [component_0]
>  type = Group
> Index: tools/llvm-pdbdump/CMakeLists.txt
> ===================================================================
> --- /dev/null
> +++ tools/llvm-pdbdump/CMakeLists.txt
> @@ -0,0 +1,16 @@
> +set(LLVM_LINK_COMPONENTS
> +  Support
> +  )
> +
> +set(MSVC_DIA_SDK_DIR "$ENV{VSINSTALLDIR}DIA SDK")
> +include_directories(${MSVC_DIA_SDK_DIR}/include)
> +if (CMAKE_SIZEOF_VOID_P EQUAL 8)
> +  link_directories(${MSVC_DIA_SDK_DIR}/lib/amd64)
> +else()
> +  link_directories(${MSVC_DIA_SDK_DIR}/lib)
> +endif()
> +
> +add_llvm_tool(llvm-pdbdump
> +  llvm-pdbdump.cpp
> +  )
> +target_link_libraries(llvm-pdbdump diaguids)
> \ No newline at end of file
> Index: tools/llvm-pdbdump/COMExtras.h
> ===================================================================
> --- /dev/null
> +++ tools/llvm-pdbdump/COMExtras.h
> @@ -0,0 +1,264 @@
> +//===- COMExtras.h - Helper files for COM operations -------------*- C++-*-===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#ifndef LLVM_TOOLS_LLVM_PDBDUMP_COMEXTRAS_H
> +#define LLVM_TOOLS_LLVM_PDBDUMP_COMEXTRAS_H
> +
> +#include "llvm/ADT/ArrayRef.h"
> +#include "llvm/ADT/SmallVector.h"
> +
> +template <class F> struct function_traits;

This nit applies to the entire patch: prefer typename over class in
template declarations. Also, this should all be namespaced.

> +
> +template <class R, class... Args>
> +struct function_traits<R (*)(Args...)> : public function_traits<R(Args...)> {};

Variadic templates will be a problem until we drop MSVC 2012 support.
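
If dropping MSVC 2012 isn't on the table yet, one possible workaround
is to skip the general function_traits machinery and pattern-match
only the arity you actually need. A rough sketch, assuming Next()
always takes exactly three parameters. The names next_second_arg,
FakeChild and FakeEnum are made up for illustration, and I left out
the __stdcall qualifier from the patch so this compiles anywhere:

```cpp
#include <type_traits>

namespace sketch {

// Extracts the type of the second parameter of a three-parameter member
// function without relying on variadic templates.
template <typename F> struct next_second_arg;

template <typename C, typename R, typename A0, typename A1, typename A2>
struct next_second_arg<R (C::*)(A0, A1, A2)> {
  typedef A1 type;
};

// Hypothetical stand-ins for a COM enumerator and its child type.
struct FakeChild {};
struct FakeEnum {
  long Next(unsigned long Count, FakeChild **Out, unsigned long *Fetched);
};

// The second parameter of FakeEnum::Next is deduced as FakeChild **.
typedef next_second_arg<decltype(&FakeEnum::Next)>::type FakeNextArg;
static_assert(std::is_same<FakeNextArg, FakeChild **>::value,
              "second parameter of Next() should be FakeChild **");

} // namespace sketch
```

This loses generality (an enumerator whose Next() has a different
arity would need its own specialization), so treat it as a stopgap
rather than a replacement for the variadic version.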

> +
> +template <class R, class... Args> struct function_traits<R(Args...)> {
> +  using return_type = R;
> +
> +  using args_tuple = std::tuple<Args...>;

You should probably be including <tuple> instead of relying on it
being pulled in by other ADT headers.

> +};
> +
> +template <class C, class R, class... Args>
> +struct function_traits<R (__stdcall C::*)(Args...)>
> +    : public function_traits<R(Args...)> {};
> +
> +template <class FuncTraits, std::size_t arg> struct function_arg {
> +  // Writing function_arg as a separate class that accesses the tuple from
> +  // function_traits is necessary due to what appears to be a bug in MSVC.
> +  // If you write a nested class inside function_traits like this:
> +  // template<std::size_t ArgIndex>
> +  // struct Argument
> +  // {
> +  //   typedef typename
> +  //     std::tuple_element<ArgIndex, std::tuple<Args...>>::type type;
> +  // };
> +  // MSVC encounters a parsing error.

What parsing error does MSVC hit?

> +  typedef
> +      typename std::tuple_element<arg, typename FuncTraits::args_tuple>::type
> +          type;
> +};
> +
> +template <class T> struct remove_double_pointer {};
> +template <class T> struct remove_double_pointer<T **> { typedef T type; };
> +
> +//=============================================================================
> +// class ComIterator<>
> +//
> +// A common idiom in the COM world is to have an enumerator interface, say
> +// IMyEnumerator.  It's responsible for enumerating over some child data type,
> +// say IChildType.  You do the enumeration by calling IMyEnumerator::Next()
> +// to get a pointer to a pointer to the child type.  Eventually it fails,
> +// and that means you're at the end.
> +//
> +// ComIterator represents a single point-in-time of this iteration.  It is
> +// used by ComEnumerator to support iterating in this fashion via range-based
> +// for loops and other common C++ paradigms.
> +//=============================================================================
> +template <class EnumeratorType, std::size_t ArgIndex> class ComIterator {
> +private:

No need for the private access specifier.

> +  using FunctionTraits = function_traits<decltype(&EnumeratorType::Next)>;
> +  typedef typename function_arg<FunctionTraits, ArgIndex>::type FuncArgType;
> +  // FuncArgType is now something like ISomeCOMInterface **.  Remove both
> +  // pointers, so we can make a CComPtr<T> out of it.
> +  typedef typename remove_double_pointer<FuncArgType>::type EnumDataType;
> +
> +public:
> +  explicit ComIterator(CComPtr<EnumeratorType> Enumerator,
> +                       CComPtr<EnumDataType> Current)
> +      : EnumeratorObject(Enumerator), CurrentItem(Current) {}
> +  ComIterator() {}
> +
> +  ComIterator &operator++() {
> +    // EnumeratorObject->Next() expects CurrentItem to be NULL.
> +    CurrentItem.Release();
> +    ULONG Count = 0;
> +    HRESULT hr = EnumeratorObject->Next(1, &CurrentItem, &Count);
> +    if (FAILED(hr) || Count == 0)
> +      *this = ComIterator();
> +
> +    return *this;
> +  }
> +
> +  CComPtr<EnumDataType> operator*() { return CurrentItem; }
> +
> +  bool operator==(const ComIterator &other) const {
> +    return (EnumeratorObject == other.EnumeratorObject) &&
> +           (CurrentItem == other.CurrentItem);
> +  }
> +
> +  bool operator!=(const ComIterator &other) const { return !(*this == other); }
> +
> +  ComIterator &operator=(const ComIterator &other) {
> +    EnumeratorObject = other.EnumeratorObject;
> +    CurrentItem = other.CurrentItem;
> +    return *this;
> +  }

Since this is an iterator, it should expose other iterator
functionality (like operator++(int), and the iterator typedefs
value_type, difference_type, pointer, reference, and
iterator_category).
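
To make that concrete, here is roughly the shape I have in mind,
written against a made-up CountingSource instead of a real COM
enumerator so it stands on its own. Everything in the sketch
namespace is hypothetical, and input_iterator_tag is my guess at the
right category for a single-pass Next()-style source:

```cpp
#include <cstddef>
#include <iterator>

namespace sketch {

// Hypothetical stand-in for a COM enumerator: hands out 0, 1, ..., Limit-1.
struct CountingSource {
  int Limit;
  int Position;
  // Mimics Next(): returns false once the sequence is exhausted.
  bool Next(int *Out) {
    if (Position >= Limit)
      return false;
    *Out = Position++;
    return true;
  }
};

class SourceIterator {
  CountingSource *Source;
  int Current;

public:
  // The typedefs std::iterator_traits looks for.
  typedef std::input_iterator_tag iterator_category;
  typedef int value_type;
  typedef std::ptrdiff_t difference_type;
  typedef const int *pointer;
  typedef const int &reference;

  // A default-constructed iterator is the end iterator.
  SourceIterator() : Source(nullptr), Current(0) {}
  // Fetches the first element immediately.
  explicit SourceIterator(CountingSource *S) : Source(S), Current(0) {
    ++*this;
  }

  reference operator*() const { return Current; }
  pointer operator->() const { return &Current; }

  SourceIterator &operator++() {
    if (Source && !Source->Next(&Current)) {
      // Collapse to the end state so comparison with end() succeeds.
      Source = nullptr;
      Current = 0;
    }
    return *this;
  }

  // Post-increment returns a copy of the state before advancing.
  SourceIterator operator++(int) {
    SourceIterator Old(*this);
    ++*this;
    return Old;
  }

  bool operator==(const SourceIterator &Other) const {
    return Source == Other.Source && Current == Other.Current;
  }
  bool operator!=(const SourceIterator &Other) const {
    return !(*this == Other);
  }
};

} // namespace sketch
```

With the typedefs in place the iterator should work with the standard
algorithms that only need a single pass, not just range-based for.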

> +
> +private:
> +  CComPtr<EnumeratorType> EnumeratorObject;
> +  CComPtr<EnumDataType> CurrentItem;

Might as well stick these up with the other private items.

> +};
> +
> +//=============================================================================
> +// class ComEnumerator<>
> +//
> +// ComEnumerator is the top-level "range class" used to support iteration via
> +// range-based for loops.  It simply provides a begin() and end() method which
> +// return appropriately constructed ComIterator<> classes.
> +//=============================================================================
> +template <class EnumeratorType, std::size_t ArgIndex> class ComEnumerator {
> +private:

Private access specifier not required.

> +  typedef function_traits<decltype(&EnumeratorType::Next)> FunctionTraits;
> +  typedef typename function_arg<FunctionTraits, ArgIndex>::type FuncArgType;
> +  typedef typename remove_double_pointer<FuncArgType>::type EnumDataType;
> +
> +public:
> +  ComEnumerator(CComPtr<EnumeratorType> Enumerator)
> +      : EnumeratorObject(Enumerator) {}
> +
> +  ComIterator<EnumeratorType, ArgIndex> begin() {
> +    if (!EnumeratorObject)
> +      return end();
> +
> +    EnumeratorObject->Reset();
> +    ULONG Count = 0;
> +    CComPtr<EnumDataType> FirstItem;
> +    HRESULT hr = EnumeratorObject->Next(1, &FirstItem, &Count);
> +    return (FAILED(hr) || Count == 0)
> +        ? end()
> +        : ComIterator<EnumeratorType, ArgIndex>(EnumeratorObject, FirstItem);
> +  }
> +
> +  ComIterator<EnumeratorType, ArgIndex> end() {
> +    return ComIterator<EnumeratorType, ArgIndex>();
> +  }
> +
> +private:
> +  CComPtr<EnumeratorType> EnumeratorObject;

Might as well move this up as well.

> +};
> +
> +//=============================================================================
> +// class ComDataRecordIterator<>
> +//
> +// Similar to ComIterator<>, but uses a

I'm dying to know what this uses. ;-)

> +//=============================================================================
> +template <class EnumeratorType> class ComDataRecordIterator {
> +public:
> +  explicit ComDataRecordIterator(CComPtr<EnumeratorType> enumerator,
> +                                 uint32_t currentRecord)
> +      : Enumerator(enumerator), CurrentRecord(currentRecord) {}
> +  ComDataRecordIterator() {}
> +
> +  ComDataRecordIterator &operator++() {
> +    // Release the current item so that Enumerator->Next() is happy.
> +    ++CurrentRecord;
> +    ReadNextRecord();

This weirds me out a bit. Since ReadNextRecord can fail, but you are
incrementing CurrentRecord regardless, doesn't this mean you can get
to end() but not compare equal to something returned by end()?
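
One way to keep the two in sync, sketched against a made-up
FlakySource rather than the real DIA interfaces (every name below is
hypothetical): when a read fails, jump the index straight to the
count that end() uses, so the iterator compares equal to end() right
away.

```cpp
#include <cstdint>
#include <vector>

namespace sketch {

// Hypothetical record source: Count is what get_Count() would report, but
// reads start failing at index Readable.
struct FlakySource {
  std::uint32_t Count;
  std::uint32_t Readable;
  bool Read(std::uint32_t Index, std::vector<std::uint8_t> &Out) const {
    if (Index >= Readable)
      return false;
    Out.assign(1, static_cast<std::uint8_t>(Index));
    return true;
  }
};

class RecordIterator {
  const FlakySource *Source;
  std::uint32_t Index;
  std::vector<std::uint8_t> Data;

  void load() {
    Data.clear();
    // A failed read moves the iterator to the end position instead of
    // leaving it parked on an index that can't be read.
    if (Index < Source->Count && !Source->Read(Index, Data))
      Index = Source->Count;
  }

public:
  RecordIterator(const FlakySource *S, std::uint32_t I)
      : Source(S), Index(I) {
    load();
  }

  RecordIterator &operator++() {
    if (Index < Source->Count) {
      ++Index;
      load();
    }
    return *this;
  }

  const std::vector<std::uint8_t> &operator*() const { return Data; }

  bool operator==(const RecordIterator &Other) const {
    return Source == Other.Source && Index == Other.Index;
  }
  bool operator!=(const RecordIterator &Other) const {
    return !(*this == Other);
  }
};

} // namespace sketch
```

The point is only that the failure path and end() agree on a single
sentinel state; whether that state is "index == count" or a null
enumerator is a design choice.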

> +    return *this;
> +  }
> +
> +  llvm::ArrayRef<uint8_t> operator*() {
> +    if (CurrentRecord == 0)
> +      ReadNextRecord();
> +
> +    return llvm::ArrayRef<uint8_t>(RecordData.begin(), RecordData.end());
> +  }
> +
> +  bool operator==(const ComDataRecordIterator &other) const {
> +    return (Enumerator == other.Enumerator) &&
> +           (CurrentRecord == other.CurrentRecord);
> +  }
> +
> +  bool operator!=(const ComDataRecordIterator &other) const {
> +    return !(*this == other);
> +  }

Same comments here regarding the iterator interface as above.

> +
> +private:
> +  void ReadNextRecord() {
> +    RecordData.clear();
> +    ULONG Count = 0;
> +    DWORD RequiredBufferSize;
> +    HRESULT hr = Enumerator->Next(1, 0, &RequiredBufferSize, nullptr, &Count);
> +    if (hr == S_OK) {

Please use the SUCCEEDED macro.

> +      RecordData.resize(RequiredBufferSize);
> +      DWORD BytesRead = 0;
> +      hr = Enumerator->Next(1, RequiredBufferSize, &BytesRead,
> +                            RecordData.data(), &Count);
> +    }
> +    if (hr == S_FALSE) {

Please also check FAILED(hr) here. As written, the enumeration does
not end if Next() fails outright; it only ends on S_FALSE (which is
a success code, so FAILED alone won't catch it either).
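
One wrinkle worth keeping in mind when switching to the macros:
S_FALSE is a success code, so SUCCEEDED(S_FALSE) is true and
FAILED(S_FALSE) is false. A portable sketch of the semantics, using
stand-in names (HResult, SOk, SFalse, EFail, succeeded, failed,
enumerationDone are all mine, chosen so this builds without
<windows.h>); the numeric values are the documented ones for S_OK,
S_FALSE and E_FAIL:

```cpp
#include <cstdint>

namespace sketch {

typedef std::int32_t HResult; // stand-in for HRESULT

const HResult SOk = 0;        // value of S_OK
const HResult SFalse = 1;     // value of S_FALSE
const HResult EFail = static_cast<HResult>(0x80004005u); // value of E_FAIL

// Same tests the SUCCEEDED and FAILED macros perform: the sign bit of the
// HRESULT distinguishes failure from success.
inline bool succeeded(HResult HR) { return HR >= 0; }
inline bool failed(HResult HR) { return HR < 0; }

// An end-of-enumeration check that covers both a genuine failure and the
// "no more items" success code.
inline bool enumerationDone(HResult HR) {
  return failed(HR) || HR == SFalse;
}

} // namespace sketch
```

So the end-of-enumeration test probably wants to be "failed, or
S_FALSE", rather than either check on its own.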

> +      // This is the end of the enumeration.
> +      RecordData.clear();
> +    }
> +  }
> +
> +  CComPtr<EnumeratorType> Enumerator;
> +  uint32_t CurrentRecord;
> +  llvm::SmallVector<uint8_t, 128> RecordData;

Why 128? It seems rather large, and like something the caller might
have a much better idea about.

> +};
> +
> +//=============================================================================
> +// class ComEnumerator<>
> +//
> +// ComEnumerator is the top-level "range class" used to support iteration via
> +// range-based for loops.  It simply provides a begin() and end() method which
> +// return appropriately constructed ComIterator<> classes.

This comment is incorrect copypasta.

> +//=============================================================================
> +template <class EnumeratorType> class ComDataRecordEnumerator {
> +public:
> +  ComDataRecordEnumerator(CComPtr<EnumeratorType> enumerator)
> +      : Enumerator(enumerator) {}
> +
> +  ComDataRecordIterator<EnumeratorType> begin() {
> +    if (Enumerator)
> +      Enumerator->Reset();
> +    return ComDataRecordIterator<EnumeratorType>(Enumerator, 0);
> +  }
> +
> +  ComDataRecordIterator<EnumeratorType> end() {
> +    LONG NumElts = 0;
> +    HRESULT hr = Enumerator->get_Count(&NumElts);
> +    return (FAILED(hr))
> +               ? ComDataRecordIterator<EnumeratorType>(Enumerator, 0)
> +               : ComDataRecordIterator<EnumeratorType>(Enumerator, NumElts);
> +  }
> +
> +private:
> +  CComPtr<EnumeratorType> Enumerator;
> +};
> +
> +//=============================================================================
> +// function com_enumerator<>
> +//
> +// com_enumerator puts together all the magic C++ incantations to deduce all
> +// necessary types (enumerator, child type) automatically.  You need only write
> +//   for (auto item : com_enumerator(MyEnumerator))
> +//   {
> +//   }
> +//=============================================================================
> +template <class EnumeratorType>
> +ComEnumerator<EnumeratorType, 1>
> +com_enumerator(CComPtr<EnumeratorType> Enumerator) {
> +  return ComEnumerator<EnumeratorType, 1>(Enumerator);
> +}

make_com_enumerator to be more STL-like?

> +
> +//=============================================================================
> +// function com_data_record_enumerator<>
> +//
> +// com_data_record_enumerator returns a ComDataRecordEnumerator appropriately
> +// parameterized so that it can be used in a range-based for loop.  You need
> +// only write
> +//   for (auto record : com_data_record_enumerator(MyEnumerator))
> +//   {
> +//   }
> +//=============================================================================
> +template <class EnumeratorType>
> +ComDataRecordEnumerator<EnumeratorType>
> +com_data_record_enumerator(CComPtr<EnumeratorType> Enumerator) {
> +  return ComDataRecordEnumerator<EnumeratorType>(Enumerator);
> +}

make_com_data_record_enumerator is a mouthful, but yay consistency in
suggestions?

> +#endif
> Index: tools/llvm-pdbdump/LLVMBuild.txt

The rest of the patch seems like it could be a separate commit from
the COM functionality, but I'm not strongly tied to that idea.

> ===================================================================
> --- /dev/null
> +++ tools/llvm-pdbdump/LLVMBuild.txt
> @@ -0,0 +1,23 @@
> +;===- ./tools/llvm-pdbdump/LLVMBuild.txt -----------------------*- Conf -*--===;
> +;
> +;                     The LLVM Compiler Infrastructure
> +;
> +; This file is distributed under the University of Illinois Open Source
> +; License. See LICENSE.TXT for details.
> +;
> +;===------------------------------------------------------------------------===;
> +;
> +; This is an LLVMBuild description file for the components in this subdirectory.
> +;
> +; For more information on the LLVMBuild system, please see:
> +;
> +;   http://llvm.org/docs/LLVMBuild.html
> +;
> +;===------------------------------------------------------------------------===;
> +
> +[component_0]
> +type = Tool
> +name = llvm-pdbdump
> +parent = Tools
> +required_libraries =
> +
> Index: tools/llvm-pdbdump/llvm-pdbdump.cpp
> ===================================================================
> --- /dev/null
> +++ tools/llvm-pdbdump/llvm-pdbdump.cpp
> @@ -0,0 +1,152 @@
> +//===- llvm-pdbdump.cpp - Dump debug info from a PDB file -------*- C++ -*-===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +//
> +// Dumps debug information present in PDB files.  This utility makes use of
> +// the Microsoft Windows SDK, so will not compile or run on non-Windows
> +// platforms.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#define NTDDI_VERSION NTDDI_VISTA
> +#define _WIN32_WINNT _WIN32_WINNT_VISTA
> +#define WINVER _WIN32_WINNT_VISTA
> +#ifndef NOMINMAX
> +#define NOMINMAX
> +#endif
> +
> +#include <atlbase.h>
> +#include <windows.h>
> +#include <dia2.h>
> +
> +#include "llvm/ADT/ArrayRef.h"
> +#include "llvm/Support/CommandLine.h"
> +#include "llvm/Support/ConvertUTF.h"
> +#include "llvm/Support/Format.h"
> +#include "llvm/Support/ManagedStatic.h"
> +#include "llvm/Support/raw_ostream.h"
> +#include "llvm/Support/Process.h"
> +#include "llvm/Support/PrettyStackTrace.h"
> +#include "llvm/Support/Signals.h"
> +
> +#include "COMExtras.h"
> +
> +using namespace llvm;
> +
> +namespace llvm {
> +namespace sys {
> +namespace windows {
> +extern std::error_code UTF8ToUTF16(StringRef utf8,
> +                                   SmallVectorImpl<wchar_t> &utf16);
> +extern std::error_code UTF16ToUTF8(const wchar_t *utf16, size_t utf16_len,
> +                                   SmallVectorImpl<char> &utf8);

Ugh, this really needs to find a reasonable home. I'm starting to
suspect that now is the time.

> +}
> +}
> +}
> +
> +namespace opts {
> +cl::list<std::string> InputFilenames(cl::Positional,
> +                                     cl::desc("<input PDB files>"),
> +                                     cl::OneOrMore);
> +
> +// -streams, -s
> +cl::opt<bool> Streams("streams", cl::desc("Display data stream information"));
> +cl::alias StreamsShort("s", cl::desc("Alias for --streams"),
> +                       cl::aliasopt(Streams));
> +
> +// stream-data, -S
> +cl::opt<bool> StreamData("stream-data",
> +                         cl::desc("Dumps stream record data as bytes"));
> +cl::alias StreamDataShort("S", cl::desc("Alias for --stream-data"),
> +                          cl::aliasopt(StreamData));
> +}
> +
> +static void dumpDataStreams(IDiaSession *session) {
> +  CComPtr<IDiaEnumDebugStreams> DebugStreams = nullptr;
> +  if (session->getEnumDebugStreams(&DebugStreams) == S_OK) {

Should use the SUCCEEDED macro.

> +    LONG Count = 0;
> +    if (FAILED(DebugStreams->get_Count(&Count)))
> +      return;
> +    outs() << "Data Streams [count=" << Count << "]\n";
> +
> +    llvm::SmallString<32> Name8;
> +
> +    for (auto Stream : com_enumerator(DebugStreams)) {
> +      BSTR Name16;
> +      if (Stream->get_name(&Name16) != S_OK)

Should use the FAILED macro.

> +        continue;
> +      if (!llvm::sys::windows::UTF16ToUTF8(Name16, SysStringLen(Name16), Name8))
> +        outs() << "  " << Name8;
> +      ::SysFreeString(Name16);
> +      if (SUCCEEDED(Stream->get_Count(&Count))) {
> +        outs() << " [" << Count << " records]\n";
> +        if (opts::StreamData) {
> +          int RecordIndex = 0;
> +          for (auto StreamRecord : com_data_record_enumerator(Stream)) {
> +            outs() << "    Record " << RecordIndex << " ["
> +                   << StreamRecord.size() << " bytes]";
> +            for (size_t i = 0; i < StreamRecord.size(); ++i) {

Why not a range-based for loop here as well?
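
Something along these lines, sketched with std::vector standing in
for llvm::ArrayRef<uint8_t> (which also provides begin() and end())
and snprintf standing in for format_hex, so the exact output format
is only an approximation of what the patch prints. hexDump is a
made-up helper name:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

namespace sketch {

// Formats each byte of a record as " 0xNN" using a range-based for loop
// instead of an index.
inline std::string hexDump(const std::vector<std::uint8_t> &Record) {
  std::string Out;
  for (std::uint8_t Byte : Record) {
    char Buf[8];
    std::snprintf(Buf, sizeof(Buf), " 0x%02X", static_cast<unsigned>(Byte));
    Out += Buf;
  }
  return Out;
}

} // namespace sketch
```

The index isn't used for anything other than the subscript, so the
range-based form drops a variable without changing behavior.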

> +              outs() << " "
> +                     << llvm::format_hex(StreamRecord[i], 2, true, false);
> +            }
> +            outs() << "\n";
> +            ++RecordIndex;
> +          }
> +        }
> +      } else
> +        outs() << "\n";
> +    }
> +  }
> +  outs().flush();
> +}
> +
> +static void dumpInput(StringRef Path) {
> +  SmallVector<wchar_t, 128> path_utf16;
> +  std::error_code EC = llvm::sys::windows::UTF8ToUTF16(Path, path_utf16);
> +  CComPtr<IDiaDataSource> source;
> +  HRESULT hr =
> +      ::CoCreateInstance(CLSID_DiaSource, nullptr, CLSCTX_INPROC_SERVER,
> +                         __uuidof(IDiaDataSource), (void **)&source);
> +  if (FAILED(hr))
> +    return;
> +  if (FAILED(source->loadDataFromPdb(path_utf16.data())))
> +    return;
> +  CComPtr<IDiaSession> session;
> +  if (FAILED(source->openSession(&session)))
> +    return;
> +  if (opts::Streams || opts::StreamData) {
> +    dumpDataStreams(session);
> +  }
> +}
> +
> +int main(int argc_, const char *argv_[]) {
> +  // Print a stack trace if we signal out.
> +  sys::PrintStackTraceOnErrorSignal();
> +  PrettyStackTraceProgram X(argc_, argv_);
> +
> +  SmallVector<const char *, 256> argv;
> +  llvm::SpecificBumpPtrAllocator<char> ArgAllocator;
> +  std::error_code EC = llvm::sys::Process::GetArgumentVector(
> +      argv, llvm::makeArrayRef(argv_, argc_), ArgAllocator);
> +  if (EC) {
> +    llvm::errs() << "error: couldn't get arguments: " << EC.message() << '\n';
> +    return 1;
> +  }
> +
> +  llvm_shutdown_obj Y; // Call llvm_shutdown() on exit.
> +
> +  cl::ParseCommandLineOptions(argv.size(), argv.data(), "LLVM PDB Dumper\n");
> +
> +  CoInitializeEx(nullptr, COINIT_MULTITHREADED);
> +
> +  std::for_each(opts::InputFilenames.begin(), opts::InputFilenames.end(),
> +                dumpInput);
> +
> +  CoUninitialize();
> +  return 0;
> +}
>

~Aaron

On Fri, Jan 23, 2015 at 4:48 PM, Zachary Turner <zturner at google.com> wrote:
> ================
> Comment at: tools/CMakeLists.txt:64
> @@ -63,1 +63,3 @@
>
> +if(MSVC)
> +  add_llvm_tool_subdirectory(llvm-pdbdump)
> ----------------
> chandlerc wrote:
>> Shouldn't this be checking if the host OS is Windows rather than the compiler?
> COM uses a bunch of Microsoft extensions and stuff.  I'm pretty sure this won't compile with MinGW, for example.  Is there a better check for "any MSVC compatible compiler, including clang"?
>
> ================
> Comment at: tools/llvm-pdbdump/COMExtras.h:18-19
> @@ +17,4 @@
> +
> +template <class R, class... Args>
> +struct function_traits<R (*)(Args...)> : public function_traits<R(Args...)> {};
> +
> ----------------
> chandlerc wrote:
>> MSVC 2012 doesn't support variadic templates I thought?
> Umm..  Crap?  Can we drop support for VS 2012 yet?  lol.  I'm not really sure how to implement this without it.
>
> ================
> Comment at: tools/llvm-pdbdump/llvm-pdbdump.cpp:44-47
> @@ +43,6 @@
> +namespace windows {
> +extern std::error_code UTF8ToUTF16(StringRef utf8,
> +                                   SmallVectorImpl<wchar_t> &utf16);
> +extern std::error_code UTF16ToUTF8(const wchar_t *utf16, size_t utf16_len,
> +                                   SmallVectorImpl<char> &utf8);
> +}
> ----------------
> chandlerc wrote:
>> If these aren't provided by Support, they should be. If they are, just include the header rather than declaring them yourself.
> They're part of that stuff that I used to frequently complain about which is in Support, but not exposed through a header file since it's Windows-specific.
>
> ================
> Comment at: tools/llvm-pdbdump/llvm-pdbdump.cpp:127
> @@ +126,3 @@
> +
> +int main(int argc_, const char *argv_[]) {
> +  // Print a stack trace if we signal out.
> ----------------
> chandlerc wrote:
>> Why the _s?
> Because I use llvm::sys::Process::GetArgumentVector() so there ends up being a second argv.  I used _ to make it clear that they're not going to be used, and that the argv vector should be used instead.
>
> http://reviews.llvm.org/D7153