[llvm-commits] [PATCH] Teach llvm-objdump to dump Win64 exception tables

Kai kai at redstar.de
Tue Nov 27 12:32:31 PST 2012


Hi Sean!

I reworked the function names and the other points you mentioned.
However, I did not introduce an "UnwindInfoIterator". I expect that future
patches will extend the capabilities of the tool in other places.
In fact, I am already thinking about how to output the language-specific
EH data. That would be a future patch, in which I would also clean up the
repeated reinterpret_cast<> expressions.
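
To make the reinterpret_cast<> cleanup concrete, here is a rough sketch of the
templated accessor Sean suggested. It is not part of these patches.
UnwindInfoSketch is a hypothetical, simplified stand-in for
Win64EH::UnwindInfo that uses plain integer types instead of the Endian.h
ones; the flag check and the memcpy-based read are assumptions made for the
sketch, not something the patches do.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified stand-in for Win64EH::UnwindInfo. Plain integer
// types replace the Endian.h types; one unwind code slot is 16 bits wide.
struct UnwindInfoSketch {
  uint8_t VersionAndFlags;
  uint8_t PrologSize;
  uint8_t NumCodes;
  uint8_t FrameRegisterAndOffset;
  uint16_t UnwindCodes[1]; // really variable-length

  // Typed pointer to the data following the unwind codes. The slot count is
  // rounded up to an even number, exactly as in the existing accessor.
  template <typename T> const T *getLanguageSpecificData() const {
    return reinterpret_cast<const T *>(&UnwindCodes[(NumCodes + 1) & ~1]);
  }

  // Image-relative offset of the handler; only present if a handler flag
  // (UNW_ExceptionHandler = 1 or UNW_TerminateHandler = 2) is set.
  uint32_t getLanguageSpecificHandlerOffset() const {
    assert(((VersionAndFlags >> 3) & 0x03) != 0 && "no handler present");
    uint32_t Offset;
    // memcpy sidesteps alignment and aliasing concerns in this sketch.
    std::memcpy(&Offset, getLanguageSpecificData<unsigned char>(),
                sizeof(Offset));
    return Offset;
  }
};
```

With an accessor of this shape, callers name the type they want once, and the
single remaining reinterpret_cast<> lives inside the template.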

You are right that the changes to Win64EH.h have different root causes. 
In fact they were done by two different people! I noted this in the 
commit messages.
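
For readers following along, the two changes can be pictured with the small
sketch below. The struct names are hypothetical stand-ins that use plain
uint8_t/uint32_t rather than the Endian.h types, and the constexpr qualifiers
are an addition for illustration; the field widths, masks and shifts are the
ones from the patches.

```cpp
#include <cstdint>

// Change 1: the three fields are image-relative 32-bit offsets, so one
// function table entry is 12 bytes (it would be 24 with uint64_t).
struct RuntimeFunctionSketch {
  uint32_t StartAddress;
  uint32_t EndAddress;
  uint32_t UnwindInfoOffset;
};
static_assert(sizeof(RuntimeFunctionSketch) == 12, "three 32-bit fields");

// Change 2: bit fields become whole bytes plus mask-and-shift accessors,
// so the layout no longer depends on how a compiler packs bit fields.
struct UnwindInfoHeaderSketch {
  uint8_t VersionAndFlags;        // bits 0-2: version, bits 3-7: flags
  uint8_t PrologSize;
  uint8_t NumCodes;
  uint8_t FrameRegisterAndOffset; // bits 0-3: register, bits 4-7: offset

  constexpr uint8_t getVersion() const { return VersionAndFlags & 0x07; }
  constexpr uint8_t getFlags() const { return (VersionAndFlags >> 3) & 0x1f; }
  constexpr uint8_t getFrameRegister() const {
    return FrameRegisterAndOffset & 0x0f;
  }
  constexpr uint8_t getFrameOffset() const {
    return (FrameRegisterAndOffset >> 4) & 0x0f;
  }
};
```

A header whose first byte is 0x09 and whose fourth byte is 0x03 decodes to
version 1, flags 1 (UNW_ExceptionHandler), frame register 3 (RBX) and frame
offset 0, which matches the first entry expected by the test case in patch 3.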

Thanks for all your very helpful comments!!!

Regards
Kai

On 25.11.2012 22:45, Sean Silva wrote:
> Just a few tiny nits and the first patch looks ready to be committed
> (although I'd like Michael to check off on it).
>
>> The type casts are necessary, otherwise the output is garbage. I think the
>> ulittle8_t value is interpreted as a char, not a number.
>
> You might want to note that in a comment.
>
> +  case UOP_SaveNonVolBig:
> +  case UOP_SaveXMM128Big:
> +    UsedSlots = 3;
> +    break;
> +  case UOP_AllocLarge:
> +    UsedSlots =  (UnwindCode.getOpInfo() == 0) ? 2 : 3;
> +    break;
> +  }
> +  return UsedSlots;
> +}
>
> You can get rid of UsedSlots here and just return the size directly.
>
> +static void printUnwindCode(ArrayRef<UnwindCode> UCs) {
>
> +static void printCOFFUnwindCode(ArrayRef<UnwindCode> UCs) {
>
> I think these names need to be adjusted. The calling sequence
> printCOFFUnwindCode -> printUnwindCode doesn't make sense, since it
> seems like a more specific function is calling a more generic
> function. I'm not super familiar with the terminology here, but
> shouldn't these be plural, since they are printing multiple unwind
> codes `ArrayRef<UnwindCode> UCs`? The naming should also reflect that
> "printCOFFUnwindCode" prints multiple of whatever "printUnwindCode"
> prints. From the comment, it seems like "printUnwindCode" prints
> "unwind code entry"'s, so how about renaming them to
> printCOFFUnwindCodeEntries and printCOFFUnwindCodeEntry?
>
> Also, depending on how useful this would be in other places, (in a
> future patch set) you could write an "unwind code entry iterator"
> which is a forward iterator whose value_type is a class
> UnwindCodeEntry. I'm not sure how useful that would be, but that seems
> like a really clean way to work with these if you need to do something
> more complicated with them besides just printing them.
>
> +// Prints one unwind code entry. Because an entry can occupy up to 3 slots in
> +// the codes array, this function requires that the correct number slots is
> +// provided.
>
> Even though the check has already been done in the caller, it probably
> wouldn't hurt to `assert(UCs.size() >= getNumUsedSlots(UCs.front()))`
> at the start of this function for a bit of bulletproofing.
>
> +  uint8_t getVersion() const {
> +  return VersionAndFlags & 0x07;
> +  }
> +  uint8_t getFlags() const {
> +  return (VersionAndFlags >> 3) & 0x1f;
> +  }
> +  uint8_t getFrameRegister() const {
> +  return FrameRegisterAndOffset & 0x0f;
> +  }
> +  uint8_t getFrameOffset() const {
> +  return (FrameRegisterAndOffset >> 4) & 0x0f;
> +  }
>
> Indent the bodies here.
>
> Regarding 02-win64eh.diff, it looks like there are a number of
> different changes smashed into a single patch. Your original patch
> only contains changes from uint64_t to uint32_t. What was the problem
> that was fixed by that? I think it should be a separate patch so it is
> clear what is being changed and why. The change from bitfields to
> functions should be broken off into its own patch too. Breaking
> changes up like this is fundamental to LLVM's incremental development
> style; the goal of each patch is to communicate a single focused,
> committable change to the reviewer.
>
>     void *getLanguageSpecificData() {
> -    return reinterpret_cast<void *>(&unwindCodes[(numCodes+1) & ~1]);
> +    return reinterpret_cast<void *>(&UnwindCodes[(NumCodes+1) & ~1]);
>     }
>
> You could clean up this function to make this be templated on the
> return type and have it directly return a pointer to the desired type,
> which should greatly reduce the number of reinterpret_cast<>'s in the
> surrounding code. You can do this in a separate patch (now or in the
> future).
>
> So in summary, (as far as I understand what is trying to be
> accomplished in this patch set) I think that you should break these
> changes into three patches:
>
> 1. fix the uint64_t to uint32_t size issues (include a good commit
> message explaining why this is needed)
> 2. cleanup the bitfields to functions
> 3. The changes in 01-llvm-objdump.diff (with the small fixes as requested above)
> [4. (optional) make getLanguageSpecificData be templated on the return
> type and clean up the reinterpret casts (you can put this anywhere in
> the patch set)]
>
> Also, I see that you are using git. I recommend that you use
> git-format-patch to make the patches, since that will preserve your
> commit message (otherwise the committer might make one up that you
> don't like). Don't be afraid to write nice long commit messages
> explaining the change and why it is needed ;)
>
> -- Sean Silva
>
> On Sun, Nov 25, 2012 at 11:01 AM, Kai <kai at redstar.de> wrote:
>> Hi Sean!
>>
>> The hint to use ArrayRef's turned out to be really good. I have the feeling
>> that a lot of the complexity is gone.
>>
>> I also implemented your other suggestions.
>>
>> The type casts are necessary, otherwise the output is garbage. I think the
>> ulittle8_t value is interpreted as a char, not a number. The casts make sure
>> that the value is formatted as a number. However I changed the old-style
>> cast to a static_cast<>.
>>
>> Regards
>> Kai
>>
>>
>> On 24.11.2012 21:47, Sean Silva wrote:
>>>
>>> This is a lot better!
>>>
>>> +unsigned printUnwindCode(const UnwindCode *UnwindCodes, unsigned
>>> SlotsAvail) {
>>>
>>> Use ArrayRef's instead of pointer-and-length. Here and in any callers.
>>>
>>> +void printCOFFUnwindCode(const UnwindCode *UnwindCodes, unsigned
>>> NumCodes) {
>>> +  for (unsigned i = 0; i < NumCodes; ) {
>>> +    i += printUnwindCode(&UnwindCodes[i], NumCodes - i);
>>> +  }
>>> +}
>>>
>>> The `NumCodes-i` here is not immediately obvious; better to use an
>>> ArrayRef. How about
>>>
>>> void printCOFFUnwindCode(ArrayRef<UnwindCode> UCs) {
>>>     for (const UnwindCode *I = UCs.begin(), *E = UCs.end(); I < E;)
>>>      I += printUnwindCode(ArrayRef<UnwindCode>(I, E));
>>> }
>>>
>>> ?
>>>
>>> +  switch (UnwindCodes[0].getUnwindOp()) {
>>> +  default: llvm_unreachable("Invalid unwind code");
>>> +  case UOP_PushNonVol:
>>> +  case UOP_AllocSmall:
>>> +  case UOP_SetFPReg:
>>> +  case UOP_PushMachFrame:
>>> +    UsedSlots = 1;
>>> +    break;
>>> +  case UOP_SaveNonVol:
>>> +  case UOP_SaveXMM128:
>>> +    UsedSlots = 2;
>>> +    break;
>>> +  case UOP_SaveNonVolBig:
>>> +  case UOP_SaveXMM128Big:
>>> +    UsedSlots = 3;
>>> +    break;
>>> +  case UOP_AllocLarge:
>>> +    UsedSlots =  (UnwindCodes[0].getOpInfo() == 0) ? 2 : 3;
>>> +    break;
>>> +  }
>>>
>>> How about breaking this out into a static function getNumUsedSlots()?
>>> Same for the other switch in this function. You can probably then
>>> inline what remains of this function (which should be very little)
>>> into the loop in printCOFFUnwindCode().
>>>
>>>
>>> +namespace {
>>>
>>> For functions, use static. See
>>> <http://llvm.org/docs/CodingStandards.html#anonymous-namespaces>.
>>>
>>> +  if (UsedSlots > SlotsAvail) {
>>> +    outs() << "Unwind data corrupted\n";
>>> +    return SlotsAvail;
>>> +  }
>>>
>>> Can you diagnose the situation better? Maybe say something like
>>> "encountered unwind op <...> of size <N> slots, but only <X> remaining
>>> in buffer".
>>>
>>> +        outs() << "  Size of prolog: " << (int) UI->PrologSize << "\n";
>>> +        outs() << "  Number of Codes: " << (int) UI->NumCodes << "\n";
>>>
>>> Why the casts here?
>>>
>>> +    if (Name == ".pdata") {
>>> +      Pdata = Obj->getCOFFSection(SI);
>>>
>>> I feel like you could use an early exit/continue to reduce the nesting
>>> here.
>>>
>>> +      unsigned i = 0;
>>> +      while ((Contents.size() - i) >= sizeof(RuntimeFunction)) {
>>> +        const RuntimeFunction *RF =
>>> +            reinterpret_cast<const RuntimeFunction *>(Contents.data() +
>>> i);
>>>
>>> This loop is wacky. Can you rewrite it as a regular for loop over
>>> RuntimeFunction*'s? It's really hard to tell what is actually being
>>> iterated over.
>>>
>>> +  resolveSymbolName(Rels, Offset, Sym);
>>>
>>> not checking the error code?
>>>
>>> -- Sean Silva
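
On the cast question discussed above, the effect is easy to reproduce outside
LLVM. The sketch below uses std::ostringstream as a stand-in for raw_ostream
(an assumption; raw_ostream presumably behaves the same way because it also
has a character overload for unsigned char). The helper names are made up for
illustration, and the code relies on uint8_t being unsigned char, as it is on
mainstream platforms.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Streams the byte as-is. uint8_t is a character type here, so the stream
// selects the character overload and writes the character with that code.
std::string formatRaw(uint8_t Value) {
  std::ostringstream OS;
  OS << Value;
  return OS.str();
}

// Casts first, so the stream selects the integer overload and writes digits.
std::string formatAsNumber(uint8_t Value) {
  std::ostringstream OS;
  OS << static_cast<int>(Value);
  return OS.str();
}
```

For a byte holding 65, formatRaw() yields the single character "A", while
formatAsNumber() yields the text "65". Small values such as a prolog size of
18 map to control characters, which would explain the garbage output.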

-------------- next part --------------
From 5a30baabcb910806ff0d187eae59a6b21bfd0507 Mon Sep 17 00:00:00 2001
From: kai <kai at redstar.de>
Date: Mon, 26 Nov 2012 16:40:05 +0100
Subject: [PATCH 1/3] Change member types of RuntimeFunction and UnwindInfo
 from uint64_t to uint32_t.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

These members represent addresses. According to MSDN, they are image relative, that is, they are 32-bit offsets from the starting address of the image that contains the function table entry.
See MSDN for more information:
RUNTIME_FUNCTION: http://msdn.microsoft.com/en-us/library/ft9x1kdx.aspx
UNWIND_INFO: http://msdn.microsoft.com/en-us/library/ddssxxy8.aspx

Patch by João Matos.
---
 include/llvm/Support/Win64EH.h | 43 +++++++++++++++++++++++++++++++-----------
 1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/include/llvm/Support/Win64EH.h b/include/llvm/Support/Win64EH.h
index 8d74e10..a2c993b 100644
--- a/include/llvm/Support/Win64EH.h
+++ b/include/llvm/Support/Win64EH.h
@@ -60,9 +60,9 @@ enum {
 
 /// RuntimeFunction - An entry in the table of functions with unwind info.
 struct RuntimeFunction {
-  uint64_t startAddress;
-  uint64_t endAddress;
-  uint64_t unwindInfoOffset;
+  uint32_t startAddress;
+  uint32_t endAddress;
+  uint32_t unwindInfoOffset;
 };
 
 /// UnwindInfo - An entry in the exception table.
@@ -75,22 +75,43 @@ struct UnwindInfo {
           frameOffset:4;
   UnwindCode unwindCodes[1];
 
+  /* The data after unwindCodes depends on flags.
+   * If UNW_ExceptionHandler or UNW_TerminateHandler is set then follows
+   * the address of the language-specific exception handler.
+   * If UNW_ChainInfo is set then follows a RuntimeFunction which defines
+   * the chained unwind info.
+   * For more information please see MSDN at:
+   * http://msdn.microsoft.com/en-us/library/ddssxxy8.aspx
+   */
+
+  /// getLanguageSpecificData - Return pointer to language specific data part
+  /// of UnwindInfo.
   void *getLanguageSpecificData() {
     return reinterpret_cast<void *>(&unwindCodes[(numCodes+1) & ~1]);
   }
-  uint64_t getLanguageSpecificHandlerOffset() {
-    return *reinterpret_cast<uint64_t *>(getLanguageSpecificData());
-  }
-  void setLanguageSpecificHandlerOffset(uint64_t offset) {
-    *reinterpret_cast<uint64_t *>(getLanguageSpecificData()) = offset;
+
+  /// getLanguageSpecificHandlerOffset - Return image-relative offset of
+  /// language-specific exception handler
+  uint32_t getLanguageSpecificHandlerOffset() {
+    return *reinterpret_cast<uint32_t *>(getLanguageSpecificData());
   }
-  RuntimeFunction *getChainedFunctionEntry() {
-    return reinterpret_cast<RuntimeFunction *>(getLanguageSpecificData());
+
+  /// setLanguageSpecificHandlerOffset - Set image-relative offset of
+  /// language-specific exception handler
+  void setLanguageSpecificHandlerOffset(uint32_t offset) {
+    *reinterpret_cast<uint32_t *>(getLanguageSpecificData()) = offset;
   }
+
+  /// getExceptionData - Return pointer to exception-specific data.
   void *getExceptionData() {
-    return reinterpret_cast<void *>(reinterpret_cast<uint64_t *>(
+    return reinterpret_cast<void *>(reinterpret_cast<uint32_t *>(
                                                   getLanguageSpecificData())+1);
   }
+
+  /// getChainedFunctionEntry - Return pointer to chained unwind info.
+  RuntimeFunction *getChainedFunctionEntry() {
+    return reinterpret_cast<RuntimeFunction *>(getLanguageSpecificData());
+  }
 };
 
 
-- 
1.8.0.msysgit.0

-------------- next part --------------
From 720546c88ee1c4fc3c259118e9a6e5bbe692da21 Mon Sep 17 00:00:00 2001
From: kai <kai at redstar.de>
Date: Tue, 27 Nov 2012 06:44:30 +0100
Subject: [PATCH 2/3] Make Win64EH.h platform-neutral.

The standard types uint8_t, uint16_t and uint32_t are replaced with their counterparts from Endian.h. Accessor functions are introduced to replace bit fields.

Patch by Kai Nacke.
---
 include/llvm/Support/Win64EH.h | 48 +++++++++++++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 15 deletions(-)

diff --git a/include/llvm/Support/Win64EH.h b/include/llvm/Support/Win64EH.h
index a2c993b..a50ac3005 100644
--- a/include/llvm/Support/Win64EH.h
+++ b/include/llvm/Support/Win64EH.h
@@ -17,6 +17,7 @@
 #define LLVM_SUPPORT_WIN64EH_H
 
 #include "llvm/Support/DataTypes.h"
+#include "llvm/Support/Endian.h"
 
 namespace llvm {
 namespace Win64EH {
@@ -39,11 +40,17 @@ enum UnwindOpcodes {
 /// or part thereof.
 union UnwindCode {
   struct {
-    uint8_t codeOffset;
-    uint8_t unwindOp:4,
-            opInfo:4;
+    support::ulittle8_t CodeOffset;
+    support::ulittle8_t UnwindOpAndOpInfo;
   } u;
-  uint16_t frameOffset;
+  support::ulittle16_t FrameOffset;
+
+  uint8_t getUnwindOp() const {
+    return u.UnwindOpAndOpInfo & 0x0F;
+  }
+  uint8_t getOpInfo() const {
+    return (u.UnwindOpAndOpInfo >> 4) & 0x0F;
+  }
 };
 
 enum {
@@ -60,20 +67,31 @@ enum {
 
 /// RuntimeFunction - An entry in the table of functions with unwind info.
 struct RuntimeFunction {
-  uint32_t startAddress;
-  uint32_t endAddress;
-  uint32_t unwindInfoOffset;
+  support::ulittle32_t StartAddress;
+  support::ulittle32_t EndAddress;
+  support::ulittle32_t UnwindInfoOffset;
 };
 
 /// UnwindInfo - An entry in the exception table.
 struct UnwindInfo {
-  uint8_t version:3,
-          flags:5;
-  uint8_t prologSize;
-  uint8_t numCodes;
-  uint8_t frameRegister:4,
-          frameOffset:4;
-  UnwindCode unwindCodes[1];
+  support::ulittle8_t VersionAndFlags;
+  support::ulittle8_t PrologSize;
+  support::ulittle8_t NumCodes;
+  support::ulittle8_t FrameRegisterAndOffset;
+  UnwindCode UnwindCodes[1];
+
+  uint8_t getVersion() const {
+    return VersionAndFlags & 0x07;
+  }
+  uint8_t getFlags() const {
+    return (VersionAndFlags >> 3) & 0x1f;
+  }
+  uint8_t getFrameRegister() const {
+    return FrameRegisterAndOffset & 0x0f;
+  }
+  uint8_t getFrameOffset() const {
+    return (FrameRegisterAndOffset >> 4) & 0x0f;
+  }
 
   /* The data after unwindCodes depends on flags.
    * If UNW_ExceptionHandler or UNW_TerminateHandler is set then follows
@@ -87,7 +105,7 @@ struct UnwindInfo {
   /// getLanguageSpecificData - Return pointer to language specific data part
   /// of UnwindInfo.
   void *getLanguageSpecificData() {
-    return reinterpret_cast<void *>(&unwindCodes[(numCodes+1) & ~1]);
+    return reinterpret_cast<void *>(&UnwindCodes[(NumCodes+1) & ~1]);
   }
 
   /// getLanguageSpecificHandlerOffset - Return image-relative offset of
-- 
1.8.0.msysgit.0

-------------- next part --------------
From 9aa629d906b45db3c403b278ca1bdcb28bab26a3 Mon Sep 17 00:00:00 2001
From: kai <kai at redstar.de>
Date: Tue, 27 Nov 2012 20:13:04 +0100
Subject: [PATCH 3/3] Add dump of Win64 EH unwind data.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The new command line option -unwind-info dumps the Win64 EH unwind data to the console. This is a nice feature if you need to debug generated EH data (e.g. from LLVM).
Includes a test case.

Initial patch by João Matos, extensions and rework by Kai Nacke.
---
 test/tools/llvm-objdump/win64-unwind-data.s | 106 +++++++++
 tools/llvm-objdump/CMakeLists.txt           |   1 +
 tools/llvm-objdump/COFFDump.cpp             | 355 ++++++++++++++++++++++++++++
 tools/llvm-objdump/llvm-objdump.cpp         |  29 ++-
 tools/llvm-objdump/llvm-objdump.h           |   9 +
 5 files changed, 497 insertions(+), 3 deletions(-)
 create mode 100644 test/tools/llvm-objdump/win64-unwind-data.s
 create mode 100644 tools/llvm-objdump/COFFDump.cpp

diff --git a/test/tools/llvm-objdump/win64-unwind-data.s b/test/tools/llvm-objdump/win64-unwind-data.s
new file mode 100644
index 0000000..1e4c742
--- /dev/null
+++ b/test/tools/llvm-objdump/win64-unwind-data.s
@@ -0,0 +1,106 @@
+// This test checks that the unwind data is dumped by llvm-objdump.
+// RUN: llvm-mc -triple x86_64-pc-win32 -filetype=obj %s | llvm-objdump -u - | FileCheck %s
+
+// CHECK:      Unwind info:
+// CHECK:      Function Table:
+// CHECK-NEXT: Start Address: .text
+// CHECK-NEXT: End Address: .text + 0x001b
+// CHECK-NEXT: Unwind Info Address: .xdata
+// CHECK-NEXT: Version: 1
+// CHECK-NEXT: Flags: 1 UNW_ExceptionHandler
+// CHECK-NEXT: Size of prolog: 18
+// CHECK-NEXT: Number of Codes: 8
+// CHECK-NEXT: Frame register: RBX
+// CHECK-NEXT: Frame offset: 0
+// CHECK-NEXT: Unwind Codes:
+// CHECK-NEXT: 0x00: UOP_SetFPReg
+// CHECK-NEXT: 0x0f: UOP_PushNonVol RBX
+// CHECK-NEXT: 0x0e: UOP_SaveXMM128 XMM8 [0x0000]
+// CHECK-NEXT: 0x09: UOP_SaveNonVol RSI [0x0010]
+// CHECK-NEXT: 0x04: UOP_AllocSmall 24
+// CHECK-NEXT: 0x00: UOP_PushMachFrame w/o error code
+// CHECK:      Function Table:
+// CHECK-NEXT: Start Address: .text + 0x0012
+// CHECK-NEXT: End Address: .text + 0x0012
+// CHECK-NEXT: Unwind Info Address: .xdata + 0x001c
+// CHECK-NEXT: Version: 1
+// CHECK-NEXT: Flags: 4 UNW_ChainInfo
+// CHECK-NEXT: Size of prolog: 0
+// CHECK-NEXT: Number of Codes: 0
+// CHECK-NEXT: No frame pointer used
+// CHECK:      Function Table:
+// CHECK-NEXT: Start Address: .text + 0x001b
+// CHECK-NEXT: End Address: .text + 0x001c
+// CHECK-NEXT: Unwind Info Address: .xdata + 0x002c
+// CHECK-NEXT: Version: 1
+// CHECK-NEXT: Flags: 0
+// CHECK-NEXT: Size of prolog: 0
+// CHECK-NEXT: Number of Codes: 0
+// CHECK-NEXT: No frame pointer used
+// CHECK:      Function Table:
+// CHECK-NEXT: Start Address: .text + 0x001c
+// CHECK-NEXT: End Address: .text + 0x0039
+// CHECK-NEXT: Unwind Info Address: .xdata + 0x0034
+// CHECK-NEXT: Version: 1
+// CHECK-NEXT: Flags: 0
+// CHECK-NEXT: Size of prolog: 14
+// CHECK-NEXT: Number of Codes: 6
+// CHECK-NEXT: No frame pointer used
+// CHECK-NEXT: Unwind Codes:
+// CHECK-NEXT: 0x0e: UOP_AllocLarge 8454128
+// CHECK-NEXT: 0x07: UOP_AllocLarge 8190
+// CHECK-NEXT: 0x00: UOP_PushMachFrame w/o error code
+
+    .text
+    .globl func
+    .def func; .scl 2; .type 32; .endef
+    .seh_proc func
+func:
+    .seh_pushframe @code
+    subq $24, %rsp
+    .seh_stackalloc 24
+    movq %rsi, 16(%rsp)
+    .seh_savereg %rsi, 16
+    movups %xmm8, (%rsp)
+    .seh_savexmm %xmm8, 0
+    pushq %rbx
+    .seh_pushreg 3
+    mov %rsp, %rbx
+    .seh_setframe 3, 0
+    .seh_endprologue
+    .seh_handler __C_specific_handler, @except
+    .seh_handlerdata
+    .long 0
+    .text
+    .seh_startchained
+    .seh_endprologue
+    .seh_endchained
+    lea (%rbx), %rsp
+    pop %rbx
+    addq $24, %rsp
+    ret
+    .seh_endproc
+
+// Test emission of small functions.
+    .globl smallFunc
+    .def smallFunc; .scl 2; .type 32; .endef
+    .seh_proc smallFunc
+smallFunc:
+    ret
+    .seh_endproc
+
+// Function with big stack allocation.
+    .globl smallFunc
+    .def allocFunc; .scl 2; .type 32; .endef
+    .seh_proc smallFunc
+allocFunc:
+    .seh_pushframe @code
+    subq $65520, %rsp
+    .seh_stackalloc 65520
+    sub $8454128, %rsp
+    .seh_stackalloc 8454128
+    .seh_endprologue
+    add $8454128, %rsp
+    addq $65520, %rsp
+    ret
+    .seh_endproc
diff --git a/tools/llvm-objdump/CMakeLists.txt b/tools/llvm-objdump/CMakeLists.txt
index f3b2e1f..5001435 100644
--- a/tools/llvm-objdump/CMakeLists.txt
+++ b/tools/llvm-objdump/CMakeLists.txt
@@ -9,6 +9,7 @@ set(LLVM_LINK_COMPONENTS
 
 add_llvm_tool(llvm-objdump
   llvm-objdump.cpp
+  COFFDump.cpp
   MachODump.cpp
   MCFunction.cpp
   )
diff --git a/tools/llvm-objdump/COFFDump.cpp b/tools/llvm-objdump/COFFDump.cpp
new file mode 100644
index 0000000..1218afa
--- /dev/null
+++ b/tools/llvm-objdump/COFFDump.cpp
@@ -0,0 +1,355 @@
+//===-- COFFDump.cpp - COFF-specific dumper ---------------------*- C++ -*-===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// \brief This file implements the COFF-specific dumper for llvm-objdump.
+/// It outputs the Win64 EH data structures as plain text.
+/// The encoding of the unwind codes is described in MSDN:
+/// http://msdn.microsoft.com/en-us/library/ck9asaa9.aspx
+///
+//===----------------------------------------------------------------------===//
+
+#include "llvm-objdump.h"
+#include "llvm/Object/COFF.h"
+#include "llvm/Object/ObjectFile.h"
+#include "llvm/Support/Format.h"
+#include "llvm/Support/SourceMgr.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Support/system_error.h"
+#include "llvm/Support/Win64EH.h"
+#include <algorithm>
+#include <cstring>
+
+using namespace llvm;
+using namespace object;
+using namespace llvm::Win64EH;
+
+// Returns the name of the unwind code.
+static StringRef getUnwindCodeTypeName(uint8_t Code) {
+  switch(Code) {
+  default: llvm_unreachable("Invalid unwind code");
+  case UOP_PushNonVol: return "UOP_PushNonVol";
+  case UOP_AllocLarge: return "UOP_AllocLarge";
+  case UOP_AllocSmall: return "UOP_AllocSmall";
+  case UOP_SetFPReg: return "UOP_SetFPReg";
+  case UOP_SaveNonVol: return "UOP_SaveNonVol";
+  case UOP_SaveNonVolBig: return "UOP_SaveNonVolBig";
+  case UOP_SaveXMM128: return "UOP_SaveXMM128";
+  case UOP_SaveXMM128Big: return "UOP_SaveXMM128Big";
+  case UOP_PushMachFrame: return "UOP_PushMachFrame";
+  }
+}
+
+// Returns the name of a referenced register.
+static StringRef getUnwindRegisterName(uint8_t Reg) {
+  switch(Reg) {
+  default: llvm_unreachable("Invalid register");
+  case 0: return "RAX";
+  case 1: return "RCX";
+  case 2: return "RDX";
+  case 3: return "RBX";
+  case 4: return "RSP";
+  case 5: return "RBP";
+  case 6: return "RSI";
+  case 7: return "RDI";
+  case 8: return "R8";
+  case 9: return "R9";
+  case 10: return "R10";
+  case 11: return "R11";
+  case 12: return "R12";
+  case 13: return "R13";
+  case 14: return "R14";
+  case 15: return "R15";
+  }
+}
+
+// Calculates the number of array slots required for the unwind code.
+static unsigned getNumUsedSlots(const UnwindCode &UnwindCode) {
+  switch (UnwindCode.getUnwindOp()) {
+  default: llvm_unreachable("Invalid unwind code");
+  case UOP_PushNonVol:
+  case UOP_AllocSmall:
+  case UOP_SetFPReg:
+  case UOP_PushMachFrame:
+    return 1;
+  case UOP_SaveNonVol:
+  case UOP_SaveXMM128:
+    return 2;
+  case UOP_SaveNonVolBig:
+  case UOP_SaveXMM128Big:
+    return 3;
+  case UOP_AllocLarge:
+    return (UnwindCode.getOpInfo() == 0) ? 2 : 3;
+  }
+}
+
+// Prints one unwind code. Because an unwind code can occupy up to 3 slots in
+// the unwind codes array, this function requires that the correct number of
+// slots is provided.
+static void printUnwindCode(ArrayRef<UnwindCode> UCs) {
+  assert(UCs.size() >= getNumUsedSlots(UCs[0]));
+  outs() <<  format("    0x%02x: ", unsigned(UCs[0].u.CodeOffset))
+         << getUnwindCodeTypeName(UCs[0].getUnwindOp());
+  switch (UCs[0].getUnwindOp()) {
+  case UOP_PushNonVol:
+    outs() << " " << getUnwindRegisterName(UCs[0].getOpInfo());
+    break;
+  case UOP_AllocLarge:
+    if (UCs[0].getOpInfo() == 0) {
+      outs() << " " << UCs[1].FrameOffset;
+    } else {
+      outs() << " " << UCs[1].FrameOffset
+                       + (((uint32_t) UCs[2].FrameOffset) << 16);
+    }
+    break;
+  case UOP_AllocSmall:
+    outs() << " " << ((UCs[0].getOpInfo()+1) * 8);
+    break;
+  case UOP_SetFPReg:
+    outs() << " ";
+    break;
+  case UOP_SaveNonVol:
+    outs() << " " << getUnwindRegisterName(UCs[0].getOpInfo())
+           << format(" [0x%04x]", 8 * UCs[1].FrameOffset);
+    break;
+  case UOP_SaveNonVolBig:
+    outs() << " " << getUnwindRegisterName(UCs[0].getOpInfo())
+           << format(" [0x%08x]", UCs[1].FrameOffset
+                    + (((uint32_t) UCs[2].FrameOffset) << 16));
+    break;
+  case UOP_SaveXMM128:
+    outs() << " XMM" << static_cast<uint32_t>(UCs[0].getOpInfo())
+           << format(" [0x%04x]", 16 * UCs[1].FrameOffset);
+    break;
+  case UOP_SaveXMM128Big:
+    outs() << " XMM" << static_cast<uint32_t>(UCs[0].getOpInfo())
+           << format(" [0x%08x]", UCs[1].FrameOffset
+                           + (((uint32_t) UCs[2].FrameOffset) << 16));
+    break;
+  case UOP_PushMachFrame:
+    outs() << " " << (UCs[0].getOpInfo() ? "w/o" : "w")
+           << " error code";
+    break;
+  }
+  outs() << "\n";
+}
+
+static void printAllUnwindCodes(ArrayRef<UnwindCode> UCs) {
+  for (const UnwindCode *I = UCs.begin(), *E = UCs.end(); I < E; ) {
+    unsigned UsedSlots = getNumUsedSlots(*I);
+    if (UsedSlots > static_cast<unsigned>(E - I)) {
+      outs() << "Unwind data corrupted: Encountered unwind op "
+             << getUnwindCodeTypeName((*I).getUnwindOp())
+             << " which requires " << UsedSlots
+             << " slots, but only " << static_cast<unsigned>(E - I)
+             << " remaining in buffer\n";
+      return ;
+    }
+    printUnwindCode(ArrayRef<UnwindCode>(I, E));
+    I += UsedSlots;
+  }
+}
+
+// Given a symbol Sym, this function returns its address and section.
+static error_code resolveSectionAndAddress(const COFFObjectFile *Obj,
+                                           const SymbolRef &Sym,
+                                           const coff_section *&ResolvedSection,
+                                           uint64_t &ResolvedAddr) {
+  if (error_code ec = Sym.getAddress(ResolvedAddr)) return ec;
+  section_iterator iter(Obj->begin_sections());
+  if (error_code ec = Sym.getSection(iter)) return ec;
+  ResolvedSection = Obj->getCOFFSection(iter);
+  return object_error::success;
+}
+
+// Given a vector of relocations for a section and an offset into this section
+// the function returns the symbol used for the relocation at the offset.
+static error_code resolveSymbol(const std::vector<RelocationRef> &Rels,
+                                uint64_t Offset, SymbolRef &Sym) {
+  for (std::vector<RelocationRef>::const_iterator I = Rels.begin(),
+                                                  E = Rels.end();
+                                                  I != E; ++I) {
+    uint64_t Ofs;
+    if (error_code ec = I->getOffset(Ofs)) return ec;
+    if (Ofs == Offset) {
+      if (error_code ec = I->getSymbol(Sym)) return ec;
+      break;
+    }
+  }
+  return object_error::success;
+}
+
+// Given a vector of relocations for a section and an offset into this section
+// the function resolves the symbol used for the relocation at the offset and
+// returns the section content and the address inside the content pointed to
+// by the symbol.
+static error_code getSectionContents(const COFFObjectFile *Obj,
+                                     const std::vector<RelocationRef> &Rels,
+                                     uint64_t Offset,
+                                     ArrayRef<uint8_t> &Contents,
+                                     uint64_t &Addr) {
+  SymbolRef Sym;
+  if (error_code ec = resolveSymbol(Rels, Offset, Sym)) return ec;
+  const coff_section *Section;
+  if (error_code ec = resolveSectionAndAddress(Obj, Sym, Section, Addr))
+    return ec;
+  if (error_code ec = Obj->getSectionContents(Section, Contents)) return ec;
+  return object_error::success;
+}
+
+// Given a vector of relocations for a section and an offset into this section
+// the function returns the name of the symbol used for the relocation at the
+// offset.
+static error_code resolveSymbolName(const std::vector<RelocationRef> &Rels,
+                                    uint64_t Offset, StringRef &Name) {
+  SymbolRef Sym;
+  if (error_code ec = resolveSymbol(Rels, Offset, Sym)) return ec;
+  if (error_code ec = Sym.getName(Name)) return ec;
+  return object_error::success;
+}
+
+static void printCOFFSymbolAddress(llvm::raw_ostream &Out,
+                                   const std::vector<RelocationRef> &Rels,
+                                   uint64_t Offset, uint32_t Disp) {
+  StringRef Sym;
+  if (error_code ec = resolveSymbolName(Rels, Offset, Sym)) {
+    error(ec);
+    return ;
+  }
+  Out << Sym;
+  if (Disp > 0)
+    Out << format(" + 0x%04x", Disp);
+}
+
+void llvm::printCOFFUnwindInfo(const COFFObjectFile *Obj) {
+  const coff_file_header *Header;
+  if (error(Obj->getHeader(Header))) return;
+
+  if (Header->Machine != COFF::IMAGE_FILE_MACHINE_AMD64) {
+    errs() << "Unsupported image machine type "
+              "(currently only AMD64 is supported).\n";
+    return;
+  }
+
+  const coff_section *Pdata = 0;
+
+  error_code ec;
+  for (section_iterator SI = Obj->begin_sections(),
+                        SE = Obj->end_sections();
+                        SI != SE; SI.increment(ec)) {
+    if (error(ec)) return;
+
+    StringRef Name;
+    if (error(SI->getName(Name))) continue;
+
+    if (Name != ".pdata") continue;
+
+    Pdata = Obj->getCOFFSection(SI);
+    std::vector<RelocationRef> Rels;
+    for (relocation_iterator RI = SI->begin_relocations(),
+                              RE = SI->end_relocations();
+                              RI != RE; RI.increment(ec)) {
+      if (error(ec)) break;
+      Rels.push_back(*RI);
+    }
+
+    // Sort relocations by address.
+    std::sort(Rels.begin(), Rels.end(), RelocAddressLess);
+
+    ArrayRef<uint8_t> Contents;
+    if (error(Obj->getSectionContents(Pdata, Contents))) continue;
+    if (Contents.empty()) continue;
+
+    ArrayRef<RuntimeFunction> RFs(
+                  reinterpret_cast<const RuntimeFunction *>(Contents.data()),
+                                  Contents.size() / sizeof(RuntimeFunction));
+    for (const RuntimeFunction *I = RFs.begin(), *E = RFs.end(); I < E; ++I) {
+      const uint64_t SectionOffset = std::distance(RFs.begin(), I)
+                                     * sizeof(RuntimeFunction);
+
+      outs() << "Function Table:\n";
+
+      outs() << "  Start Address: ";
+      printCOFFSymbolAddress(outs(), Rels, SectionOffset +
+                              offsetof(RuntimeFunction, StartAddress),
+                              I->StartAddress);
+      outs() << "\n";
+
+      outs() << "  End Address: ";
+      printCOFFSymbolAddress(outs(), Rels, SectionOffset +
+                              offsetof(RuntimeFunction, EndAddress),
+                              I->EndAddress);
+      outs() << "\n";
+
+      outs() << "  Unwind Info Address: ";
+      printCOFFSymbolAddress(outs(), Rels, SectionOffset +
+                              offsetof(RuntimeFunction, UnwindInfoOffset),
+                              I->UnwindInfoOffset);
+      outs() << "\n";
+
+      ArrayRef<uint8_t> XContents;
+      uint64_t UnwindInfoOffset = 0;
+      if (error(getSectionContents(Obj, Rels, SectionOffset +
+                                  offsetof(RuntimeFunction, UnwindInfoOffset),
+                                  XContents, UnwindInfoOffset))) continue;
+      if (XContents.empty()) continue;
+
+      UnwindInfoOffset += I->UnwindInfoOffset;
+      if (UnwindInfoOffset > XContents.size()) continue;
+
+      const Win64EH::UnwindInfo *UI =
+                            reinterpret_cast<const Win64EH::UnwindInfo *>
+                              (XContents.data() + UnwindInfoOffset);
+
+      // The casts to int are required to print each value as a number.
+      // Without them, the value would be interpreted as char data (which
+      // results in garbage output).
+      outs() << "  Version: " << static_cast<int>(UI->getVersion()) << "\n";
+      outs() << "  Flags: " << static_cast<int>(UI->getFlags());
+      if (UI->getFlags()) {
+        if (UI->getFlags() & UNW_ExceptionHandler)
+          outs() << " UNW_ExceptionHandler";
+        if (UI->getFlags() & UNW_TerminateHandler)
+          outs() << " UNW_TerminateHandler";
+        if (UI->getFlags() & UNW_ChainInfo)
+          outs() << " UNW_ChainInfo";
+      }
+      outs() << "\n";
+      outs() << "  Size of prolog: "
+             << static_cast<int>(UI->PrologSize) << "\n";
+      outs() << "  Number of Codes: "
+             << static_cast<int>(UI->NumCodes) << "\n";
+      // Maybe this should move to output of UOP_SetFPReg?
+      if (UI->getFrameRegister()) {
+        outs() << "  Frame register: "
+               << getUnwindRegisterName(UI->getFrameRegister())
+               << "\n";
+        outs() << "  Frame offset: "
+               << 16 * UI->getFrameOffset()
+               << "\n";
+      } else {
+        outs() << "  No frame pointer used\n";
+      }
+      if (UI->getFlags() & (UNW_ExceptionHandler | UNW_TerminateHandler)) {
+        // FIXME: Output exception handler data
+      } else if (UI->getFlags() & UNW_ChainInfo) {
+        // FIXME: Output chained unwind info
+      }
+
+      if (UI->NumCodes)
+        outs() << "  Unwind Codes:\n";
+
+      printAllUnwindCodes(ArrayRef<UnwindCode>(&UI->UnwindCodes[0],
+                          UI->NumCodes));
+
+      outs() << "\n\n";
+      outs().flush();
+    }
+  }
+}
diff --git a/tools/llvm-objdump/llvm-objdump.cpp b/tools/llvm-objdump/llvm-objdump.cpp
index ddfcca3..282bf01 100644
--- a/tools/llvm-objdump/llvm-objdump.cpp
+++ b/tools/llvm-objdump/llvm-objdump.cpp
@@ -104,9 +104,16 @@ static cl::opt<bool>
 NoShowRawInsn("no-show-raw-insn", cl::desc("When disassembling instructions, "
                                            "do not print the instruction bytes."));
 
+static cl::opt<bool>
+UnwindInfo("unwind-info", cl::desc("Display unwind information"));
+
+static cl::alias
+UnwindInfoShort("u", cl::desc("Alias for --unwind-info"),
+                cl::aliasopt(::UnwindInfo));
+
 static StringRef ToolName;
 
-static bool error(error_code ec) {
+bool llvm::error(error_code ec) {
   if (!ec) return false;
 
   outs() << ToolName << ": error reading file: " << ec.message() << ".\n";
@@ -165,7 +172,7 @@ void llvm::DumpBytes(StringRef bytes) {
   outs() << output;
 }
 
-static bool RelocAddressLess(RelocationRef a, RelocationRef b) {
+bool llvm::RelocAddressLess(RelocationRef a, RelocationRef b) {
   uint64_t a_addr, b_addr;
   if (error(a.getAddress(a_addr))) return false;
   if (error(b.getAddress(b_addr))) return false;
@@ -573,6 +580,19 @@ static void PrintSymbolTable(const ObjectFile *o) {
   }
 }
 
+static void PrintUnwindInfo(const ObjectFile *o) {
+  outs() << "Unwind info:\n\n";
+
+  if (const COFFObjectFile *coff = dyn_cast<COFFObjectFile>(o)) {
+    printCOFFUnwindInfo(coff);
+  } else {
+    // TODO: Extract DWARF dump tool to objdump.
+    errs() << "This operation is currently supported only "
+              "for COFF object files.\n";
+    return;
+  }
+}
+
 static void DumpObject(const ObjectFile *o) {
   outs() << '\n';
   outs() << o->getFileName()
@@ -588,6 +608,8 @@ static void DumpObject(const ObjectFile *o) {
     PrintSectionContents(o);
   if (SymbolTable)
     PrintSymbolTable(o);
+  if (::UnwindInfo)
+    PrintUnwindInfo(o);
 }
 
 /// @brief Dump each object file in \a a;
@@ -666,7 +688,8 @@ int main(int argc, char **argv) {
       && !Relocations
       && !SectionHeaders
       && !SectionContents
-      && !SymbolTable) {
+      && !SymbolTable
+      && !::UnwindInfo) {
     cl::PrintHelpMessage();
     return 2;
   }
diff --git a/tools/llvm-objdump/llvm-objdump.h b/tools/llvm-objdump/llvm-objdump.h
index aa71b77..9f5a8c3 100644
--- a/tools/llvm-objdump/llvm-objdump.h
+++ b/tools/llvm-objdump/llvm-objdump.h
@@ -17,12 +17,21 @@
 
 namespace llvm {
 
+namespace object {
+  class COFFObjectFile;
+  class RelocationRef;
+}
+class error_code;
+
 extern cl::opt<std::string> TripleName;
 extern cl::opt<std::string> ArchName;
 
 // Various helper functions.
+bool error(error_code ec);
+bool RelocAddressLess(object::RelocationRef a, object::RelocationRef b);
 void DumpBytes(StringRef bytes);
 void DisassembleInputMachO(StringRef Filename);
+void printCOFFUnwindInfo(const object::COFFObjectFile* o);
 
 class StringRefMemoryObject : public MemoryObject {
   virtual void anchor();
-- 
1.8.0.msysgit.0
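
A note on the static_cast<int> calls in the patch: the comment there says that
single-byte values are otherwise printed as char data. The sketch below
reproduces that effect with std::ostringstream instead of raw_ostream, so it
is an analogy under that assumption and not the code from the patch. The
helper names formatRaw and formatAsNumber are hypothetical.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Streams the byte unchanged. Stream insertion treats an 8-bit unsigned
// value as a character, so 65 comes out as the letter 'A'.
std::string formatRaw(std::uint8_t Value) {
  std::ostringstream OS;
  OS << Value;
  return OS.str();
}

// Widens the byte to int first, so the stream prints its decimal value.
std::string formatAsNumber(std::uint8_t Value) {
  std::ostringstream OS;
  OS << static_cast<int>(Value);
  return OS.str();
}
```

For small values such as a version of 1, the raw form is an unprintable
control character, which matches the "garbage output" described in the patch
comment.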

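The patch checks UnwindInfoOffset against the section size before casting the
section bytes to an UnwindInfo record. A stricter variant also makes sure the
whole fixed-size header fits. The following is only a sketch of that idea:
Header is a hypothetical four-byte stand-in for the leading fields of the
record, readHeaderAt is a hypothetical helper, and the variable-length unwind
code array that follows the header in a real record is not validated here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the fixed leading fields of an unwind record.
struct Header {
  std::uint8_t VersionAndFlags;
  std::uint8_t PrologSize;
  std::uint8_t NumCodes;
  std::uint8_t FrameRegisterAndOffset;
};

// Copies a Header out of Buffer at Offset. Returns false, leaving Out
// untouched, if the header would extend past the end of the buffer.
bool readHeaderAt(const std::vector<std::uint8_t> &Buffer,
                  std::uint64_t Offset, Header &Out) {
  if (Buffer.size() < sizeof(Header))
    return false;
  // Written as a subtraction so that a huge Offset cannot overflow.
  if (Offset > Buffer.size() - sizeof(Header))
    return false;
  std::memcpy(&Out, Buffer.data() + Offset, sizeof(Header));
  return true;
}
```

Copying with memcpy instead of dereferencing a reinterpret_cast pointer is a
design choice of this sketch; the patch itself uses the cast.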