[llvm-dev] Status of stack walking in LLVM on Win64?

Sun Jul 3 22:34:39 PDT 2016

> Message: 3
> Date: Sun, 3 Jul 2016 17:49:50 -0700
> From: Michael Lewis via llvm-dev <llvm-dev at lists.llvm.org>
> To: Hayden Livingston <halivingston at gmail.com>
> Cc: llvm-dev <llvm-dev at lists.llvm.org>
> Subject: Re: [llvm-dev] Status of stack walking in LLVM on Win64?
> Message-ID:
> <CAEm7p3svyOi6JU6r_RCCtRfGhTgTHeRw-SR0iD+9Edv2pi71Dw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Sun, Jul 3, 2016 at 2:17 PM, Hayden Livingston <halivingston at gmail.com>
> wrote:
>
>> For JITs it would appear that there is a patch needed for some kind of
>> relocations.
>>
>> https://llvm.org/bugs/show_bug.cgi?id=24233
>>
>> Is the patch really needed? What does it do? I'm not an expert here so
>> asking.
>>
>
>
> I'm not really interested in the JIT case as I said originally, so I can't
> answer that question.
>
>
>
>>
>> On Sun, Jul 3, 2016 at 2:48 AM, David Majnemer via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> I can confirm that LLVM emits correct data when used in an AoT
>> configuration
>>> for x64, exception handling would be totally broken without it.
>>>
>>
>
>
> Two points of clarification:
>
> - Are you talking about Win64 or just x64 in general (i.e. *nix/MacOS)?
> Again given the presence of bugs going back to 2015 (including one linked
> in this thread) and other scant data from the list, I really can't tell
> what the expected state of this functionality is on Win64.
>
> - Are you referring to data generated by LLVM that is embedded in COFF
> object files and then placed in the binary image by the linker? This data
> is at a minimum relocated by link.exe on Windows as near as I can tell. I
> do not want a dependency on link.exe. I can handle doing my own relocations
> prior to emitting the final image, but I want to know if there's a turnkey
> implementation of this already or if I have to roll my own here.
>
> Thanks,
>
>
>
> - Mike

 The Windows/x64 ABI is pretty well documented.

 - The parameter passing is probably not the same as any other system.
   (Unless people are using LLVM for UEFI development?) 
   Ignoring floating point, the first four integer parameters
   are in rcx, rdx, r8, r9. The rest are on the stack. 

 - The exception handling might *resemble* other systems, but
   surely has unique details.

 - There is absolutely an unremovable dependency on a linker;
   it doesn't have to be the Microsoft linker, I believe GNU ld
   already implements this.

   The documentation should be used.

   I can summarize and such, but it is documented.

   Roughly, ignoring parameter passing and focusing only on exception handling,
   it goes like this:

   - At any point in any program, "the stack" must be "unwindable".
       I've never seen this clearly described.
       It boils down to really "non volatile registers must be restorable"
       by "a runtime" via a documented/standardized metadata, such as to
       appear as if control was returned to any function on the call stack,
       w/o running any generated code in any of the functions between
       the current stack location and the resumed-to location.

       The stack pointer is often called out specially, but in fact
       it is just another non volatile register and not really a special case.

     So then some details:
       a "leaf function" is a function that does not change any non volatile registers,
       including the stack pointer. Leaf functions can do pretty much anything,
       but they must not change any non volatile registers -- which is a severe
       restriction. Having locals essentially makes you non-leaf -- even if you
       don't call anything. A leaf function is *not* a function that makes no calls,
       but calls do make a function a non-leaf, as they change the stack pointer.

       The slight exception here is that all functions, including leaves, do have
       4*8 bytes of scratch space in the stack available to them -- so local
       variables can be had, in that space and in volatile registers.

      The stack is walked from a leaf function merely by reading the return address at [rsp].
      A leaf function can make a syscall, so it isn't necessarily at the bottom of the stack.
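
      As a rough sketch of that leaf case (the names unwind_leaf, stack and
      stack_base are made up for illustration, and the stack is just a byte
      buffer standing in for real memory): the return address sits at [rsp],
      and the caller's rsp is 8 bytes higher.

```python
import struct

def unwind_leaf(stack: bytes, stack_base: int, rsp: int):
    """Unwind one leaf frame: no metadata is needed, only the value at [rsp].

    `stack` is a snapshot of stack memory whose first byte lives at address
    `stack_base` (both hypothetical stand-ins for a real memory reader).
    """
    offset = rsp - stack_base
    (return_address,) = struct.unpack_from("<Q", stack, offset)
    # Simulating 'ret' pops the 8-byte return address.
    return return_address, rsp + 8
```

      The returned address is then what gets looked up in the pdata to
      continue the walk through the non-leaf callers.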

      Non-leaf functions are the interesting ones.
      They can change rsp, including via a call, and can change non-volatile
      registers, but all such changes (or rather, the saving of said registers) must
      be described by metadata, and the metadata
      must be findable -- via looking up a code address on the stack.

      Roughly speaking, all dlls have "pdata" -- procedure data.
      There are 3 UINT32s per non-leaf function.
      These are offsets into the image. Images are limited to 4GB in size.
      They are to the start of the function, end of the function, and to additional metadata.
      The additional metadata is called "xdata" or exception data.
      The offset to the metadata may be absent or 0, but that should be rare/nonexistent
      in practice -- it is for revealing leaf functions to static analysis for example.
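
      A minimal sketch of that lookup, assuming the pdata bytes have already
      been pulled out of the image and that the entries are sorted by start
      offset (which the PE format requires). The function names parse_pdata
      and lookup are mine, not anything from LLVM or Windows.

```python
import bisect
import struct

# One pdata entry: start offset, end offset, offset of the unwind metadata.
ENTRY = struct.Struct("<III")

def parse_pdata(pdata: bytes):
    """Split raw pdata bytes into (begin, end, unwind_info) offset triples."""
    usable = len(pdata) - len(pdata) % ENTRY.size
    return [ENTRY.unpack_from(pdata, off) for off in range(0, usable, ENTRY.size)]

def lookup(entries, rva: int):
    """Find the entry covering image offset `rva`, or None.

    None means no metadata covers the address, which the unwinder
    treats as a leaf function.
    """
    begins = [entry[0] for entry in entries]
    i = bisect.bisect_right(begins, rva) - 1
    if i >= 0 and entries[i][0] <= rva < entries[i][1]:
        return entries[i]
    return None
```

      The address being looked up is a return address taken off the stack,
      minus the image base, so that it is an offset like the table entries.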

      The "xdata" is then what describes how to restore non volatile registers,
      such as the order to pop them, or the offset at which they were saved relative to the
      frame pointer or stack pointer (and which register, if any, is the frame pointer -- it
      doesn't have to be rbp, and most functions don't have one.)
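
      To make that concrete, here is a sketch of decoding the xdata header
      and the most common unwind codes, following the documented layout
      (a 4 byte header, then 2 byte code slots). It only handles the push,
      stack allocation, frame pointer and integer register save codes, and
      ignores handlers and chaining; decode_unwind_info and the tuple
      shapes it returns are my own invention.

```python
import struct

# Register numbers as used by the x64 unwind codes (0 = rax ... 15 = r15).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + [
    f"r{n}" for n in range(8, 16)
]

def decode_unwind_info(xdata: bytes) -> dict:
    ver_flags, prolog_size, count, frame = struct.unpack_from("<BBBB", xdata, 0)
    frame_reg = frame & 0x0F
    info = {
        "version": ver_flags & 0x07,
        "flags": ver_flags >> 3,
        "prolog_size": prolog_size,
        "frame_register": REGS[frame_reg] if frame_reg else None,
        "frame_offset": (frame >> 4) * 16,
        "codes": [],
    }
    slot = 0
    while slot < count:
        base = 4 + 2 * slot
        code_offset, op_byte = struct.unpack_from("<BB", xdata, base)
        op, op_info = op_byte & 0x0F, op_byte >> 4
        if op == 0:    # push of a non volatile register
            info["codes"].append((code_offset, "push", REGS[op_info]))
            slot += 1
        elif op == 1 and op_info == 0:  # large alloc, size / 8 in next slot
            (scaled,) = struct.unpack_from("<H", xdata, base + 2)
            info["codes"].append((code_offset, "alloc", scaled * 8))
            slot += 2
        elif op == 1:  # large alloc, unscaled 32-bit size in next two slots
            (size,) = struct.unpack_from("<I", xdata, base + 2)
            info["codes"].append((code_offset, "alloc", size))
            slot += 3
        elif op == 2:  # small alloc, 8 to 128 bytes
            info["codes"].append((code_offset, "alloc", op_info * 8 + 8))
            slot += 1
        elif op == 3:  # frame pointer established
            info["codes"].append((code_offset, "set_fpreg", REGS[frame_reg]))
            slot += 1
        elif op == 4:  # register saved with a mov, offset / 8 in next slot
            (scaled,) = struct.unpack_from("<H", xdata, base + 2)
            info["codes"].append((code_offset, "save", REGS[op_info], scaled * 8))
            slot += 2
        else:
            raise NotImplementedError(f"unwind op {op} not handled in this sketch")
    return info
```

      For example the bytes 01 06 02 00 06 32 02 30 describe a 6 byte
      prologue that pushes rbx and then allocates 32 bytes; the codes are
      stored in reverse order so the unwinder can undo them front to back.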

      There are restrictions on code generation -- rsp changes and non volatile saves
      must be describable with this metadata. There is a notion of the end of the prologue,
      at this point all non volatiles that will be changed have been saved, and rsp changes
      are done. This is misleading though in that almost arbitrary code can be interleaved
      within the prologue, i.e. changes to volatile registers.

      As background, Windows/x64 functions generally don't change rsp
      except in their prologue and via the call instruction.
      They are not "pushy/poppy". A function that uses _alloca is the exception
      to this. Such functions must have a frame pointer, such as rbp,
      though it doesn't have to be rbp and often is not.

      There is also a notion of chaining the data. This is useful when
      a function has "early out" paths that only change some non volatiles.

      Also there is allowance for discontiguous functions.

      Also there is no metadata for epilogues. If an exception occurs in an epilogue,
      the runtime actually looks at the code being run, detects that it is an epilogue,
      and simulates it. As such, epilogue code generation is constrained.
      (and breakpoints within epilogues mess things up!)
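
      A much simplified sketch of that detect-and-simulate step, to show
      why the code generation is constrained. It only recognizes an
      optional "add rsp, imm", then pops, then a plain ret; the real
      unwinder also accepts forms I leave out here (lea rsp, jmp tail
      calls). simulate_epilogue is a made-up name.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + [
    f"r{n}" for n in range(8, 16)
]

def simulate_epilogue(code: bytes):
    """If `code` starts with a simple epilogue, return (popped_regs, rsp_delta).

    rsp_delta counts everything released, including the return address.
    Returns None when the bytes do not match the simplified pattern.
    """
    i = 0
    delta = 0
    popped = []
    # Optional stack deallocation: add rsp, imm8 / add rsp, imm32.
    if code[i:i + 3] == b"\x48\x83\xc4" and len(code) >= i + 4:
        delta += int.from_bytes(code[i + 3:i + 4], "little", signed=True)
        i += 4
    elif code[i:i + 3] == b"\x48\x81\xc4" and len(code) >= i + 7:
        delta += int.from_bytes(code[i + 3:i + 7], "little", signed=True)
        i += 7
    if delta < 0:
        return None
    # Zero or more pops: 58+r for rax..rdi, 41 58+r for r8..r15.
    while i < len(code):
        if code[i] == 0x41 and i + 1 < len(code) and 0x58 <= code[i + 1] <= 0x5F:
            popped.append(REGS[8 + code[i + 1] - 0x58])
            i += 2
        elif 0x58 <= code[i] <= 0x5F:
            popped.append(REGS[code[i] - 0x58])
            i += 1
        else:
            break
        delta += 8
    # The epilogue must end in ret (C3), which pops the return address.
    if i < len(code) and code[i] == 0xC3:
        return popped, delta + 8
    return None
```

      Anything that doesn't match (a stray nop, say) means the address is
      treated as being in the function body and the xdata is used instead.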

      To repeat -- the unwindability is from any single instruction, be it in the
      middle of a prologue, middle of an epilogue, or in the body of a function
      outside of prologue/epilogue.

      This unwindability serves both exception dispatch and debugger stack walking,
      and other things, like sampling profiler stack walking, or "leak tracking
      stack walking" -- stack walking is always possible, modulo bugs.
      The most common bugs are probably in hand-written assembly, since
      assembly programmers have to do basically all the work themselves.

      There is provision for providing the pdata at runtime for JITed code.

      The linker has to combine all the pdata and place a pointer (offset) to it
      in a documented place in the PE, similar to how imports and exports and base
      relocations are recorded.
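
      For reference, a sketch of where that documented place is for a
      64-bit image: the exception entry is index 3 in the optional
      header's data directory array, holding the offset and size of the
      pdata. The function name exception_directory is mine, and this
      assumes a PE32+ image already read into memory as bytes.

```python
import struct

IMAGE_DIRECTORY_ENTRY_EXCEPTION = 3  # index of the pdata entry

def exception_directory(image: bytes):
    """Return (offset, size) of the pdata table in a PE32+ image, or None."""
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4 byte signature and 20 byte file header.
    opt = e_lfanew + 4 + 20
    (magic,) = struct.unpack_from("<H", image, opt)
    if magic != 0x20B:
        raise ValueError("not a PE32+ (64-bit) image")
    # In PE32+, the directory count is at offset 108 and the array at 112.
    (count,) = struct.unpack_from("<I", image, opt + 108)
    if count <= IMAGE_DIRECTORY_ENTRY_EXCEPTION:
        return None
    return struct.unpack_from(
        "<II", image, opt + 112 + 8 * IMAGE_DIRECTORY_ENTRY_EXCEPTION
    )
```

      Dividing the size by 12 gives the number of pdata entries. Anything
      that emits the final image itself, without link.exe, would have to
      fill in this entry after sorting and concatenating the pdata.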

      Anyway, see the documentation.

      - Jay