[PATCH] D86527: [3/5] [MC] [Win64EH] Produce well-formed xdata records when info is missing

Thu Aug 27 00:26:31 PDT 2020

mstorsjo added a comment.

In D86527#2240604 <https://reviews.llvm.org/D86527#2240604>, @efriedma wrote:

>> letting you append handler specific data which is supposed to follow directly after the unwind data itself
>
> Oh, that's the part I was missing; thanks.  So in well-formed code, .seh_handlerdata should come after an .seh_endprologue, and there shouldn't be any .seh_* directives or instructions between the .seh_handlerdata and the .seh_endproc?

It's actually even a bit stricter than that. The xdata record contains not only the unwind opcodes themselves, but also the function length field. So ideally `.seh_handlerdata` comes after `.seh_endfunclet`, so that the full function length is known.

In practice, there can be cases where `.seh_handlerdata` comes before `.seh_endfunclet` (or functions that lack `.seh_endfunclet` altogether), and then we need to set the function length to cover only the code up to the current point, as I do in D86528 <https://reviews.llvm.org/D86528>. This means that the unwindable region of the function only extends up to that point. So if we have `.seh_handlerdata` directly after the prologue, one can't actually unwind from the body of the function, only from within the prologue itself. So ideally `.seh_handlerdata` really should be as close to the end of the function as possible.
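To make the consequence concrete, here is a minimal sketch of the reasoning above. It is a toy model, not code from the patch or from LLVM: `FuncletLayout`, `recordedLength` and `canUnwindFrom` are hypothetical names, and the offsets are simply byte offsets from the start of the function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model (not LLVM's data structures): byte offsets, relative to
// the function start, at which the relevant directives appear.
struct FuncletLayout {
  uint32_t prologueEnd;  // offset of .seh_endprologue
  uint32_t handlerData;  // offset of .seh_handlerdata
  uint32_t funcletEnd;   // offset of .seh_endfunclet (end of the code)
};

// Assumption: the xdata record is finalized when .seh_handlerdata is seen, so
// the function length it records covers only the code emitted up to that point.
inline uint32_t recordedLength(const FuncletLayout &f) {
  assert(f.prologueEnd <= f.handlerData && f.prologueEnd <= f.funcletEnd);
  return std::min(f.handlerData, f.funcletEnd);
}

// A location is unwindable only if it lies inside the recorded length.
inline bool canUnwindFrom(const FuncletLayout &f, uint32_t pcOffset) {
  return pcOffset < recordedLength(f);
}
```

With `.seh_handlerdata` directly after an 8-byte prologue in a 40-byte function (`{8, 8, 40}`), `canUnwindFrom` is true only for offsets inside the prologue; with it at the end (`{8, 40, 40}`), the whole function is covered.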

Then there are real-world messes like https://github.com/mingw-w64/mingw-w64/blob/master/mingw-w64-crt/crt/crtexe.c#L179-L198, where `.seh_handlerdata` is injected via inline assembly in C code. This works fine on x86_64, because the function length itself isn't embedded in the xdata record, but is handled via the BeginAddress/EndAddress pair in the pdata record. But for the aarch64 case, that code needs to be adjusted to move the `.seh_handlerdata` bit to the end of the function. (I'll try to get that fixed after these patches settle.) That won't cover the epilogue of the function, but it would at least cover the body.
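The difference between the two targets can be sketched roughly as follows. The struct and function names are illustrative, not taken from LLVM or the Windows headers; the field layout follows the documented pdata formats, and the 18-bit function length field (stored in units of 4 bytes) follows the documented aarch64 xdata header.

```cpp
#include <cassert>
#include <cstdint>

// x86_64 .pdata entry: the covered range is stored here, outside the xdata
// record, so the xdata can be emitted before the function end is known.
struct X64PdataEntry {
  uint32_t BeginAddress;       // RVA of the function start
  uint32_t EndAddress;         // RVA of the function end
  uint32_t UnwindInfoAddress;  // RVA of the xdata record
};

// aarch64 .pdata entry referring to a full xdata record: no end address.
struct Arm64PdataEntry {
  uint32_t BeginAddress;  // RVA of the function start
  uint32_t UnwindData;    // RVA of the xdata record (low two flag bits are 0)
};

// The low 18 bits of the first aarch64 xdata header word hold the function
// length divided by 4.
constexpr uint32_t FunctionLengthMask = 0x3FFFF;

// Illustrative helper: store a byte length into an xdata header word.
inline uint32_t packFunctionLength(uint32_t header, uint32_t lengthInBytes) {
  assert(lengthInBytes % 4 == 0 && lengthInBytes / 4 <= FunctionLengthMask);
  return (header & ~FunctionLengthMask) | (lengthInBytes / 4);
}

// Illustrative helper: read the byte length back out of the header word.
inline uint32_t unpackFunctionLength(uint32_t header) {
  return (header & FunctionLengthMask) * 4;
}
```

Since the aarch64 length has to be written into the xdata header itself, anything appended after the record (such as handler data) forces the length to be decided at that point.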

>> For the contrived .seh_handlerdata case, we could avoid outputting the unwind info itself, leave the orphaned handler specific data in the section, and not hook up the pdata entry.
>
> That probably makes the most sense, yes.

Ok, will try to do that then.

>> Can we output warnings from this layer?
>
> Technically yes, but you've lost the source location by the time you get this deep, so it wouldn't be pretty.  Probably we should do some primitive tracking of the SEH state in the asmparser, and emit a warning from there.

Even without the source location, just giving the function name might be enough context; you'd probably only run into this in cases with assembly involved anyway. But a warning is probably not necessary.

https://reviews.llvm.org/D86527