[PATCH] D111519: [WIP] [RISCV] Emit cfi directives for function epilogue

Sun Nov 14 22:41:19 PST 2021

jrtc27 added a comment.

No. That is -O0 code and won't exhibit the issue, because the issue happens due to optimisation. Nor is main the function I believe to be the problem, since once you've unwound to main (which depends on the //callee// to have correct CFI, not the caller) there's no more unwinding to do, either implicitly (since the exception has been caught) or explicitly (since main itself performs no further actions that will request unwinding).

See https://godbolt.org/z/5xvTY9EKf. Observe that two has had its epilogue scheduled before its landing pad (corresponding to the call site that is the call to three):

  .Ltmp10:
  .Ltmp4:
          .loc    1 15 1                          # example.cpp:15:1
          ld      s0, 0(sp)                       # 8-byte Folded Reload
          # after this patch: .cfi_def_cfa sp, 16
          ld      ra, 8(sp)                       # 8-byte Folded Reload
          # after this patch: .cfi_restore ra
          # after this patch: .cfi_restore s0
          addi    sp, sp, 16
          # after this patch: .cfi_def_cfa_offset 0
          ret
  .LBB1_2:
  .Ltmp5:
  .Ltmp11:
          .loc    1 15 1 is_stmt 0                # example.cpp:15:1
          sext.w  a1, a1
          addi    a2, zero, 1
          mv      s0, a0
          .loc    1 12 5 is_stmt 1                # example.cpp:12:5
          bne     a1, a2, .LBB1_6
          mv      a0, s0
          call    __cxa_begin_catch
  ...

This means that, if you have CFI in the epilogue as with this patch (annotated above with what I believe would be emitted), it is bogusly applied to the call to `__cxa_throw` (for throwing the 42 in the catch).

Now let's consider how that unwinds. The runtime and unwinder will unwind fine all the way back to the call to `__cxa_throw` (specifically the byte after; all the range checks are `start < x <= end` not the normal `start <= x < end` since jalr writes the //next// instruction to the link register) during the search phase. Once there, the runtime will ask the unwinder to unwind another frame, as there is no exception handler there for an int. But the unwinder, trusting the CFI which says ra has already been restored (and s0), will leave ra (and s0) untouched, so after one iteration of unwinding ra still points at the byte after two's call to `__cxa_throw`. So it sees if there's an exception handler there for an int, and the answer is still no. So it unwinds again, trusting the CFI, and again leaves ra untouched, which again still points at the byte after the call to `__cxa_throw`. And so on and so on, forever trying to unwind out of two to find an exception handler but stuck in it with no escape. It's possible I've got the details slightly wrong, but they aren't really the point, the general idea of how this breaks is, and why it is crucial that all C++ call sites have CFI that will unwind accurately if followed.

GCC's code for the same sequence is:

  .LVL2:
  .LEHE0:
          .loc 1 15 1
          ld      ra,8(sp)
          .cfi_remember_state
          .cfi_restore 1
          ld      s0,0(sp)
          .cfi_restore 8
          addi    sp,sp,16
          .cfi_def_cfa_offset 0
          jr      ra
  .L7:
          .cfi_restore_state
          .loc 1 12 7
          li      a5,1
          bne     a1,a5,.L11
  .LBB2:
          .loc 1 12 18 discriminator 1
          call    __cxa_begin_catch
  ...

This is almost the same CFI as what this patch produces (GCC interleaves the `.cfi_restore`s with the corresponding loads but that doesn't really matter, so long as the stack slot and stack pointer aren't clobbered it's valid to defer a `.cfi_restore`) with the key difference that, because the epilogue has been scheduled before the landing pad, a `.cfi_remember_state` is emitted before any other CFI directives in the epilogue and a `.cfi_restore_state` is emitted as the first thing in the landing pad, restoring the state to the remembered state, namely that before the epilogue, and thus ensuring that the unwinder sees CFI that says ra is still stored on the stack (along with s0). That is the crucial part missing in LLVM, and the reason that we cannot emit CFI in epilogues.

(In case you're looking at main's assembly and notice that it too has its epilogue scheduled before its landing pad, yes that results in incorrect CFI for the landing pad, but no it doesn't matter in this case unless you're debugging, since all it does is call `__cxa_begin_catch` and `__cxa_end_catch` that do reference counting of the exception object behind the scenes, they don't need to unwind)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111519/new/

https://reviews.llvm.org/D111519