[Lldb-commits] [lldb] Reapply "[lldb/aarch64] Fix unwinding when signal interrupts a leaf f… (PR #92503)

Wed May 22 01:20:00 PDT 2024

labath wrote:

I'm not quite sure what you have in mind, but I can tell you what's been going through my mind in the context of the `m_all_registers_available` check in `lldb/source/Target/RegisterContextUnwind.cpp` . The way I see it, this check (at least the part about the RA register(*)) is heuristic that's impossible to get always right. Like, I could construct a test case using functions with non-standard ABIs where a non-leaf function legitimately has a `lr=<same>` rule. Such code would execute correctly but lldb would refuse to unwind it due to the `lr=<same>` restriction.

The only thing needed to construct such a test case is one (possibly leaf) function, which "returns" to a register other than `lr` (it could even return to a memory address). Then, its caller could "call" that function by storing the return address to some other place, and leaving it's own return address register (i.e, `lr`) untouched. (I don't know why anyone would do such a thing, since it would likely mess up the CPUs branch predictor, but dwarf is perfectly capable of expressing code like this)

Another interesting case is that of a function (an abi-respecting function this time), which chooses to save the `lr` to a different (non-volatile) register, instead of the usual stack location. This function could then call other functions as usual, but we wouldn't be able to unwind from it in a single step -- to get its value of `lr` (i.e., the `pc` of the frame *above* it), we would need to find where has the frame *below* stored the register that `lr` was saved to. (I also don't know of anyone writing code like this, but unlike the previous case, I can imagine some very specific situations where such an optimization might be profitable.)

All of this is to say that I don't think there is a way to change this piece of code to be correct all the time -- we'd just be trading one set of edge cases for the other. I think that the most correct solution would be to remove this check altogether. I'm not sure why it exists, but I expect it has something to do with preventing looping stacks. However, if I remember correctly, we already have some checks to prevent stack loops (if not, then we should have, as there are other ways to create stack loops), so I think it should be possible to let the `lr=<same>` (*) rule through here and catch erroneous cases further down the road. However, I also don't have any plans to pursue this direction.

(*) I'm only talking about the `lr` rule everywhere. I *think* that a `pc=<same>` rule would always be an error (even in signal handlers), so we should be able to keep that here. OTOH, if our loop detection code is robust enough, then there should be no harm in letting this through either...

https://github.com/llvm/llvm-project/pull/92503