[Lldb-commits] [lldb] [lldb] Improve unwinding for discontinuous functions (PR #111409)

Tue Oct 8 01:07:38 PDT 2024

labath wrote:

Thanks for the quick response, Jason.

I think it's quite possible that you haven't run into this situation before, because the final representation depends on what tool was used to split the functions. Judging by the name (`cold.1`), I think you're using the llvm "hot-cold-split" pass -- which does not generate this kind of output. Instead it creates a separate function, with its own dwarf description (DW_TAG_subprogram) and everything. This is good for unwinding, in that the "functions" stay continuous, but maybe not so good for other things (FWICS, the synthetic DW_TAG_subprogram does not contain any variable information).

The thing that produces this output is the `-fbasic-block-sections` flag (a.k.a. "propeller"). Here we have a single DW_TAG_subprogram, which has a DW_AT_ranges attribute that lists all of the (discontinuous) parts of the function. However, this flag is pretty new, and does not work on darwin yet (probably because no one has implemented it there). In principle, I don't think propeller is doing anything wrong (DW_AT_ranges exists so that we could describe situations like this), but it does break some assumptions in lldb.

The problem begins in `lldb_private::Function`, which assumes that a single address range (`m_address_range`) is enough to describe it. The variable contains a comment ("The function address range that covers the widest range needed to contain all blocks") which could be interpreted to mean that one should expect it to also contain some unrelated code, but I don't know if that was the intention, and it's definitely not how the unwinder uses this information.
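To make the "widest range" problem concrete, here is a minimal standalone sketch. It does not use lldb's classes: `Range`, `BoundingRange` and `ContainsExact` are hypothetical names invented for illustration, and the addresses are made up. It shows that a single range covering both parts of a split function also claims every address in the gap between them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical half-open address range [base, base + size); not an lldb type.
struct Range {
  uint64_t base;
  uint64_t size;
  uint64_t end() const { return base + size; }
  bool contains(uint64_t addr) const { return addr >= base && addr < end(); }
};

// Smallest single range that covers every part of the function. This mirrors
// the "widest range needed to contain all blocks" idea from the comment.
Range BoundingRange(const std::vector<Range> &parts) {
  assert(!parts.empty() && "a function needs at least one range");
  uint64_t lo = parts.front().base;
  uint64_t hi = parts.front().end();
  for (const Range &part : parts) {
    lo = std::min(lo, part.base);
    hi = std::max(hi, part.end());
  }
  return Range{lo, hi - lo};
}

// Exact membership test against the individual (discontinuous) parts.
bool ContainsExact(const std::vector<Range> &parts, uint64_t addr) {
  return std::any_of(parts.begin(), parts.end(),
                     [addr](const Range &part) { return part.contains(addr); });
}
```

With parts `{0x1000, 0x40}` and `{0x3000, 0x20}`, `BoundingRange` returns `{0x1000, 0x2020}`. An address such as `0x2000`, which may belong to a completely different function, is inside the bounding range even though `ContainsExact` correctly rejects it.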

I've considered (and that's something I'd like to do independently of this patch) changing the `lldb_private::Function` interface to vend discontinuous ranges, but that still wouldn't directly help the unwinder, as we'd need to handle the discontinuity there as well. What it would allow us to do is handle this situation with more finesse. We could e.g. check whether the function contains more than one address range, and then choose which range (and from which source) to use for caching. I think this is a viable path forward if you think this change is too broad.
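A rough sketch of what that "more finesse" could look like, written as standalone C++ rather than against lldb's real interfaces. Everything here is an assumption for illustration: `Range`, `Source`, `Choice` and `ChooseCacheRange` are hypothetical names, and the preference order is just one possible policy, not what the patch does.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical half-open address range [base, base + size); not an lldb type.
struct Range {
  uint64_t base;
  uint64_t size;
  bool contains(uint64_t addr) const {
    return addr >= base && addr - base < size;
  }
};

// Where the chosen range came from (hypothetical labels).
enum class Source { DebugInfo, UnwindInfo };

struct Choice {
  Range range;
  Source source;
};

// Pick the range to cache an unwind plan under for the given pc.
// Assumed policy: a contiguous function keeps using its single debug-info
// range; a discontinuous one prefers the unwind-info range covering pc and
// falls back to the individual function part that contains pc.
std::optional<Choice>
ChooseCacheRange(const std::vector<Range> &function_parts,
                 const std::optional<Range> &unwind_range, uint64_t pc) {
  if (function_parts.size() == 1 && function_parts.front().contains(pc))
    return Choice{function_parts.front(), Source::DebugInfo};

  if (unwind_range && unwind_range->contains(pc))
    return Choice{*unwind_range, Source::UnwindInfo};

  for (const Range &part : function_parts)
    if (part.contains(pc))
      return Choice{part, Source::DebugInfo};

  return std::nullopt; // pc is not covered by any known range
}
```

The point of the sketch is only that the decision becomes explicit once the function exposes its parts: nothing is ever cached under a range that spans the gap between two parts.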

> From a symbol table point of view, if the symbol names haven't been stripped, the function symbol will have the range of the main function body only; the .cold.1 function would be a separate Symbol. In a stripped binary (no symbol name), I know ObjectFileMachO will use the eh_frame start addresses to create fake symbol table names & entries, I don't know if ObjectFileELF does that, but in that case a Symbol would be equivalent to eh_frame as a source of information.

ObjectFileELF does that too. The problems begin only when debug info is present, because `lldb_private::Function` will claim the maximal range, as I've described above. However, this creation of synthetic symbols from unwind info is perhaps a good reason why changing the order of range sources is safe(ish). (Although it could also be a reason for not querying eh_frame for range information at all.)

https://github.com/llvm/llvm-project/pull/111409