[Lldb-commits] FreeBSD kernel debugging fixes

Wed Sep 20 16:37:35 PDT 2017

Jason,

I'm performing address to symbol resolution after setting load addresses for all sections.  It correctly identifies the module
and section where the address resides, and in many cases gives correct results.  However, because it translates the
load address to a file address before indexing into the symtab, overlapping file addresses for sections in the same module
can cause the wrong name to be returned, such as returning a symbol in the bss section even though the address is
in the text section.
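
To make the failure concrete, here is a minimal sketch in plain Python. It is a toy model, not lldb's Symtab code: the section names, symbol names, sizes, and the lookup helper are all hypothetical. It assumes a relocatable module whose section headers all report address 0 and a single index keyed by file address.

```python
from bisect import bisect_right

# Hypothetical relocatable module: every section header reports address 0.
SECTIONS = {".text": 0x0, ".bss": 0x0}

# (section, offset within section, size, symbol name) -- made-up symbols.
SYMBOLS = [
    (".text", 0x00, 0x40, "my_handler"),
    (".bss", 0x00, 0x100, "my_buffer"),
]

def build_file_addr_index(sections, symbols):
    """Build one module-wide index sorted by file address."""
    entries = [(sections[sec] + off, size, name)
               for sec, off, size, name in symbols]
    return sorted(entries, key=lambda e: e[0])

def lookup_by_file_addr(index, file_addr):
    """Return the name of the last entry starting at or below file_addr,
    provided the address falls inside that entry's range."""
    starts = [e[0] for e in index]
    i = bisect_right(starts, file_addr) - 1
    if i < 0:
        return None
    start, size, name = index[i]
    return name if start <= file_addr < start + size else None

index = build_file_addr_index(SECTIONS, SYMBOLS)

# The address is known to be 0x10 bytes into .text, but the section
# identity is lost once it is flattened to a file address.
result = lookup_by_file_addr(index, SECTIONS[".text"] + 0x10)
print(result)
```

In this model the lookup returns the .bss symbol even though the address was inside .text, because both sections collapse onto the same file address range.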

An alternative way to fix this would be to split m_file_addr_to_index into per-section maps, but that doesn't solve the
problem of ResolveFileAddress being unusable, or the general expectation within lldb that file addresses uniquely
identify something within a module.
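
For comparison, the synthetic file address approach can be sketched roughly as follows. This is an illustration of the idea only, not the code in the patch: the base address, the alignment, and the helper names are assumptions chosen for the example.

```python
def assign_synthetic_file_addrs(sections, base=0x1000, align=0x1000):
    """Lay sections out back to back so their file address ranges are disjoint.

    sections: list of (name, size) tuples in section-header order.
    base and align are arbitrary illustrative values.
    Returns a dict mapping section name to its synthetic file address.
    """
    addrs = {}
    cursor = base
    for name, size in sections:
        addrs[name] = cursor
        # Step past this section (at least one byte) and round up to alignment.
        cursor = (cursor + max(size, 1) + align - 1) // align * align
    return addrs

def ranges_overlap(addrs, sections):
    """Return True if any two sections share part of a file address range."""
    spans = sorted((addrs[n], addrs[n] + s) for n, s in sections)
    return any(prev_end > start
               for (_, prev_end), (start, _) in zip(spans, spans[1:]))

sections = [(".text", 0x40), (".data", 0x20), (".bss", 0x100)]
synthetic = assign_synthetic_file_addrs(sections)
for name, addr in synthetic.items():
    print(f"{name}: {addr:#x}")
```

With disjoint ranges, a (module, file address) pair identifies at most one section again, which is the property a file address lookup relies on.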

-- Brian
________________________________________
From: Jason Molenda [jmolenda at apple.com]
Sent: Wednesday, September 20, 2017 4:18 PM
To: Koropoff, Brian
Cc: lldb-commits at lists.llvm.org
Subject: Re: [Lldb-commits] FreeBSD kernel debugging fixes

Right, we always record symbol addresses as the offset to the section that contains them.  The Address class in lldb is used everywhere for this.  The Target has a SectionLoadList which tells us where each Section is loaded in memory -- this is how you translate an Address object to a load address.

When sections have not been given their load addresses yet, lldb treats file addresses as load addresses, which sounds like what you're seeing.  So we are often in the situation where an address->symbol lookup results in multiple symbols being matched; they are all overlapping at this point.

As soon as the sections are given load addresses in the Target, then this overlapping problem is resolved.
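
The section-plus-offset scheme described above can be modeled in a few lines of Python. This is a simplified sketch, not lldb's implementation: the class names echo the ones mentioned in this message, but the methods, the fallback behavior as coded here, and the kernel-style load addresses are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Section:
    name: str
    file_addr: int  # address requested in the object file (0 for a .o)
    size: int

@dataclass(frozen=True)
class Address:
    """A symbol address stored as an offset into its containing section."""
    section: Section
    offset: int

class SectionLoadList:
    """Toy per-target map from Section to where it is loaded in memory."""

    def __init__(self):
        self._load_addrs = {}

    def set_load_address(self, section, load_addr):
        self._load_addrs[section] = load_addr

    def load_address_of(self, addr):
        base = self._load_addrs.get(addr.section)
        if base is None:
            # No load address known yet: fall back to the file address.
            base = addr.section.file_addr
        return base + addr.offset

    def resolve(self, load_addr):
        """Map a load address back to a section-relative Address."""
        for section, base in self._load_addrs.items():
            if base <= load_addr < base + section.size:
                return Address(section, load_addr - base)
        return None

text = Section(".text", 0x0, 0x40)
bss = Section(".bss", 0x0, 0x100)
sym = Address(text, 0x10)

loads = SectionLoadList()
before = loads.load_address_of(sym)             # file address fallback
loads.set_load_address(text, 0xffffffff80200000)  # hypothetical load addresses
loads.set_load_address(bss, 0xffffffff80300000)
after = loads.load_address_of(sym)
back = loads.resolve(after)
print(hex(before), hex(after), back.section.name, hex(back.offset))
```

Because the symbol stays a (section, offset) pair, changing where a section is loaded only touches the load list and never the symbol table. In this model the load address resolves back to the right section without ambiguity; the problem raised in this thread arises when a later lookup step converts back to a file address.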

gdb didn't have the difference between load address and file address and we had to play games with shuffling things around to arbitrary addresses so they don't overlap.  (and from what I can recall, changing the load address of a binary in lldb meant going through the symbol table to update all the addresses -- we wanted to separate the symbol table addresses from the load addresses in a given target, so we came up with this system.)

Are you trying to do address->symbol resolution before you know where the binaries are actually loaded in the address space?  Or are you missing the part that sets the load addresses for the sections in the Target?  I suspect it's the latter.

> On Sep 20, 2017, at 4:12 PM, Koropoff, Brian <Brian.Koropoff at dell.com> wrote:
>
> Jason,
>
> I'm setting the load addresses appropriately for all sections in my script.  The problem is that the symbol map
> is internally indexed by the "file address", which is the virtual address that the ELF section asks to
> be loaded at, regardless of what the actual load address turns out to be:
>
> https://github.com/llvm-mirror/lldb/blob/master/source/Symbol/Symtab.cpp#L878
>
> Symbol lookup proceeds via the file address:
>
> https://github.com/llvm-mirror/lldb/blob/master/source/Core/Module.cpp#L510
>
> From what I can gather, the use of file addresses is to avoid needing to recompute the symtab
> when a load address is changed.  This implementation detail means that file addresses
> must be non-overlapping even if the load addresses are correctly set.  The generation of synthetic
> file addresses has the added benefit of permitting offline symbolication by (module, file address) pair
> without needing to know the load map, which appears to be an intended use case, e.g.
> SBModule::ResolveFileAddress():
>
> https://github.com/llvm-mirror/lldb/blob/master/include/lldb/API/SBModule.h#L120
>
> Regards,
> Brian Koropoff
> Dell EMC
>
> ________________________________________
> From: Jason Molenda [jmolenda at apple.com]
> Sent: Wednesday, September 20, 2017 3:47 PM
> To: Koropoff, Brian
> Cc: lldb-commits at lists.llvm.org
> Subject: Re: [Lldb-commits] FreeBSD kernel debugging fixes
>
> Regarding the overlapping files -- when lldb first loads multiple binaries (but does not have a running process), it doesn't know where to set the load addresses of these binaries so they are all 0-based (or if they have a specified load address in the object file, at that address).
>
> We rely on the dynamic linker on the system to tell us where libc.so is, and then we update the target's section load list with that address.
>
> For macOS kernel debugging, we have a DynamicLoaderDarwinKernel that knows how to load all the modules ("kexts") at the correct addresses for the program.  The user can also do this manually in command line lldb, like
>
> target modules add <name of binary>
> target modules load -f <name of binary> -s <slide to be applied to the load address>
>
> but it is correct behavior that in the absence of being told where the binaries are loaded in memory, lldb will load them all at their base address, often 0 in the modern days of PIC code.
>
>
> I haven't looked at the patch, but a long time ago I did a hack for gdb that sounds similar to yours, where I would assign binaries random addresses until we had connected to a live process & learned where they should be, so that address -> symbol resolution would work.  It never worked great and we decided to avoid doing that in lldb.
>
>
>> On Sep 20, 2017, at 3:41 PM, Koropoff, Brian via lldb-commits <lldb-commits at lists.llvm.org> wrote:
>>
>> Greetings.  I'm submitting a few patches that resolve issues I
>> encountered when using lldb to symbolicate FreeBSD kernel backtraces.
>> The problems mostly centered around FreeBSD kernel modules actually
>> being relocatable (.o) ELF files.
>>
>> The major problems:
>>
>> - Relocations were not being applied to the DWARF debug info despite
>>  there being code to do this.  Several issues prevented it from working:
>>
>>  * Relocations are computed at the same time as the symbol table, but
>>    in the case of split debug files, symbol table parsing always
>>    redirects to the primary object file, meaning that relocations
>>    would never be applied in the debug file.
>>
>>  * There's actually no guarantee that the symbol table has been
>>    parsed yet when trying to parse debug information.
>>
>>  * When actually applying relocations, it will segfault because the
>>    object files are not mapped with MAP_PRIVATE and PROT_WRITE.
>>
>> - LLDB returned invalid results when performing ordinary
>>  address-to-symbol resolution. It turned out that the addresses
>>  specified in the section headers were all 0, so LLDB believed all the
>>  sections had overlapping "file addresses" and would sometimes
>>  return a symbol from the wrong section.
>>
>> I rearranged some of the symbol table parsing code to ensure
>> relocations would get applied consistently and added manual calls to
>> make sure it happens before trying to use DWARF info, but it feels
>> kind of hacky.  I'm open to suggestions for refactoring it.
>>
>> I solved the file address problem by computing synthetic addresses for
>> the sections in object files so that they would not overlap in LLDB's
>> lookup maps.
>>
>> With all these changes I'm able to successfully symbolicate backtraces
>> that pass through FreeBSD kernel modules.  Let me know if there is a
>> better/cleaner way to achieve any of these fixes.
>>
>> --
>>
>> Brian Koropoff
>> Dell EMC
>> <0001-ObjectFile-ELF-use-private-memory-mappings.patch>
>> <0002-ObjectFile-ELF-ensure-relocations-are-done-for-split.patch>
>> <0003-SymbolFile-DWARF-force-application-of-relocations.patch>
>> <0004-ObjectFile-ELF-create-synthetic-file-addresses-for-r.patch>
>> _______________________________________________
>> lldb-commits mailing list
>> lldb-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
>