[lldb-dev] Advice on debugging DSP and Harvard architectures
Greg Clayton
gclayton at apple.com
Mon Jun 2 13:48:17 PDT 2014
> On Jun 2, 2014, at 4:06 AM, Matthew Gardiner <mg11 at csr.com> wrote:
>
> Greg Clayton wrote:
>> Addresses in LLDB are represented by
>>
>> class lldb_private::Address {
>> lldb::SectionWP m_section_wp; ///< The section for the address, can be NULL.
>> std::atomic<lldb::addr_t> m_offset; ///< Offset into section if \a m_section_wp is valid...
>> }
>>
>> The section class:
>>
>> class lldb_private::Section {
>> ObjectFile *m_obj_file; // The object file that data for this section should be read from
>> lldb::SectionType m_type; // The type of this section
>> lldb::SectionWP m_parent_wp; // Weak pointer to parent section
>> ConstString m_name; // Name of this section
>> lldb::addr_t m_file_addr; // The absolute file virtual address range of this section if m_parent == NULL,
>> // offset from parent file virtual address if m_parent != NULL
>> lldb::addr_t m_byte_size; // Size in bytes that this section will occupy in memory at runtime
>> lldb::offset_t m_file_offset; // Object file offset (if any)
>> lldb::offset_t m_file_size; // Object file size (can be smaller than m_byte_size for zero filled sections...)
>> SectionList m_children; // Child sections
>> bool m_fake:1, // If true, then this section only can contain the address if one of its
>> // children contains an address. This allows for gaps between the children
>> // that are contained in the address range for this section, but do not produce
>> // hits unless the children contain the address.
>> m_encrypted:1, // Set to true if the contents are encrypted
>> m_thread_specific:1;// This section is thread specific
>>
>> };
>>
>> The section type "m_type" is one of:
>>
>> typedef enum SectionType
>> {
>> eSectionTypeInvalid,
>> eSectionTypeCode,
>> eSectionTypeContainer, // The section contains child sections
>> eSectionTypeData,
>> eSectionTypeDataCString, // Inlined C string data
>> eSectionTypeDataCStringPointers, // Pointers to C string data
>> eSectionTypeDataSymbolAddress, // Address of a symbol in the symbol table
>> eSectionTypeData4,
>> eSectionTypeData8,
>> eSectionTypeData16,
>> eSectionTypeDataPointers,
>> eSectionTypeDebug,
>> eSectionTypeZeroFill,
>> eSectionTypeDataObjCMessageRefs, // Pointer to function pointer + selector
>> eSectionTypeDataObjCCFStrings, // Objective C const CFString/NSString objects
>> eSectionTypeDWARFDebugAbbrev,
>> eSectionTypeDWARFDebugAranges,
>> eSectionTypeDWARFDebugFrame,
>> eSectionTypeDWARFDebugInfo,
>> eSectionTypeDWARFDebugLine,
>> eSectionTypeDWARFDebugLoc,
>> eSectionTypeDWARFDebugMacInfo,
>> eSectionTypeDWARFDebugPubNames,
>> eSectionTypeDWARFDebugPubTypes,
>> eSectionTypeDWARFDebugRanges,
>> eSectionTypeDWARFDebugStr,
>> eSectionTypeDWARFAppleNames,
>> eSectionTypeDWARFAppleTypes,
>> eSectionTypeDWARFAppleNamespaces,
>> eSectionTypeDWARFAppleObjC,
>> eSectionTypeELFSymbolTable, // Elf SHT_SYMTAB section
>> eSectionTypeELFDynamicSymbols, // Elf SHT_DYNSYM section
>> eSectionTypeELFRelocationEntries, // Elf SHT_REL or SHT_RELA section
>> eSectionTypeELFDynamicLinkInfo, // Elf SHT_DYNAMIC section
>> eSectionTypeEHFrame,
>> eSectionTypeOther
>> } SectionType;
>>
>> So we see we have eSectionTypeCode and eSectionTypeData.
>>
>> This could be used to help make the correct reads if addresses fall within known address ranges that fall into sections within a binary.
>
> Thanks for your help with this Greg. I am currently trying to understand the above structures. Probably take some time before I get it all clear in my head, though.
>> I am guessing that there are code and data reads that are not found within any sections from files right?
> I can't comment on your above question just yet, since I'm concentrating on figuring out how to get a "disassemble" command (from lldb) to read from the correct bus on our devices.
> We are concerned that disassembling always reads from the device (not from ELF), since:
>
> 1. we prefer to always read from the device for disassembly, since it is then easy to spot if our users have chosen the wrong ELF file.
> 2. we may try to debug without symbol files. This is a corner case however.
> 3. we may encounter self-modifying code.
>
> As a quick check I did try debugging a native 64-bit linux process on linux, and when I invoked a simple disassemble from address command (e.g. di -s 0x4004f0 -c 10), I did observe that the target's memory is read:
>
> #0 lldb_private::Process::ReadMemoryFromInferior
> #1 lldb_private::MemoryCache::Read
> #2 lldb_private::Process::ReadMemory
> #3 lldb_private::Target::ReadMemory
> ...
> #5 lldb_private::Disassembler::Disassemble
You are correct, we always use the memory from the device because relocations might have been performed on data and code references.
> (I'll try debugging using a remote target, shortly, for comparison...)
>
> Whilst debugging, I did observe that in the parameter:
> "const Address &start_address" of #5 lldb_private::Disassembler::Disassemble
> that the m_section_wp data is 0x0. In your reply, do you suggest that I arrange that this data is populated with a valid section pointer whose m_type is eSectionTypeCode?
No, some addresses will resolve to a section that is "eSectionTypeCode" + offset, but others might not resolve this way. It is similar for variables: a global variable will exist in a section whose type is eSectionTypeData and it will have an offset, but a lot of data, like anything on the stack or heap, won't resolve to a section + offset.
So it is probably safe to say that your data might be on the stack or heap, and in that case you can't resolve it to a section + offset in a lldb_private::Address. In this case the address will have no section and its offset will be absolute, i.e. the address itself.
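To make the two cases concrete, here is a minimal sketch of a section-relative address versus an absolute one. The types SketchSection and SketchAddress are hypothetical, simplified stand-ins and not the real lldb_private classes; in particular the sketch ignores the "section was deleted" case that the real Address has to handle.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical, simplified stand-ins; these are not the real LLDB classes.
enum class SketchSectionType { Code, Data };

struct SketchSection {
    SketchSectionType type;
    uint64_t load_base; // where the section was loaded in the target
    uint64_t byte_size; // size of the section in bytes
};

struct SketchAddress {
    std::weak_ptr<SketchSection> section_wp; // empty for stack/heap addresses
    uint64_t offset;                         // section offset, or the absolute address

    bool IsSectionOffset() const { return !section_wp.expired(); }

    uint64_t GetLoadAddress() const {
        if (auto section_sp = section_wp.lock()) {
            // Section-relative: the offset must lie inside the section.
            assert(offset < section_sp->byte_size);
            return section_sp->load_base + offset;
        }
        // No section: the offset already is the load address.
        return offset;
    }
};
```

An address inside a loaded code section resolves to load base + offset, while a stack address carries no section and is returned unchanged.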
>
>>
>> If so we would need to change all functions that take a load address ("lldb::addr_t load_addr") into something that takes a load address + segment, which should be a new struct type like:
>>
>> ResolvedAddress {
>> lldb::addr_t addr;
>> lldb::segment_t segment;
>> };
>>
>> Then all things like Read/Write memory in the process would need to be switched over to use the ResolvedAddress.
>>
>> The lldb_private::Address function that is:
>>
>> lldb::addr_t
>> Address::GetLoadAddress (Target *target) const;
>>
>> Would now need to be switched over to:
>>
>> ResolvedAddress
>> Address::GetLoadAddress (Target *target) const;
>>
>> We would need an invalid segment ID (like UINT32_MAX or UINT64_MAX) to indicate that there is no segment.
>
> I couldn't find segment_t in my checkout. So I assume that you're floating this as an idea for me to try out :-)
Yes. segment_t would be a uint32_t or a uint64_t. A uint64_t is probably best in case a segment identifier on a system is actually a pointer to a segment structure.
> I could certainly give it a try with my working copy... and let you know how I get on.
>
> Were you suggesting that the value of segment_t for our Harvard case would be hard-coded somewhere in our Target code, and if the m_section_wp of the Address object is a valid code section, then we'd pull out this constant?
If you have an address that was in a code or data section you could just use the lldb_private::Address as is, but when we are asked to resolve it into a ResolvedAddress, you would look up the lldb::segment_t for code or data and return it:
ResolvedAddress
Address::GetLoadAddress (Target *target) const
{
    ResolvedAddress resolved_addr; // Initialize with invalid value
    SectionSP section_sp (GetSection());
    if (section_sp)
    {
        if (target)
        {
            addr_t sect_load_addr = section_sp->GetLoadBaseAddress (target);
            if (sect_load_addr != LLDB_INVALID_ADDRESS)
            {
                resolved_addr.addr = sect_load_addr + m_offset;
                // New function for Section, which knows its segment based on the section type
                resolved_addr.segment = section_sp->GetSegment();
            }
        }
    }
    else if (!SectionWasDeletedPrivate())
    {
        // We don't have a section so the offset is the load address
        resolved_addr.addr = m_offset;
        // Given a raw address how could we ever determine the right segment????
        resolved_addr.segment = ????;
    }
    return resolved_addr;
}
Notice above the "resolved_addr.segment = section_sp->GetSegment();"
This would be a new function that you would add to lldb_private::Section. As you build your sections you can probably set the segment ID correctly.
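As a rough sketch of what such a GetSegment() could look like, assuming a target with one code bus and one data bus: everything in the "sketch" namespace below is hypothetical (segment_t, the segment ID values, the reduced SectionType enum and the Resolve helper do not exist in LLDB), and mapping zero-fill sections to the data segment is my assumption.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

using segment_t = uint64_t;                       // hypothetical; not in LLDB today
constexpr segment_t kInvalidSegment = UINT64_MAX; // "no segment" sentinel

// Hypothetical segment IDs for a Harvard target with separate buses.
constexpr segment_t kCodeSegment = 0;
constexpr segment_t kDataSegment = 1;

// Reduced stand-in for lldb::SectionType.
enum class SectionType { Invalid, Code, Data, ZeroFill, Debug };

struct Section {
    SectionType type;

    // Stand-in for the proposed Section::GetSegment().
    segment_t GetSegment() const {
        switch (type) {
        case SectionType::Code:     return kCodeSegment;
        case SectionType::Data:
        case SectionType::ZeroFill: return kDataSegment; // assumption: bss lives on the data bus
        default:                    return kInvalidSegment;
        }
    }
};

struct ResolvedAddress {
    uint64_t addr;
    segment_t segment;
};

// Combine a section's load base and an offset with the section's segment.
inline ResolvedAddress Resolve(const Section &section, uint64_t load_base, uint64_t offset) {
    assert(section.type != SectionType::Invalid);
    return {load_base + offset, section.GetSegment()};
}

} // namespace sketch
```

Sections that are never loaded on either bus (debug info, for instance) fall through to the invalid segment in this sketch.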
The big problem is in the "else if()" clause, there is no way to take a raw address and set its segment correctly. And this is the biggest drawback of the current attempted solution. There is not a 1 to 1 mapping from a "load" address to a section + offset address. This poses a huge problem for debuggers.
The only way to solve this would be to replace all "lldb::addr_t" in all of the sources, which is defined currently as:
namespace lldb
{
    typedef uint64_t addr_t;
}
To be:
namespace lldb
{
    typedef struct addr_t {
        uint64_t addr;
        uint64_t segment;
    } addr_t;
}
Now everywhere that used to take or return a lldb::addr_t would return this new struct.
>> So all in all this would be quite a big fix that would involve a lot of the code.
>
> Indeed, but from my perspective probably a good way for me to learn more of the code-base.
Yes, so probably the best way is to replace lldb::addr_t with the struct I showed above and add all sorts of operators (+, -, +=, -=, <, <=, etc) to the struct so it can behave just like an integer when needed.
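A minimal sketch of such a struct with a few of those operators might look like the following. It lives in a hypothetical "sketch" namespace rather than in lldb, and several choices are my assumptions: the implicit constructor from uint64_t (so existing integer code keeps compiling), ordering by segment first, and asserting that a distance is only taken within one segment.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

constexpr uint64_t kInvalidSegment = UINT64_MAX;

struct addr_t {
    uint64_t addr = 0;
    uint64_t segment = kInvalidSegment;

    addr_t() = default;
    // Implicit on purpose, so plain integers still convert to addresses.
    addr_t(uint64_t a, uint64_t seg = kInvalidSegment) : addr(a), segment(seg) {}

    addr_t &operator+=(uint64_t delta) { addr += delta; return *this; }
    addr_t &operator-=(uint64_t delta) { addr -= delta; return *this; }
};

// Pointer-style arithmetic keeps the segment and moves the address.
inline addr_t operator+(addr_t lhs, uint64_t delta) { return lhs += delta; }
inline addr_t operator-(addr_t lhs, uint64_t delta) { return lhs -= delta; }

// Distance between two addresses; only meaningful within one segment.
inline uint64_t operator-(const addr_t &lhs, const addr_t &rhs) {
    assert(lhs.segment == rhs.segment);
    return lhs.addr - rhs.addr;
}

inline bool operator==(const addr_t &l, const addr_t &r) {
    return l.segment == r.segment && l.addr == r.addr;
}
inline bool operator!=(const addr_t &l, const addr_t &r) { return !(l == r); }

// Assumption: order by segment first, then by address within the segment.
inline bool operator<(const addr_t &l, const addr_t &r) {
    return l.segment != r.segment ? l.segment < r.segment : l.addr < r.addr;
}
inline bool operator<=(const addr_t &l, const addr_t &r) { return !(r < l); }

} // namespace sketch
```

With this, something like "pc + 4" keeps the segment of pc, and an address built from a bare integer carries the invalid segment.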
>
>> This would be a big change. Not sure how other debuggers handle 24 bit bytes. It might be better to leave this as is and if you read 3 bytes of memory from your DSP, you get 9 bytes back.
>>
>> If a variable for your DSP is referenced in DWARF, what does the byte size show? The actual size in 8 bit bytes, or the size in 24 bit bytes?
>
> I'm not sure on this one, Greg. I'm leaving it for one of my colleagues to research this further, then get back.
>> It would be great to enable support for these kinds of architectures in LLDB, and it will take some work, but we should be able to make it happen.
> Indeed. I'll keep you posted with my progress on the above.
Yes, I would start with redefining lldb::addr_t to the struct and get that compiling and passing the test suite. All instances of lldb::addr_t would always contain an invalid segment ID and thus all memory read/write calls would do what they do now.
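To illustrate why an invalid segment ID keeps existing behavior intact, here is a toy model of a memory read that picks a bus from the segment. HarvardMemory, the segment IDs and the fallback to the data bus for an invalid segment are all assumptions made for this sketch; none of it is existing LLDB code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

namespace sketch {

constexpr uint64_t kInvalidSegment = UINT64_MAX; // "no segment": behave as today
constexpr uint64_t kCodeSegment = 0;             // hypothetical ID for the program bus
constexpr uint64_t kDataSegment = 1;             // hypothetical ID for the data bus

struct addr_t {
    uint64_t addr;
    uint64_t segment;
};

// Toy Harvard target: two independent memories that share the same address range.
class HarvardMemory {
public:
    HarvardMemory(std::vector<uint8_t> code, std::vector<uint8_t> data)
        : m_code(std::move(code)), m_data(std::move(data)) {}

    // Copies up to len bytes into dst and returns the number of bytes copied.
    std::size_t Read(const addr_t &where, void *dst, std::size_t len) const {
        assert(dst != nullptr);
        // Assumption: anything that is not the code segment, including the
        // invalid segment, goes to the data bus, standing in for "do what
        // the read does now" on a flat address space.
        const std::vector<uint8_t> &bus =
            (where.segment == kCodeSegment) ? m_code : m_data;
        if (where.addr >= bus.size())
            return 0;
        const std::size_t n = std::min<std::size_t>(len, bus.size() - where.addr);
        std::memcpy(dst, bus.data() + where.addr, n);
        return n;
    }

private:
    std::vector<uint8_t> m_code;
    std::vector<uint8_t> m_data;
};

} // namespace sketch
```

The same numeric address returns different bytes depending on the segment, while callers that never set a segment keep reading from a single default memory.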
Then you can start getting your DSP debugger to resolve addresses correctly with the right segment ID. I believe the GDB remote protocol has memory read/write packets that can take a segment ID.
Greg