[lldb-dev] Advice on debugging DSP and Harvard architectures

Mon Jun 2 13:48:17 PDT 2014

> On Jun 2, 2014, at 4:06 AM, Matthew Gardiner <mg11 at csr.com> wrote:
> 
> Greg Clayton wrote:
>> Addresses in LLDB are represented by
>> 
>> class lldb_private::Address {
>>     lldb::SectionWP m_section_wp;       ///< The section for the address, can be NULL.
>>     std::atomic<lldb::addr_t> m_offset; ///< Offset into section if \a m_section_wp is valid...
>> }
>> 
>> The section class:
>> 
>> class lldb_private::Section {
>>     ObjectFile      *m_obj_file;        // The object file that data for this section should be read from
>>     lldb::SectionType m_type;           // The type of this section
>>     lldb::SectionWP m_parent_wp;        // Weak pointer to parent section
>>     ConstString     m_name;             // Name of this section
>>     lldb::addr_t    m_file_addr;        // The absolute file virtual address range of this section if m_parent == NULL,
>>                                         // offset from parent file virtual address if m_parent != NULL
>>     lldb::addr_t    m_byte_size;        // Size in bytes that this section will occupy in memory at runtime
>>     lldb::offset_t  m_file_offset;      // Object file offset (if any)
>>     lldb::offset_t  m_file_size;        // Object file size (can be smaller than m_byte_size for zero filled sections...)
>>     SectionList     m_children;         // Child sections
>>     bool            m_fake:1,           // If true, then this section only can contain the address if one of its
>>                                         // children contains an address. This allows for gaps between the children
>>                                         // that are contained in the address range for this section, but do not produce
>>                                         // hits unless the children contain the address.
>>                     m_encrypted:1,      // Set to true if the contents are encrypted
>>                     m_thread_specific:1;// This section is thread specific
>> 
>> };
>> 
>> The section type "m_type" is one of:
>> 
>>     typedef enum SectionType
>>     {
>>         eSectionTypeInvalid,
>>         eSectionTypeCode,
>>         eSectionTypeContainer,              // The section contains child sections
>>         eSectionTypeData,
>>         eSectionTypeDataCString,            // Inlined C string data
>>         eSectionTypeDataCStringPointers,    // Pointers to C string data
>>         eSectionTypeDataSymbolAddress,      // Address of a symbol in the symbol table
>>         eSectionTypeData4,
>>         eSectionTypeData8,
>>         eSectionTypeData16,
>>         eSectionTypeDataPointers,
>>         eSectionTypeDebug,
>>         eSectionTypeZeroFill,
>>         eSectionTypeDataObjCMessageRefs,    // Pointer to function pointer + selector
>>         eSectionTypeDataObjCCFStrings,      // Objective C const CFString/NSString objects
>>         eSectionTypeDWARFDebugAbbrev,
>>         eSectionTypeDWARFDebugAranges,
>>         eSectionTypeDWARFDebugFrame,
>>         eSectionTypeDWARFDebugInfo,
>>         eSectionTypeDWARFDebugLine,
>>         eSectionTypeDWARFDebugLoc,
>>         eSectionTypeDWARFDebugMacInfo,
>>         eSectionTypeDWARFDebugPubNames,
>>         eSectionTypeDWARFDebugPubTypes,
>>         eSectionTypeDWARFDebugRanges,
>>         eSectionTypeDWARFDebugStr,
>>         eSectionTypeDWARFAppleNames,
>>         eSectionTypeDWARFAppleTypes,
>>         eSectionTypeDWARFAppleNamespaces,
>>         eSectionTypeDWARFAppleObjC,
>>         eSectionTypeELFSymbolTable,       // Elf SHT_SYMTAB section
>>         eSectionTypeELFDynamicSymbols,    // Elf SHT_DYNSYM section
>>         eSectionTypeELFRelocationEntries, // Elf SHT_REL or SHT_RELA section
>>         eSectionTypeELFDynamicLinkInfo,   // Elf SHT_DYNAMIC section
>>         eSectionTypeEHFrame,
>>         eSectionTypeOther
>>     } SectionType;
>> 
>> So we see we have eSectionTypeCode and eSectionTypeData.
>> 
>> This could be used to help make the correct reads if addresses fall within known address ranges that fall into sections within a binary.
> 
> Thanks for your help with this Greg. I am currently trying to understand the above structures. Probably take some time before I get it all clear in my head, though.
>> I am guessing that there are code and data reads that are not found within any sections from files right?
> I can't comment on your above question just yet, since I'm concentrating on figuring out how to get a "disassemble" command (from lldb) to read from the correct bus on our devices.
> We are concerned that disassembling always reads from the device (not from ELF), since:
> 
> 1. we prefer to always read from the device for dis since it is easy then to spot if our users have chosen the wrong elf file.
> 2. we may try to debug without symbol files. This is a corner case however.
> 3. we may encounter self-modifying code.
> 
> As a quick check I did try debugging a native 64-bit linux process on linux, and when I invoked a simple disassemble from address command (e.g. di -s 0x4004f0 -c 10), I did observe that the target's memory is read:
> 
> #0 lldb_private::Process::ReadMemoryFromInferior
> #1 lldb_private::MemoryCache::Read
> #2 lldb_private::Process::ReadMemory
> #3 lldb_private::Target::ReadMemory
> ...
> #5 lldb_private::Disassembler::Disassemble

You are correct, we always use the memory from the device because relocations might have been performed on data and code references.

> (I'll try debugging using a remote target, shortly, for comparison...)
> 
> Whilst debugging, I did observe that in the parameter:
> "const Address &start_address" of #5 lldb_private::Disassembler::Disassemble
> that the m_section_wp data is 0x0. In your reply, do you suggest that I arrange that this data is populated with a valid section pointer whose m_type is eSectionTypeCode?

No. Some addresses will resolve to a section of type "eSectionTypeCode" plus an offset, but others might not resolve this way. Variables behave the same way: a global variable will exist in a section whose type is eSectionTypeData and will have an offset, but a lot of data, like anything on the stack or heap, won't resolve to a section + offset.

So it is probably safe to say that your data might be on the stack or heap, and in that case it can't be resolved to a section + offset in a lldb_private::Address. It will have no section, and its offset will be absolute: the address itself.
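
To make the two cases concrete, here is a minimal sketch. The types below are simplified stand-ins invented for illustration (SectionKind, load_base, LoadAddress and Demo are not LLDB names), and the addresses are made up:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Simplified stand-ins for the LLDB classes quoted above; not the real API.
enum class SectionKind { Code, Data };

struct Section {
    SectionKind kind;
    uint64_t load_base; // where the section is loaded in the target (assumed known)
};

struct Address {
    std::weak_ptr<Section> section; // empty for stack/heap addresses
    uint64_t offset;                // section offset, or the absolute address

    bool IsSectionOffset() const { return !section.expired(); }

    uint64_t LoadAddress() const {
        if (auto sp = section.lock())
            return sp->load_base + offset; // section + offset case
        return offset;                     // no section: offset is the address
    }
};

// Usage: one address inside a code section, one raw stack address.
inline void Demo() {
    auto text = std::make_shared<Section>(Section{SectionKind::Code, 0x400000});
    Address in_code{text, 0x4f0};
    Address on_stack{{}, 0x7ffe0000};
    assert(in_code.IsSectionOffset() && in_code.LoadAddress() == 0x4004f0);
    assert(!on_stack.IsSectionOffset() && on_stack.LoadAddress() == 0x7ffe0000);
}
```

Only the first kind of address carries a section type, so only the first kind could tell you anything about which bus to use.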

> 
>> 
>> If so we would need to change all functions that take a load address ("lldb::addr_t load_addr") into something that takes a load address + segment, which should be a new struct type like:
>> 
>> ResolvedAddress {
>>   lldb::addr_t addr;
>>   lldb::segment_t segment;
>> };
>> 
>> Then all things like Read/Write memory in the process would need to be switched over to use the ResolvedAddress.
>> 
>> The lldb_private::Address function that is:
>> 
>>     lldb::addr_t
>>     Address::GetLoadAddress (Target *target) const;
>> 
>> Would now need to be switched over to:
>> 
>>     ResolvedAddress
>>     Address::GetLoadAddress (Target *target) const;
>> 
>> We would need an invalid segment ID (like UINT32_MAX or UINT64_MAX) to indicate that there is no segment.
> 
> I couldn't find segment_t in my checkout. So I assume that you're floating this as an idea for me to try out :-)

Yes. segment_t would be a uint32_t or a uint64_t. A uint64_t is probably best in case a segment identifier on a system is actually a pointer to a segment structure.

> I could certainly give it a try with my working copy... and let you know how I get on.
> 
> Were you suggesting that the value of segment_t for our Harvard case would be hard-coded somewhere in our Target code, and if the m_section_wp of the Address object is a valid code section, then we'd pull out this constant?

If you have an address that was in a code or data section you could just use the lldb_private::Address as is, but when we are asked to resolve it into a ResolvedAddress, you would lookup the lldb::segment_t for code or data and return it:

ResolvedAddress
Address::GetLoadAddress (Target *target) const
{
    ResolvedAddress resolved_addr; // Initialize with invalid value
    SectionSP section_sp (GetSection());
    if (section_sp)
    {
        if (target)
        {
            addr_t sect_load_addr = section_sp->GetLoadBaseAddress (target);

            if (sect_load_addr != LLDB_INVALID_ADDRESS)
            {
                resolved_addr.addr = sect_load_addr + m_offset;
                // new function for section which knows its segment based off of the section type
                resolved_addr.segment = section_sp->GetSegment();
            }
        }
    }
    else if (!SectionWasDeletedPrivate())
    {
        // We don't have a section so the offset is the load address
        resolved_addr.addr = m_offset;
        // Given a raw address, how could we ever determine the right segment????
        resolved_addr.segment = ????;
    }
    return resolved_addr;
}

Notice above the "resolved_addr.segment = section_sp->GetSegment();"

This would be a new function that you would add to lldb_private::Section. As you build your sections you can probably set the segment ID correctly.
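
A rough sketch of what such a function could look like. Everything here is hypothetical: the segment ID values, the kInvalidSegment name, and the cut-down Section type are assumptions for illustration, since none of this exists in LLDB today:

```cpp
#include <cassert>
#include <cstdint>

using segment_t = uint64_t;                       // as suggested above
constexpr segment_t kInvalidSegment = UINT64_MAX; // hypothetical "no segment" value

// Hypothetical segment IDs for a Harvard target with separate buses.
constexpr segment_t kCodeSegment = 0;
constexpr segment_t kDataSegment = 1;

enum class SectionKind { Code, Data, Debug };

// Cut-down stand-in for lldb_private::Section.
struct Section {
    SectionKind kind;

    // Derive the segment from the section type chosen when the section was built.
    segment_t GetSegment() const {
        switch (kind) {
        case SectionKind::Code: return kCodeSegment;
        case SectionKind::Data: return kDataSegment;
        default:                return kInvalidSegment; // e.g. debug info
        }
    }
};

// Usage: code and data sections map to different buses.
inline void Demo() {
    assert(Section{SectionKind::Code}.GetSegment() == kCodeSegment);
    assert(Section{SectionKind::Data}.GetSegment() == kDataSegment);
}
```

An alternative would be to store the segment ID as a member set by the object file plugin, rather than deriving it from the type.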

The big problem is in the "else if()" clause, there is no way to take a raw address and set its segment correctly. And this is the biggest drawback of the current attempted solution. There is not a 1 to 1 mapping from a "load" address to a section + offset address. This poses a huge problem for debuggers.

The only way to solve this would be to replace all "lldb::addr_t" in all of the sources, which is defined currently as:

namespace lldb 
{
    typedef uint64_t addr_t;
}

To be:

namespace lldb 
{
    struct addr_t {
        uint64_t addr;
        uint64_t segment;
    };
}

Now everywhere that used to take or return a lldb::addr_t would take or return this new struct.

>> So all in all this would be quite a big fix that would involve a lot of the code.
> 
> Indeed, but from my perspective probably a good way for me to learn more of the code-base.

Yes, so probably the best way is to replace lldb::addr_t with the struct I showed above and add all sorts of operators (+, -, +=, -=, <, <=, etc) to the struct so it can behave just like an integer when needed.
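
As a starting point, a sketch of such a struct might look like the following. It is named SegAddr here to make clear it is not the real lldb::addr_t, and the choices of an implicit constructor and of ordering by segment first are assumptions, not settled design:

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kInvalidSegment = UINT64_MAX; // hypothetical sentinel

struct SegAddr { // stand-in for the proposed lldb::addr_t
    uint64_t addr = 0;
    uint64_t segment = kInvalidSegment;

    SegAddr() = default;
    // Implicit on purpose, so existing code passing plain integers still compiles.
    SegAddr(uint64_t a, uint64_t s = kInvalidSegment) : addr(a), segment(s) {}

    // Arithmetic changes the address only; the segment is preserved.
    SegAddr &operator+=(uint64_t n) { addr += n; return *this; }
    SegAddr &operator-=(uint64_t n) { addr -= n; return *this; }
};

inline SegAddr operator+(SegAddr a, uint64_t n) { return a += n; }
inline SegAddr operator-(SegAddr a, uint64_t n) { return a -= n; }

inline bool operator==(const SegAddr &a, const SegAddr &b) {
    return a.segment == b.segment && a.addr == b.addr;
}
// Assumed ordering: by segment first, then by address within a segment.
inline bool operator<(const SegAddr &a, const SegAddr &b) {
    return a.segment != b.segment ? a.segment < b.segment : a.addr < b.addr;
}
inline bool operator<=(const SegAddr &a, const SegAddr &b) { return !(b < a); }

// Usage: stepping a code address keeps it in the same segment.
inline void Demo() {
    SegAddr pc(0x1000, 0);
    SegAddr next = pc + 4;
    assert(next.addr == 0x1004 && next.segment == 0);
    assert(pc < next);
}
```

Whether addresses in different segments should compare at all is an open question; an assert on mismatched segments might be a better choice than a total order.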

> 
>> This would be a big change. Not sure how other debuggers handle 24 bit bytes. It might be better to leave this as is and if you read 3 (24 bit) bytes of memory from your DSP, you get 9 (8 bit) bytes back.
>> 
>> If a variable for your DSP is referenced in DWARF, what does the byte size show? The actual size in 8 bit bytes, or the size in 24 bit bytes?
> 
> I'm not sure on this one, Greg. I'm leaving it for one of my colleagues to research this further, then get back.
>> It would be great to enable support for these kinds of architectures in LLDB, and it will take some work, but we should be able to make it happen.
> Indeed. I'll keep you posted with my progress on the above.

Yes, I would start with redefining lldb::addr_t to the struct and get that compiling and passing the test suite. At that stage all instances of lldb::addr_t would contain an invalid segment ID and thus all memory read/write calls would do what they do now.
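
To illustrate why the first step is behaviour-preserving, here is a toy model. FakeTarget, AddSpace and ReadByte are invented for this sketch and have nothing to do with the real Process/Target classes; the point is only that an invalid segment ID falls through to whatever the default address space is today:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

constexpr uint64_t kInvalidSegment = UINT64_MAX; // hypothetical sentinel

struct SegAddr { uint64_t addr; uint64_t segment; };

// Toy target with one byte array per address space (segment).
class FakeTarget {
public:
    void AddSpace(uint64_t segment, std::vector<uint8_t> bytes) {
        spaces_[segment] = std::move(bytes);
    }

    // Reads one byte; returns false for an unknown segment or out-of-range address.
    bool ReadByte(SegAddr a, uint8_t &out) const {
        // An invalid segment means "do what we do now": use the default space.
        uint64_t seg = (a.segment == kInvalidSegment) ? default_segment_ : a.segment;
        auto it = spaces_.find(seg);
        if (it == spaces_.end() || a.addr >= it->second.size())
            return false;
        out = it->second[a.addr];
        return true;
    }

private:
    uint64_t default_segment_ = 0; // assumed default space
    std::map<uint64_t, std::vector<uint8_t>> spaces_;
};

// Usage: the same numeric address reads different bytes on different buses.
inline void Demo() {
    FakeTarget t;
    t.AddSpace(0, {0xAA, 0xBB}); // "code" space
    t.AddSpace(1, {0x11, 0x22}); // "data" space
    uint8_t b = 0;
    SegAddr legacy{1, kInvalidSegment};
    SegAddr data{1, 1};
    assert(t.ReadByte(legacy, b) && b == 0xBB);
    assert(t.ReadByte(data, b) && b == 0x22);
}
```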

Then you can start getting your DSP debugger to resolve addresses correctly with the right segment ID. I believe the GDB remote protocol has memory read/write packets that can take a segment ID.

Greg