[lldb-dev] Advice on debugging DSP and Harvard architectures

Fri May 30 16:03:34 PDT 2014

> On May 29, 2014, at 11:45 PM, Matthew Gardiner <mg11 at csr.com> wrote:
> 
> Hi folks,
> 
> I have been researching using lldb and a custom gdbserver stub to debug some of our processors. I have already played with ArchSpec.h/.cpp/Elf.h to add a processor definition, and since our tools output DWARF information, have already managed to use lldb to dump line tables.
> 
> On the gdbserver side of things I'm managing to get register read and writes to work, but I fear an issue may arise in memory reads, since the DSPs Harvard architecture dictates separate address spaces. Therefore, when we attempt to read 1) code memory to disassemble, and 2) data memory (for variables decode etc.) the stub will receive an 'm' request but interpretation of the address field is ambiguous, as it could refer to either the CODE or DATA bus.

Addresses in LLDB are represented by:

class lldb_private::Address {
    lldb::SectionWP m_section_wp;       ///< The section for the address, can be NULL.
    std::atomic<lldb::addr_t> m_offset; ///< Offset into section if \a m_section_wp is valid...
};

The section class:

class lldb_private::Section {
    ObjectFile      *m_obj_file;        // The object file that data for this section should be read from
    lldb::SectionType m_type;           // The type of this section
    lldb::SectionWP m_parent_wp;        // Weak pointer to parent section
    ConstString     m_name;             // Name of this section
    lldb::addr_t    m_file_addr;        // The absolute file virtual address range of this section if m_parent == NULL,
                                        // offset from parent file virtual address if m_parent != NULL
    lldb::addr_t    m_byte_size;        // Size in bytes that this section will occupy in memory at runtime
    lldb::offset_t  m_file_offset;      // Object file offset (if any)
    lldb::offset_t  m_file_size;        // Object file size (can be smaller than m_byte_size for zero filled sections...)
    SectionList     m_children;         // Child sections
    bool            m_fake:1,           // If true, then this section only can contain the address if one of its
                                        // children contains an address. This allows for gaps between the children
                                        // that are contained in the address range for this section, but do not produce
                                        // hits unless the children contain the address.
                    m_encrypted:1,      // Set to true if the contents are encrypted
                    m_thread_specific:1;// This section is thread specific

};

The section type "m_type" is one of:

    typedef enum SectionType
    {
        eSectionTypeInvalid,
        eSectionTypeCode,
        eSectionTypeContainer,              // The section contains child sections
        eSectionTypeData,
        eSectionTypeDataCString,            // Inlined C string data
        eSectionTypeDataCStringPointers,    // Pointers to C string data
        eSectionTypeDataSymbolAddress,      // Address of a symbol in the symbol table
        eSectionTypeData4,
        eSectionTypeData8,
        eSectionTypeData16,
        eSectionTypeDataPointers,
        eSectionTypeDebug,
        eSectionTypeZeroFill,
        eSectionTypeDataObjCMessageRefs,    // Pointer to function pointer + selector
        eSectionTypeDataObjCCFStrings,      // Objective C const CFString/NSString objects
        eSectionTypeDWARFDebugAbbrev,
        eSectionTypeDWARFDebugAranges,
        eSectionTypeDWARFDebugFrame,
        eSectionTypeDWARFDebugInfo,
        eSectionTypeDWARFDebugLine,
        eSectionTypeDWARFDebugLoc,
        eSectionTypeDWARFDebugMacInfo,
        eSectionTypeDWARFDebugPubNames,
        eSectionTypeDWARFDebugPubTypes,
        eSectionTypeDWARFDebugRanges,
        eSectionTypeDWARFDebugStr,
        eSectionTypeDWARFAppleNames,
        eSectionTypeDWARFAppleTypes,
        eSectionTypeDWARFAppleNamespaces,
        eSectionTypeDWARFAppleObjC,
        eSectionTypeELFSymbolTable,       // Elf SHT_SYMTAB section
        eSectionTypeELFDynamicSymbols,    // Elf SHT_DYNSYM section
        eSectionTypeELFRelocationEntries, // Elf SHT_REL or SHT_RELA section
        eSectionTypeELFDynamicLinkInfo,   // Elf SHT_DYNAMIC section
        eSectionTypeEHFrame,
        eSectionTypeOther

    } SectionType;

So we see we have eSectionTypeCode and eSectionTypeData.

This could be used to route reads correctly whenever an address falls within a known section of a binary. I am guessing, though, that there are code and data reads at addresses that are not found within any section from a file, right?
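
As a rough sketch of that idea: the section type of a resolved address could be mapped to the bus a read should go to. The `MemoryBus` enum and `BusForSectionType` function below are hypothetical names made up for illustration, not part of LLDB, and the enum is a trimmed copy of the one above.

```cpp
#include <cassert>

// Trimmed copy of the SectionType enum; only the values this sketch needs.
enum SectionType {
    eSectionTypeInvalid,
    eSectionTypeCode,
    eSectionTypeData,
    eSectionTypeZeroFill,
    eSectionTypeOther
};

// Hypothetical: which physical bus a memory read should target.
enum class MemoryBus { Unknown, Code, Data };

// Map a section type to a bus. Anything that is not clearly code or
// data stays Unknown, so the caller has to decide explicitly.
inline MemoryBus BusForSectionType(SectionType type) {
    switch (type) {
    case eSectionTypeCode:
        return MemoryBus::Code;
    case eSectionTypeData:
    case eSectionTypeZeroFill:
        return MemoryBus::Data;
    default:
        return MemoryBus::Unknown;
    }
}
```

The `Unknown` result is the case raised above: an address that does not fall into any section gives no hint about which bus to use.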

If so, we would need to change all functions that take a load address ("lldb::addr_t load_addr") into something that takes a load address plus a segment, which should be a new struct type like:

struct ResolvedAddress {
  lldb::addr_t addr;
  lldb::segment_t segment;
};

Then everything that reads or writes process memory would need to be switched over to use ResolvedAddress.

The lldb_private::Address function that is currently:

    lldb::addr_t
    Address::GetLoadAddress (Target *target) const;

would need to be switched over to:

    ResolvedAddress
    Address::GetLoadAddress (Target *target) const;

We would need an invalid segment ID (like UINT32_MAX or UINT64_MAX) to indicate that there is no segment.
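
A minimal sketch of how a segment-qualified read might look, under stated assumptions: `segment_t`, `kInvalidSegment`, the code/data segment numbering, and the `FakeProcess` class are all hypothetical and exist only for this example. `addr_t` stands in for `lldb::addr_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

typedef uint64_t addr_t;     // stands in for lldb::addr_t
typedef uint32_t segment_t;  // hypothetical; not an existing LLDB type

// Hypothetical sentinel meaning "no segment", as suggested above.
const segment_t kInvalidSegment = UINT32_MAX;
const segment_t kCodeSegment = 0;  // assumed numbering for this example
const segment_t kDataSegment = 1;

struct ResolvedAddress {
    addr_t addr;
    segment_t segment;
    bool HasSegment() const { return segment != kInvalidSegment; }
};

// Toy stand-in for a process with separate code and data buses.
class FakeProcess {
public:
    void Poke(ResolvedAddress ra, uint8_t value) {
        m_memory[std::make_pair(ra.segment, ra.addr)] = value;
    }

    // Returns false when nothing was ever written at this location.
    bool ReadByte(ResolvedAddress ra, uint8_t &value) const {
        auto pos = m_memory.find(std::make_pair(ra.segment, ra.addr));
        if (pos == m_memory.end())
            return false;
        value = pos->second;
        return true;
    }

private:
    // Keyed on (segment, address) so equal addresses on different
    // buses never alias each other.
    std::map<std::pair<segment_t, addr_t>, uint8_t> m_memory;
};
```

With this shape, address 0x100 in the code segment and address 0x100 in the data segment are distinct locations, which is the property a Harvard target needs.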

So all in all, this would be quite a big change that would touch a lot of the code.

> It seems that the commonly adopted approach so far (i.e. with gdb) is to produce a larger single address space by adding an offset to the memory address and arranging for the stub to interpret the presence/absence of the offset and act accordingly. (I have indeed read of this approach being employed for AVR processors). This technique is workable (provided the DSPs continue to have fairly small physical memories), but certainly has drawbacks i) changes to the debugger to add the offset, ii) increased packet size (e.g. all code read addresses having highest bit set), iii) increased compression/decompression due to RLE on the response.
> 
> So is the "add an offset" technique still the best way forward to solve this problem? How about adding a new request (along with a query request to interrogate the stub for support) for code reads - is this an option? (If so, I'd be happy to do the work...)
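
For reference, the "add an offset" scheme described above can be sketched as follows. This assumes 32-bit addresses on the wire with the highest bit marking a code-space access; the flag value and all names here are illustrative assumptions, not taken from gdb, LLDB, or any existing stub.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 32-bit wire addresses, top bit set means "code space".
const uint32_t kCodeSpaceFlag = 0x80000000u;

enum class MemoryBus { Code, Data };

struct BusAddress {
    MemoryBus bus;
    uint32_t addr;  // address on that bus
};

// Debugger side: fold (bus, address) into one flat address.
inline uint32_t EncodeFlatAddress(BusAddress a) {
    assert((a.addr & kCodeSpaceFlag) == 0 && "address collides with flag bit");
    return a.bus == MemoryBus::Code ? (a.addr | kCodeSpaceFlag) : a.addr;
}

// Stub side: recover the bus and the real address from the address
// field of an 'm' packet.
inline BusAddress DecodeFlatAddress(uint32_t flat) {
    BusAddress a;
    a.bus = (flat & kCodeSpaceFlag) ? MemoryBus::Code : MemoryBus::Data;
    a.addr = flat & ~kCodeSpaceFlag;
    return a;
}
```

Under these assumptions a code read at 0x1234 goes over the wire as 0x80001234, which illustrates the larger address field mentioned in drawback ii).
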
> 
> Another issue which I'm looking at, is that some of our DSPs have 24-bit bytes. (That is, a single data address reads back 24-bits of data). At this moment in time, I'm not altogether sure just how problematic this will be for lldb. I've looked into the g_core_definitions table, and I can't see an entry for this, (presumably it would either be a 1 or an 8, depending whether it's measured in host bytes, or bits). I assume that all the architectures in the table so far have 8-bit bytes. Is anyone else out there looking at using lldb to debug targets with non-8-bit bytes?

This would be a big change. I am not sure how other debuggers handle 24-bit bytes. It might be better to leave this as is, so that if you read 3 bytes of memory from your DSP, you get 9 8-bit bytes back.
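
To make the arithmetic concrete, here is a small sketch of that "leave it as is" approach. It assumes a 24-bit addressable unit and that the stub sends each unit most significant octet first; both the byte order and the helper names are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: one addressable unit on the DSP is 24 bits wide.
const std::size_t kBitsPerTargetByte = 24;
const std::size_t kHostBytesPerTargetByte = kBitsPerTargetByte / 8;  // 3

// Number of 8-bit host bytes needed to hold `count` target bytes.
inline std::size_t HostBytesForTargetBytes(std::size_t count) {
    return count * kHostBytesPerTargetByte;
}

// Unpack a raw reply buffer into one uint32_t per 24-bit target byte.
// Assumption: each unit arrives most significant octet first.
inline std::vector<uint32_t> UnpackTargetBytes(const std::vector<uint8_t> &raw) {
    assert(raw.size() % kHostBytesPerTargetByte == 0);
    std::vector<uint32_t> units;
    for (std::size_t i = 0; i < raw.size(); i += kHostBytesPerTargetByte) {
        uint32_t unit = (uint32_t(raw[i]) << 16) |
                        (uint32_t(raw[i + 1]) << 8) |
                        uint32_t(raw[i + 2]);
        units.push_back(unit);
    }
    return units;
}
```

Here `HostBytesForTargetBytes(3)` is 9, matching the 3-in, 9-out case above.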

If a variable for your DSP is referenced in DWARF, what does the byte size show? The actual size in 8-bit bytes, or the size in 24-bit bytes?

> So, summarising, I'm wondering if anyone has any ideas/advice on the above questions, that is, using lldb on harvard architectures and on non-standard-byte-size architectures.

It would be great to enable support for these kinds of architectures in LLDB, and it will take some work, but we should be able to make it happen.

> All comments welcome,
> Matthew Gardiner