[lldb-dev] Proposal: How to modify lldb for non 8-bit bytes for kalimba processors

Thu Aug 28 00:19:49 PDT 2014

Hi folks,

One of the challenges I need to resolve in debugging kalimba
processors is that certain variants have a different notion of the size
(in bits) of a byte from most mainstream processors. By "byte" I mean
the minimum addressable unit when the processor accesses memory. For
example, on kalimba architecture version 3, a "byte" on the data bus is
24 bits, so a read from address 8001 returns 24 bits, a read from
address 8002 returns the next 24 bits, and so on (this also means that
for this variant char, int, long and pointer are all 24 bits in size).
On kalimba architecture version 4, however, the minimum addressable
unit is 8 bits, with correspondingly more "conventional" sizes for
primitive types.

I imagine that this will affect the kalimba lldb port in various ways.
The most obvious one, and hence the one I'd like to solve first, is the
way in which raw memory reads and writes are implemented. As an example,
when I ask lldb to read 4 "bytes" (addressable units' worth of data)
from a kalimba with 8-bit bytes I expect to see this:

(lldb) memory read --count 4 0x0328
0x00000328: 00 07 08 08                                      ....
(lldb)

However, if the target processor has 24-bit bytes then I expect the same
query to yield the following answer:

(lldb) memory read --count 4 0x0328
0x00000328: 000708 080012 095630 023480                      ....
(lldb)
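
To make the two expected outputs above concrete, here is a minimal
sketch of the hex grouping only. FormatUnits is a hypothetical helper,
not an existing lldb function, and it assumes the raw host bytes arrive
most-significant byte first within each target "byte":

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not an lldb API): render raw host bytes as hex,
// treating every unit_size host bytes as one target "byte".
// Assumption: bytes within a unit are already in display order.
std::string FormatUnits(const std::vector<uint8_t> &host_bytes,
                        std::size_t unit_size) {
    assert(unit_size > 0 && host_bytes.size() % unit_size == 0);
    std::string out;
    char hex[3];
    for (std::size_t i = 0; i < host_bytes.size(); ++i) {
        // Insert a separator at each target-unit boundary.
        if (i != 0 && i % unit_size == 0)
            out += ' ';
        std::snprintf(hex, sizeof(hex), "%02x", host_bytes[i]);
        out += hex;
    }
    return out;
}
```

With unit_size 1 the twelve bytes would print as twelve two-digit
groups; with unit_size 3 they print as the four six-digit groups shown
in the 24-bit example.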

Just considering the above scenario leads me to believe that my first
challenge is arranging for the remote protocol implementation (currently
Process/gdb-remote et al) to assume N host bytes (N being a
target-specific value) for each target byte accessed, and for the memory
read and formatting code (above) to behave correctly given the
discrepancy between host and target byte sizes. I expect to hit many
other challenges, for example frame variable decode, stack unwind etc.
(but since *those* challenges require work on the clang/llvm backend,
and CSR have no llvm person yet, I want to concentrate on raw memory
access first...)
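
The "N host bytes per target byte" arithmetic can be sketched as below.
Both function names are hypothetical. Whether the length field of a
memory-read request should count target units or host bytes is a
separate decision between lldb and the stub, which this sketch does not
settle. The only protocol fact relied on is that a gdb-remote
memory-read reply encodes each host byte as two hex digits:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of host (8-bit) bytes that must be
// transferred to cover unit_count target addressable units.
uint64_t HostBytesForUnits(uint64_t unit_count, uint32_t unit_size) {
    assert(unit_size > 0);
    return unit_count * unit_size;
}

// Hypothetical helper: hex digits expected in the reply payload,
// at two hex digits per host byte.
uint64_t ExpectedHexDigits(uint64_t unit_count, uint32_t unit_size) {
    return 2 * HostBytesForUnits(unit_count, unit_size);
}
```

So a read of 4 units needs 4 host bytes on an 8-bit target but 12 host
bytes (24 hex digits on the wire) on a 24-bit target.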

As an added complication (since kalimba is a Harvard architecture),
certain kalimba variants have differing addressable unit sizes for
memory on the code bus and data bus. Kalimba architecture 5 has 8-bit
addressable code and 24-bit addressable data...

My initial idea for how to start to address the above challenge is to 
augment the CoreDefinition structure in ArchSpec.cpp as follows:

     struct CoreDefinition
     {
         ByteOrder default_byte_order;
         uint32_t addr_byte_size;
         uint32_t min_opcode_byte_size;
         uint32_t max_opcode_byte_size;
+       uint32_t code_byte_size;
+       uint32_t data_byte_size;
         llvm::Triple::ArchType machine;
         ArchSpec::Core core;
         const char * const name;
     };

Where code_byte_size and data_byte_size would specify, in host (8-bit)
bytes, the sizes of the minimum addressable units on the code and data
buses of the referenced architecture. So, e.g.
For kalimba 3, with 24-bit data bytes and 32-bit code bytes, we'd have
data_byte_size=3 and code_byte_size=4
For kalimba 4, with 8-bit data bytes and 8-bit code bytes, we'd have
data_byte_size=1 and code_byte_size=1

So, then I'd update the g_core_definitions array within ArchSpec.cpp
accordingly, such that all non-kalimba entries would have 1 for both new
fields and the kalimba entries would have them set to match the
architectures.

The ArchSpec class would then require the following accessors,
uint32_t GetCodeByteSize() and uint32_t GetDataByteSize(), to supply
client code with the hints required to implement memory accesses
correctly.
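
A stripped-down sketch of how the table and the two accessors might fit
together is below. It is not the real ArchSpec: the struct carries only
the two new fields, the lookup by name, the entry names, and the
HostBytesForDataRead helper are all assumptions made for illustration.
The kalimba values are the ones quoted in this mail:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Simplified stand-in for CoreDefinition: only the proposed new fields.
struct CoreDefinition {
    const char *name;        // illustrative names, not real lldb core names
    uint32_t code_byte_size; // host bytes per addressable unit, code bus
    uint32_t data_byte_size; // host bytes per addressable unit, data bus
};

static const CoreDefinition g_core_definitions[] = {
    {"x86_64",   1, 1}, // every non-kalimba entry would be 1/1
    {"kalimba3", 4, 3}, // 32-bit code units, 24-bit data units
    {"kalimba4", 1, 1}, // 8-bit code and data units
    {"kalimba5", 1, 3}, // 8-bit code units, 24-bit data units
};

class ArchSpec {
public:
    explicit ArchSpec(const char *name) : m_def(nullptr) {
        for (const CoreDefinition &def : g_core_definitions)
            if (std::strcmp(def.name, name) == 0)
                m_def = &def;
    }
    bool IsValid() const { return m_def != nullptr; }
    // Assumption: an unknown architecture falls back to 8-bit bytes.
    uint32_t GetCodeByteSize() const { return m_def ? m_def->code_byte_size : 1; }
    uint32_t GetDataByteSize() const { return m_def ? m_def->data_byte_size : 1; }

private:
    const CoreDefinition *m_def;
};

// Hypothetical client code: host bytes needed for a data-bus read.
uint64_t HostBytesForDataRead(const ArchSpec &arch, uint64_t unit_count) {
    assert(arch.IsValid());
    return unit_count * arch.GetDataByteSize();
}
```

Client code such as the memory read path would then ask the ArchSpec
for the unit size of the bus it is reading from, rather than assuming 1.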

My next plan would be to "massage" the code in the execution flow from 
an (lldb) memory read invocation through to the gdb-remote comms until I 
see the memory read examples I illustrated above, working for 8-bit and 
24-bit data kalimba targets.

I'd appreciate all comments and opinions as to what I've described above 
from the lldb community. Basically, I'm curious as to what people think 
of the whole concept, e.g.

"You can't possibly do that, so many other architectures have 8-bit 
bytes, and so this proposal would make them harder to enhance, for the 
benefit of (currently) just kalimba"
"Yes, that's a good idea, lldb can accommodate the most unusual of 
architectures"

And I'm also interested in technical comments, e.g. should an instance 
of CoreDefinition be added to ArchSpec, or is just adding the extra 
byte-size attributes sufficient... or if anyone thinks that modifying 
gdb-remote is a bad idea, and that I should be creating kalimba process 
abstractions (and factor out the common code)?

thanks
Matt
