[LLVMdev] Issue with instruction decoding / disassembly

Tue Dec 18 10:28:54 PST 2012

Owen,

As I recall, we had some similar issues with custom decoders needing to cooperate on ARM. Do you remember the details?

-Jim

On Dec 18, 2012, at 2:37 AM, Richard Osborne <richard at xmos.com> wrote:

> I'm currently trying to get llvm-mc --disassemble working for the XCore backend. Until recently there was no instruction encoding / decoding information on any of the XCore instructions, so I'm incrementally adding this information at the same time as adding tests for the disassembler. However, I've run into a problem and I'm not sure of the best way to solve it. With some of the XCore's instruction formats, operands are not encoded into bits individually. Instead, they are combined into a single field using arithmetic operations before being inserted into the instruction. For example:
> 
> ADD_3r is encoded as: 00010aaaaabbccdd
> 
> where:
> 
> aaaaa = op1[3..2] × 9 + op2[3..2] × 3 + op3[3..2]
> bb = op1[1..0]
> cc = op2[1..0]
> dd = op3[1..0]
> 
> op1 - op3 are all in the range 0-11 and therefore aaaaa is in the range 0 - 26
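
To make the arithmetic above concrete, here is a minimal sketch of packing and unpacking the combined field in plain C++. It is not the XCore backend's code: the names Ops3R, encodeAdd3R and decodeAdd3R are hypothetical, and the bit positions are read directly off the 00010aaaaabbccdd pattern (aaaaa in bits 10..6, bb in 5..4, cc in 3..2, dd in 1..0).

```cpp
#include <cstdint>
#include <optional>

// Operand values for a three-register instruction; each is in 0..11.
struct Ops3R {
  unsigned op1, op2, op3;
};

// Hypothetical helper: pack three operands into the 00010aaaaabbccdd layout.
// The caller is expected to pass operands in the range 0..11.
uint16_t encodeAdd3R(Ops3R o) {
  // High two bits of each operand are merged base-3 into one 5-bit field.
  unsigned combined = (o.op1 >> 2) * 9 + (o.op2 >> 2) * 3 + (o.op3 >> 2);
  return static_cast<uint16_t>((0x02u << 11) | (combined << 6) |
                               ((o.op1 & 3u) << 4) | ((o.op2 & 3u) << 2) |
                               (o.op3 & 3u));
}

// Hypothetical helper: reverse the packing. Returns nullopt when the fixed
// bits do not match or the combined field is outside 0..26.
std::optional<Ops3R> decodeAdd3R(uint16_t insn) {
  if ((insn >> 11) != 0x02u)
    return std::nullopt;
  unsigned combined = (insn >> 6) & 0x1fu;
  if (combined >= 27)
    return std::nullopt;
  unsigned hi1 = combined / 9;       // op1[3..2]
  unsigned hi2 = (combined / 3) % 3; // op2[3..2]
  unsigned hi3 = combined % 3;       // op3[3..2]
  return Ops3R{(hi1 << 2) | ((insn >> 4) & 3u),
               (hi2 << 2) | ((insn >> 2) & 3u),
               (hi3 << 2) | (insn & 3u)};
}
```

Under these assumptions, the bit pattern 0001000000000000 (0x1000) unpacks to op1 = op2 = op3 = 0, and any pattern whose combined field is 27 or more is rejected.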
> 
> I managed to get decoding of ADD_3r instructions to work by specifying the value of the bits that are fixed in the instruction format and using a custom DecoderMethod to handle the rest. The problem comes when I try and add the INITSP_2r instruction.
> 
> INITSP_2r is encoded as: 00010aaaaab0ccdd
> 
> Again, operands are not individually encoded into bits but are instead combined into a single field using arithmetic operations. Due to the way aaaaa is derived, it is guaranteed to be at least 27. The value of these bits is how the INITSP_2r and ADD_3r instructions should be distinguished. I tried to handle INITSP_2r the same way as ADD_3r (specifying the value of the bits that are fixed and using a custom DecoderMethod for the rest). With this change I can disassemble INITSP_2r instructions, but it breaks the decoding of ADD_3r instructions. Consider the following bit pattern:
> 
> 0001000000000000
> 
> This is an ADD_3r instruction. Before adding INITSP_2r the autogenerated decodeInstruction method would identify this as a possible ADD_3r instruction and it would call the associated decoder method (Decode3RInstruction) which returns Success. After adding INITSP_2r the autogenerated decodeInstruction method identifies this as a possible INITSP_2r instruction and it calls the associated decoder method (Decode2RInstruction) which returns Fail. At this point I'd like decodeInstruction to carry on testing to see if it can be decoded as an ADD_3r instruction but instead it stops looking at this point and returns Fail.
> 
> How should I deal with this situation? One idea I had (which I haven't tried yet) is to move the troublesome instructions into a different decoding table by setting the DecoderNamespace. This way in XCoreDisassembler::getInstruction() I can call decodeInstruction() on the first decoder table (containing INITSP_2r) and if this fails I can then call decodeInstruction() on the second decoder table (containing ADD_3r). Is this an abuse of DecoderNamespaces? Is there a better way of solving my problem?
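
The two-table idea above can be modelled outside LLVM as trying a list of decoders in order and accepting the first success. The sketch below is only that model: it does not call the generated decodeInstruction, all names (Status, TableDecoder, decode2RTable, decode3RTable, decodeWithFallback) are hypothetical, and the 2R stand-in checks only the fixed bits and a combined field of 27 or more (assumed as the threshold because ADD_3r occupies 0..26) without decoding operands.

```cpp
#include <cstdint>
#include <optional>
#include <string>

enum class Status { Fail, Success };

// Signature shared by the stand-in per-table decoders (hypothetical).
using TableDecoder = Status (*)(uint16_t insn, std::string &opcode);

// Stand-in for the table holding INITSP_2r (pattern 00010aaaaab0ccdd).
Status decode2RTable(uint16_t insn, std::string &opcode) {
  if ((insn >> 11) != 0x02u)
    return Status::Fail; // fixed bits 00010
  if ((insn >> 4) & 1u)
    return Status::Fail; // fixed 0 at bit 4
  if (((insn >> 6) & 0x1fu) < 27)
    return Status::Fail; // combined field belongs to the 3R range
  opcode = "INITSP_2r";
  return Status::Success;
}

// Stand-in for the table holding ADD_3r (pattern 00010aaaaabbccdd).
Status decode3RTable(uint16_t insn, std::string &opcode) {
  if ((insn >> 11) != 0x02u)
    return Status::Fail; // fixed bits 00010
  if (((insn >> 6) & 0x1fu) >= 27)
    return Status::Fail; // combined field outside 0..26
  opcode = "ADD_3r";
  return Status::Success;
}

// Try each table in order; a Fail from one table moves on to the next
// instead of ending the search.
std::optional<std::string> decodeWithFallback(uint16_t insn) {
  const TableDecoder tables[] = {decode2RTable, decode3RTable};
  for (TableDecoder decode : tables) {
    std::string opcode;
    if (decode(insn, opcode) == Status::Success)
      return opcode;
  }
  return std::nullopt;
}
```

In this model, 0001000000000000 is rejected by the first table and then accepted by the second as ADD_3r, which is the behaviour the question asks for. Whether separate DecoderNamespace tables are the right mechanism in the real backend is left open here.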
> 
> Thanks,
> 
> Richard
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev