[llvm-dev] [M68k] Code Beads discussion

Mon Feb 8 09:22:22 PST 2021

Hi,

I wanted to discuss the machine code generation technique that the M68k
target currently uses and possible alternatives. To summarise my
understanding of the current architecture, each instruction has a set of
'code beads' which are essentially a domain-specific bytecode, with
instructions like 'emit constant bits', 'emit register number as bits', etc.

This is possibly a little bit early but it seems like the M68k target might
land soon and I figure that there shouldn't be too much risk talking about
this ahead of time. Apologies to those that read my comment on the related
bug (#48792) as this will be very repetitious.

I'm pretty new to the backend infrastructure of LLVM, so please correct me
as necessary!

I've had a look at the instruction encoding over the past few days and I'm
finding the number of instruction definitions to be overwhelming. There is
currently one instruction defined per operand per addressing mode.

I've been playing about with an LLD implementation and I'm currently
finding it hard to implement relaxation to 32-bit operands due to the
number of instruction permutations.

Additionally, my proof-of-concept disassembler is not very efficient since
it needs to consider the entire instruction length (which can be up to 22
bytes IIRC). This could be implemented as a radix-tree-style code
generator, which would be efficient, but quite complicated.

So I spent some time looking through the processor manual for the M68k and
I've observed a couple of things:
- Almost every instruction can be identified from just the first word.
- Whilst lots of memory operand modes are illegal for particular
instructions, there are very few different encodings for operands.
(Effective addressing and a few register / immediate encodings).

So, instead of code beads, could we:
- Use a MCTargetExpr subclass to represent the M68k effective addressing
operand
- Use just the first word for disassembly (possibly with code generation)
- Have a few special cases in the emitter & disassembler code for outlier
instructions
- Use TSFlags to encode the first word & mask, along with the encoding
modes of each operand

The prior art for custom MCTargetExprs seems to be for external symbols so
I'm not sure if there are any caveats there.

The code emitter would be slightly less flexible and possibly require a few
special cases (but would be easier to follow IMO).

Relaxation would be much easier to implement as there would be fewer
instructions and relaxation per-operand could be implemented once. The only
complication is knowing which EA modes are valid for a given instruction,
to know when to expand into a (load + op).

Disassembly by checking the first word (even checking the highest 4 bits in
a switch maybe) would be fast and straight-forward to follow. There would
need to be a few special cases here too though.

I've not thought much about ensuring only valid instruction permutations
get parsed and emitted.

Thanks,
Ricky Taylor,
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210208/47f7e72d/attachment.html>