[PATCH] D138323: [TableGen] RegisterInfo backend - Add abstraction layer between code generation logic and syntax output

Thu Feb 9 10:12:48 PST 2023

Rot127 added a comment.

In D138323#4072561 <https://reviews.llvm.org/D138323#4072561>, @nhaehnle wrote:

> Yes, that's indeed an issue for what you're trying to do. IMHO it is also an issue for the maintainability of these backends as they are today. So I'd say we should try to separate the state-machine generation code from the printing code. It could be moved to `CodeGenTarget`, though I think it'd be better to keep things slightly more modular and instead split up the current `DecoderEmitter` into parts:
>
> - A `Decoder` class contains all the state-machine description. Think of it as an "intermediate representation for a decoder/disassembler": it contains the data describing the decoder, but essentially no logic.
> - A separate `createDecoder` function creates a `Decoder` given a `CodeGenTarget`
> - The `emitDecoder` function writes out the generated C++ for a given `Decoder` object
>
> Does that make sense to you?

This is certainly one way of doing it. It would also loosely fit how things work today (the current `DecoderEmitter` works like this <https://reviews.llvm.org/D142054>).

But personally I would start with a more fundamental redesign.

**Preparation**

1. Enumerate backends which:
  - process CodeGen information into more complex structures (e.g. decoder tables, lists of certain operands etc.).
  - mix up code generation and code emitting.
2. For each backend, identify which self-contained **`InfoBlocks`** (I lack a better name here) it produces
  - `RegisterInfoEmitter`: Produces register enums, Subregister information tables, Register classes info tables, algorithms to get register names from the enums etc.
  - `AsmWriter`: Instruction mnemonic tables, alias mnemonics, Mnemonic selection logic etc.
  - `DecoderEmitter`: Produces a complete decoder, with all its tables and state-machine traversal algorithms. ...

**The design**

The new design would consist of small modules which each generate only **one** `InfoBlock` (a single enum, a single table, code to traverse a certain table).
Additionally we add a controller module which triggers the generation of different blocks.

Complicated `InfoBlock`s which effectively consist of multiple other `InfoBlocks` (e.g. the decoder needs instruction enums, the state machine algorithm, state machine tables etc.)
can have dependencies on other `InfoBlock`s.

For each `InfoBlock` I imagine a data structure like this:

  struct {
    Name; // Unique name/ID of this information
    Data;  // The data, e.g. a vector with strings representing a register name enum or more complex stuff.
    DataType;  // Enumeration, Algorithm/Code, Table, StateMachine ...
    Dependencies;  // List of Names/IDs this InfoBlock depends on.

    void genInfo(CodeGenTarget Target, vector<InfoBlock> Dependencies); // Generates the information and stores it in `Data`.
  }

Now, our new controller module would get a list of `InfoBlocks`.  It triggers the generation of them and their dependencies.

If they need to be emitted, an Emitter can get those `InfoBlocks` and simply print them, without manipulating the data further. It can do this in whatever order, hierarchy and syntax it likes.

*Pros*

- Fine-grained control over what information is generated.
- `InfoBlock` generator modules can be selected for the build at will, so there are no huge backends anymore (these seem to be a problem, see: https://discourse.llvm.org/t/issues-in-llvm-tblgen-high-parallelized-build/68037/16).
- Non-emitter backends can use the `InfoBlock` data as well.
- Most of the `InfoBlock` generation modules should be very small.

*Cons*

- What counts as a self-contained `InfoBlock`?
- Danger of very similar modules. E.g. in the long run several `InfoBlocks` could end up generating *almost* the same information, but not quite.
- It is a ton of work.

With all that said, what do you think?

The very last point ("It is a ton of work") would be a problem for me. That is why I designed this patch the way it is: I simply need to be done at some point.

So just to make sure: does the refactor (as in the diff here) have any chance of being upstreamed?
Or will only a more fundamental change (moving logic to `CodeGenTarget`) be accepted?


https://reviews.llvm.org/D138323