[llvm] [NFC][doc] Rewrite introduction to TableGen to be more modern and approachable (PR #78124)

Sun Jan 14 21:15:49 PST 2024

https://github.com/johnwbyrd created https://github.com/llvm/llvm-project/pull/78124

I've just helped develop a novel LLVM backend (LLVM-MOS), and these changes include what I wish someone had told me about TableGen before I got started.  I've purposely written in an approachable style, which is sorely lacking in the current state of the TableGen docs.

From c5338b74bf4b70b1824cee51621a836de07fb468 Mon Sep 17 00:00:00 2001
From: John Byrd <johnwbyrd at gmail.com>
Date: Sun, 14 Jan 2024 21:11:04 -0800
Subject: [PATCH] Rewrite introduction to TableGen to be more modern and
 approachable.

I've just helped develop a novel LLVM backend (LLVM-MOS), and these changes include what I wish someone had told me about TableGen before I got started.
---
 llvm/docs/TableGen/index.rst | 71 +++++++++++++++++++++++++++++++++---
 1 file changed, 65 insertions(+), 6 deletions(-)

diff --git a/llvm/docs/TableGen/index.rst b/llvm/docs/TableGen/index.rst
index e916c152f5a43d..69312908d44956 100644
--- a/llvm/docs/TableGen/index.rst
+++ b/llvm/docs/TableGen/index.rst
@@ -15,12 +15,71 @@ TableGen Overview
 Introduction
 ============
 
-TableGen's purpose is to help a human develop and maintain records of
-domain-specific information.  Because there may be a large number of these
-records, it is specifically designed to allow writing flexible descriptions and
-for common features of these records to be factored out.  This reduces the
-amount of duplication in the description, reduces the chance of error, and makes
-it easier to structure domain specific information.
+When you're creating a compiler, an assembler, or a disassembler, you need
+to express every last detail of your instruction set in a way that's least
+likely to introduce errors from repetitive typing.
+
+Enter TableGen.  TableGen is a command-line tool that reads files in the .td
+format, and spits out, usually, C++ code snippets that can be #include'd into
+other code.
+
+TableGen .td files look like C++.  But they're not C++, and you will get
+confused if you think of them as C++.  In fact, TableGen is principally a
+declarative language -- it defines records, and groups of records, and
+groups of groups.
+
+TableGen uses classes to describe groups of related records.  As in C++,
+classes can inherit fields and default values from parent classes, but unlike
+C++, there's no built-in concept of functions, virtual or otherwise.
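+
+As a rough sketch of how that inheritance looks, consider the following.  The
+names ``ExampleReg``, ``ExampleWideReg``, ``A`` and ``HL`` are made up for
+illustration and are not taken from any real target.
+
+.. code-block:: text
+
+  // A class is a template for records: it declares fields and defaults.
+  class ExampleReg<string n> {
+    string AsmName = n;
+    int Width = 8;
+  }
+
+  // A subclass inherits every field and may override a default.
+  class ExampleWideReg<string n> : ExampleReg<n> {
+    let Width = 16;
+  }
+
+  // Each def creates one concrete record.
+  def A  : ExampleReg<"a">;       // AsmName = "a",  Width = 8
+  def HL : ExampleWideReg<"hl">;  // AsmName = "hl", Width = 16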
+
+TableGen's brains are located in the TableGen backends.  TableGen backends
+exist for spitting out the guts of an assembler parser, GlobalISel and
+DAGISel code generators, table lookups, compiler strings, and JSON
+files containing all machine instructions for a target.  Don't make the
+mistake of confusing TableGen backends with LLVM backends, which translate
+LLVM intermediate representation (IR) into machine code for a specific target.
+An LLVM backend typically #include's code generated by several TableGen
+backends.
+
+Although TableGen was originally intended as a purely declarative
+language, over time, some limited support for functional behavior has been
+added, to avoid repetition within the .td files themselves.  This
+functionality is exposed through bang operators, which begin with ``!``.
+One example of this functionality is defining 128 similar registers
+in a couple of lines of TableGen code.
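+
+A minimal sketch of that idea might look like the following.  The class
+``ExampleReg`` and its fields are hypothetical, and the ``...`` range syntax
+assumes a reasonably recent TableGen.
+
+.. code-block:: text
+
+  // Hypothetical register class; the bang operators build each name.
+  class ExampleReg<int n> {
+    string AsmName = !strconcat("r", !cast<string>(n));
+    int Encoding = n;
+  }
+
+  // Defines 128 records named R0 through R127.
+  foreach i = 0...127 in
+    def R#i : ExampleReg<i>;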
+
+Originally, TableGen was intended solely for instruction modeling.
+But as needs have changed, TableGen has also been saddled with the ability
+to generate efficient lookup tables for arbitrary data, the ability to
+check and manage error messages, and the ability to generate detailed
+reports about a target's instruction set.
+
+It's natural to go looking, at some point, for a detailed list of every
+TableGen parameter in the documentation.  As of this writing, sadly, such
+documentation does not exist.  One way to begin to understand what
+TableGen wants is to review the .td files that many of the targets include.
+A practical introduction to the TableGen records used by targets exists in
+``llvm/include/llvm/Target/Target.td``.  This file is included
+by most backends.  It will give you an idea of what sort of TableGen records
+an LLVM backend is expected to maintain.
+
+Here's a list of TableGen backends and what they do.  It's possible to add
+further TableGen backends if you need some kind of specialized report
+or generated code for LLVM's internal use, based on information in those .td
+files.
+
+You should think of using TableGen:
+
+- When you need to generate fiddly or hard-to-manage code from data structures;
+- When you're trying to represent the machine-level instructions for a new LLVM
+  target;
+- When you need a set of common descriptions to be turned into multiple bits
+  of related code.
+
+TableGen itself is compiled, and then run, relatively early whenever you
+build LLVM.  By convention, TableGen files have a .td extension.
 
 The TableGen front end parses a file, instantiates the declarations, and
 hands the result off to a domain-specific `backend`_ for processing.  See