[PATCH] D46054: [TableGen] Add a general-purpose JSON backend.

Simon Tatham via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Apr 25 05:28:48 PDT 2018


simon_tatham created this revision.
simon_tatham added a reviewer: nhaehnle.
Herald added subscribers: llvm-commits, mgorny.

The aim of this back end is to output everything TableGen knows about
the record set, similarly to the default -print-records backend. But
where -print-records produces output in TableGen's input syntax
(convenient for humans to read), this back end produces it as
structured JSON data, which is convenient for loading into standard
scripting languages such as Python, in order to extract information
from the data set in an automated way. The output data also includes
some indices that the human-readable -print-records has no need for,
e.g. the list of all defs that instantiate (subclasses of) a given
class, which I expect consumers will often want to use as a starting
point.
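
As a rough sketch of the kind of consumer I have in mind, the Python below loads a dump and uses such an index to pull out every def of one class. The layout is an assumption made purely for illustration (a hypothetical "!instanceof" key mapping class names to def names, plus one top-level key per def), and the sample data is hand-written, not real output of this patch.

```python
import json

# Hand-written stand-in for the tool's output. The key names and overall
# layout are illustrative assumptions; the real output may differ.
SAMPLE = """
{
  "!instanceof": {"Instruction": ["ADD", "SUB"], "Register": ["R0"]},
  "ADD": {"Opcode": 1},
  "SUB": {"Opcode": 2},
  "R0": {"Index": 0}
}
"""

def defs_of_class(data, class_name):
    """Return {def name: record} for every def listed under class_name."""
    names = data["!instanceof"].get(class_name, [])
    return {name: data[name] for name in names}

data = json.loads(SAMPLE)

# Start from the class index, then read one field from each matching def.
opcodes = {name: rec["Opcode"]
           for name, rec in defs_of_class(data, "Instruction").items()}
print(opcodes)
```

Starting from the class index means the consumer never has to scan every record to find the ones it cares about.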

This patch mostly works by adding the new back end as just another
function EmitJSON() alongside all the other EmitFoo functions. But in
order to allow it access to all the necessary details of the record
data, I had to add a few extra accessor functions in Record.h; in
particular, I've broken up the getAsString() methods of UnOpInit,
BinOpInit and TernOpInit by factoring out a subroutine that returns
just the operator name.

To test it, I've written a Python script that loads the JSON output
and tests properties of it based on comments in the .td source - more
or less like FileCheck, except that the CHECK: lines have Python
expressions after them instead of textual pattern matches.
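
The idea can be sketched roughly as follows. This is not the script in the patch: the comment syntax ("// CHECK:" followed by an expression over a variable named data), the function name run_checks and the sample record are all assumptions chosen for illustration.

```python
import json
import re

# Assumed comment syntax: "// CHECK: <Python expression over `data`>".
CHECK_RE = re.compile(r"//\s*CHECK:\s*(.*)$")

def run_checks(td_source, data):
    """Evaluate every CHECK expression; return (line number, expression) for each failure."""
    failures = []
    for line_no, line in enumerate(td_source.splitlines(), start=1):
        match = CHECK_RE.search(line)
        if match is None:
            continue
        expr = match.group(1).strip()
        try:
            # eval is acceptable here only because the expressions come
            # from the project's own test sources, not untrusted input.
            ok = bool(eval(expr, {"data": data}))
        except Exception:
            ok = False  # an expression that raises counts as a failed check
        if not ok:
            failures.append((line_no, expr))
    return failures

TD_SOURCE = """\
def D { int x = 3; }
// CHECK: data["D"]["x"] == 3
// CHECK: data["D"]["x"] == 4
"""
DATA = json.loads('{"D": {"x": 3}}')  # hand-written stand-in for the JSON dump

failures = run_checks(TD_SOURCE, DATA)
print(failures)
```

Here the first check passes and the second is reported as a failure along with its line number in the .td source.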

This is a draft patch which I know is unfinished, because I wanted to
get some feedback on general code structure and some remaining design
decisions before doing too much detailed polishing that might need to
be redone. Here's my known list of remaining things to do:

- Consider what to do about integer values that don't fit exactly into a 'double'. This code simply emits them as decimal integer literals, which JSON parsers are within their rights to round to the nearest double-precision float, losing data. Some JSON readers (e.g. Python's json.load) deliver accurate integer values anyway, but it might be better not to rely on that and instead to output very large integers in some other form: a JSON object containing an identifying type field and two doubles whose sum is the desired integer, a string representation of the integer, or both.
- Decide where all this code should live. It might be better to move a lot of it into Record.cpp in the form of getAsJSONObject() methods or something like that. That would remove the risk of forgetting to update the JSON back end if a new node type is introduced - anyone forgetting to implement that method in any new subclass of Init or RecTy would be reminded by a compile error.
- Consider adding the new -dump-json option to clang-tblgen as well as llvm-tblgen. (As I understand it they wouldn't do anything differently, but it seems asymmetric not to have both of them support it. They both have -print-records, after all.)
- Consider providing a cut-down version, enabled by another option such as '-dump-simple-json', in which all the complicated parametric expression nodes like !add and !foreach and !foldl are simply not emitted, and replaced by some kind of small object indicating that a complex expression was elided. The motivation is that I expect a lot of uses for this system would only be interested in the output fields that consist of final well-defined values of primitive type, so constructing the complicated parts is a waste of both TableGen's time and the consumer's. But I'm not sure where the line should be drawn - DAG arguments might well still need to be output in full, for example, and type information might be omittable. There may be no one good answer.
- Document the structure of the JSON output. The test script acts as informal documentation for the moment (at least I hope it's clear enough to people reviewing my work so far), but once it's finalised, I definitely want to write it up properly in the real docs, and include an example or two of the kind of analysis you can use this for.
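
To make the first item in the list above concrete, here is a small Python sketch of the data loss and of the string-based alternative. The tagged-object format (the "kind" and "value" field names) and the helper names are hypothetical, not something this patch emits. Passing parse_int=float to json.loads is used only to imitate a parser that stores every number as a double.

```python
import json

big = 2**53 + 1  # smallest positive integer a double cannot represent exactly

# A parser that maps every JSON number to a double silently rounds it.
as_double = json.loads(str(big), parse_int=float)
assert int(as_double) != big

# Python's default parser happens to keep the exact integer.
assert json.loads(str(big)) == big

def encode_int(value):
    """Emit small ints as plain numbers, large ones as a tagged string (hypothetical format)."""
    if abs(value) <= 2**53:
        return value
    return {"kind": "bigint", "value": str(value)}

def decode_int(obj):
    """Inverse of encode_int on the consumer side."""
    if isinstance(obj, dict) and obj.get("kind") == "bigint":
        return int(obj["value"])
    return obj

# The tagged form survives even the double-only parser.
round_tripped = decode_int(
    json.loads(json.dumps(encode_int(big)), parse_int=float))
print(round_tripped == big)
```

The cost of this scheme is that every consumer has to know about the tagged form, which is part of why the choice is still open.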


Repository:
  rL LLVM

https://reviews.llvm.org/D46054

Files:
  include/llvm/TableGen/Record.h
  lib/TableGen/Record.cpp
  test/TableGen/JSON-check.py
  test/TableGen/JSON.td
  utils/TableGen/CMakeLists.txt
  utils/TableGen/JSONEmitter.cpp
  utils/TableGen/TableGen.cpp
  utils/TableGen/TableGenBackends.h

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D46054.143909.patch
Type: text/x-patch
Size: 34687 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20180425/9b0045f3/attachment-0001.bin>

