[PATCH] D140745: WIP: generate Config {Fragment structure, json schema, docs, YAML parser} from schema spec

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jan 2 05:26:39 PST 2023


sammccall created this revision.
Herald added subscribers: kadircet, arphaman, mgrang.
Herald added a project: All.
sammccall updated this revision to Diff 485562.
sammccall added a comment.
sammccall updated this revision to Diff 485564.
sammccall updated this revision to Diff 485568.
sammccall added reviewers: kadircet, nridge.
sammccall published this revision for review.
Herald added a project: clang-tools-extra.
Herald added a subscriber: cfe-commits.

Restructure schema so $ref can use JSON pointers, fixing compat with yaml-language-server.

Add some VSCode extensions supported by yaml-language-server.


sammccall added a comment.

fix external: none


sammccall added a comment.

fix schema version


sammccall added a comment.

This is definitely not ready to ship, but I'd like some high-level feedback before adding tests/polish.

Most important thing to look at is `schema.yaml` which would be the source of truth, followed by maybe `Generate.cpp` and the new structure of `ConfigYAML` (but no need to go through the code details at this point).

Particular questions I'd like opinions on:

worthwhile overall?
-------------------

This isn't trivial: does it solve a big enough problem to be worth the complexity? (Benefit would be improved maintenance of existing config duplication, and being able to accept JSON-schema which IMO we shouldn't without automation).

better off parsing schema.json rather than defining a new format?
-----------------------------------------------------------------

I don't think so (json-schema is too powerful a language, and it's IMO pretty verbose and painful to edit).
But the bias towards creating new things needs to be double-checked!

does the YAML structure seem OK?
--------------------------------

I'm reasonably happy with the compromises I landed on e.g.

- distinction between lowercase for meta-properties and UpperCase for schema children
- ways to handle special casing around `If.UnknownKey` and `Index.External = None`

but maybe they only make sense in my own head

build-time generation vs checking in generated files
----------------------------------------------------

Generating at cmake-time is the obvious choice for *.inc, but the consumers of schema.json and doc.md aren't part of our build.

1. we could generate *.inc at build time, and say that it's up to some other repo(s) to run the llvm build system and produce+publish schema.json and doc.md
2. we could generate *.inc at build time, and check in schema.json and doc.md (need to update by rerunning the tool, we could have a test)
3. we could generate all files by hand and check them in

My feeling is that 1 is too burdensome to get these extra files.
2 is a bit more complicated than 3 (two ways to do things, and building the tool in host config adds some cmake complexity).
And I'm not sure it has any advantages: regenerating all the files isn't more work than regenerating half.

So I lean towards 3 but I don't have much confidence in what will be least annoying overall.

meta-schema.json
----------------

Once I got the tooling set up, having a schema+diagnostics for `schema.yaml` is pretty nice.
It provides a place to document the structure of the file, and gives a crisp answer to "is this change broken" (albeit one we won't actually have a test for).
OTOH it's not strictly necessary, and yet more stuff in the tree.

Enums in ConfigFragment.h
-------------------------

Previously these were strings in `ConfigFragment.h`, and the enums were defined in `Config.h`, which I think is where they belong in terms of dep structure.

However once `schema.yaml` knows about enums, it seems natural to generate the enum definitions and the code to parse them.

So options all have downsides:

1. put them in `ConfigFragment.h` and parse them in `ConfigYAML`, with the other generated code. Now `Config` needs to depend on `ConfigFragment`
2. generate them into `Config.h` and generate parsing code in `ConfigCompile` as today. This means we now need to inject generated code into 4 places instead of 2.
3. don't generate the enums, define & parse them by hand as today (and need to keep that code in sync)

I went with 1 in the code, but I really don't know what's best here.

Dir structure
-------------

With so many extra files, I created `config/` and I think moving existing `Config*` sources to `config/*` seems like a clear improvement, but want to get an ack on this first as renaming is annoying.


This is hacked up: the generated files should not be checked in!


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D140745

Files:
  clang-tools-extra/clangd/CMakeLists.txt
  clang-tools-extra/clangd/Config.h
  clang-tools-extra/clangd/ConfigCompile.cpp
  clang-tools-extra/clangd/ConfigFragment.h
  clang-tools-extra/clangd/ConfigYAML.cpp
  clang-tools-extra/clangd/config/CMakeLists.txt
  clang-tools-extra/clangd/config/Fragment.inc
  clang-tools-extra/clangd/config/Generate.cpp
  clang-tools-extra/clangd/config/YAML.inc
  clang-tools-extra/clangd/config/doc.md
  clang-tools-extra/clangd/config/meta-schema.json
  clang-tools-extra/clangd/config/schema.json
  clang-tools-extra/clangd/config/schema.yaml
  clang-tools-extra/clangd/unittests/ConfigCompileTests.cpp
  clang-tools-extra/clangd/unittests/ConfigYAMLTests.cpp
  clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D140745.485568.patch
Type: text/x-patch
Size: 115870 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20230102/36f82bbb/attachment-0001.bin>


More information about the cfe-commits mailing list