[PATCH] D140745: WIP: generate Config {Fragment structure, json schema, docs, YAML parser} from schema spec

Kadir Cetinkaya via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jan 5 00:16:25 PST 2023


kadircet added a comment.

In D140745#4021783 <https://reviews.llvm.org/D140745#4021783>, @sammccall wrote:

> worthwhile overall?
> -------------------
>
> This isn't trivial: does it solve a big enough problem to be worth the complexity? (Benefit would be improved maintenance of existing config duplication, and being able to accept JSON-schema which IMO we shouldn't without automation).

I find the solution here valuable, especially considering all the times that we've failed to wire/test things properly when we introduced new config options and the lack of documentation in clangd-www. Also this will make introducing new options straightforward (no need to write fragments or parse logic).
Last time we discussed the concerns I had (at least the ones I can remember) were losing the readability we have in a C++ header file (when performing the fragment->config step) and difficulty about changing "parse" behavior every now and then, but considering all the bugs/outages we had ever since I think these concerns are OK to give up at this point (I also like the solution around only writing custom parse logic when needed).

> better off parsing schema.json rather than defining a new format?
> -----------------------------------------------------------------
>
> I don't think so (json-schema is too powerful a language, and it's IMO pretty verbose and painful to edit).
> But the bias towards creating new things needs to be double-checked!

I also feel like editing the YAML format is easier than the JSON one, this might be an LLVM bias but considering the project is part of LLVM, I don't think that'll be a bad thing.

> does the YAML structure seem OK?
> --------------------------------
>
> I'm reasonably happy with the compromises I landed on e.g.
>
> - distinction between lowercase for meta-properties and UpperCase for schema children
> - ways to handle special casing around `If.UnknownKey` and `Index.External = None`
>
> but maybe they only make sense in my own head

I am still a little unhappy with losing the readability through the new YAML format. but I also don't have any better alternatives and we still get to keep the Config.h so at least when writing features we still have a single C++ header to read.

> build-time generation vs checking in generated files
> ----------------------------------------------------
>
> Generating at cmake-time is the obvious choice for *.inc, but the consumers of schema.json and doc.md aren't part of our build.
>
> 1. we could generate *.inc at build time, and say that it's up to some other repo(s) to run the llvm build system and produce+publish schema.json and doc.md
> 2. we could generate *.inc at build time, and check in schema.json and doc.md (need to update by rerunning the tool, we could have a test)
> 3. we could generate all files by hand and check them in
>
> My feeling is that 1 is too burdensome to get these extra files.
> 2 is a bit more complicated than 3 (two ways to do things, and building the tool in host config adds some cmake complexity).
> And I'm not sure it has any advantages: regenerating all the files isn't more work than regenerating half.
>
> So I lean towards 3 but I don't have much confidence in what will be least annoying overall.

I am actually more inclined towards option 1). Having *.inc generation as part of the build step and not caring about them ever sounds like the better trade-off. I don't know how troublesome it's in the CMake sense (but we already do that with proto-generated files in clangd for example).
If you think that's too much work I am also fine with option 3 as you mentioned (this also has the nice side effect of checking the delta to real code after a change to the schema).
As for doc.md, it's unfortunate that we'll need to publish it in a different repository anyways, but I don't think that's a huge burden and we can probably get away with doing that only at branch cuts. Having doc.md and schema.json always available in the repo as source of truth (to aid debugging when things go wrong) sounds like a good plus to me. So I am in favor of having them in.

> meta-schema.json
> ----------------
>
> Once I got the tooling set up, having a schema+diagnostics for `schema.yaml` is pretty nice.
> It provides a place to document the structure of the file, and gives a crisp answer to "is this change broken" (albeit one we won't actually have a test for).
> OTOH it's not strictly necessary, and yet more stuff in the tree.

I am in favor of having it. this is unlikely to change so not a huge burden for maintenance. I'd still like to have some sort of schema description in the YAML file though (as a comment, both for a place to read without context switches and to serve as a template when introducing a new option).

> Enums in ConfigFragment.h
> -------------------------
>
> Previously these were strings in `ConfigFragment.h`, and the enums were defined in `Config.h`, which I think is where they belong in terms of dep structure.
>
> However once `schema.yaml` knows about enums, it seems natural to generate the enum definitions and the code to parse them.
>
> So options all have downsides:
>
> 1. put them in `ConfigFragment.h` and parse them in `ConfigYAML`, with the other generated code. Now `Config` needs to depend on `ConfigFragment`
> 2. generate them into `Config.h` and generate parsing code in `ConfigCompile` as today. This means we now need to inject generated code into 4 places instead of 2.
> 3. don't generate the enums, define & parse them by hand as today (and need to keep that code in sync)
>
> I went with 1 in the code, but I really don't know what's best here.

For me, the biggest downside of the current approach is the need to spell enum names in generated code inside `Config.h`. but I don't think managing the parser code by hand can be justified just to refer to familiar names in Config.h.
So I think all good here.

> Dir structure
> -------------
>
> With so many extra files, I created `config/` and I think moving existing `Config*` sources to `config/*` seems like a clear improvement, but want to get an ack on this first as renaming is annoying.

agreed, as discussed in the past, I think we should slowly try to define more fine grained targets for clangd (so that we can better check for layering violations, enable code reuse, enable more caching etc.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140745/new/

https://reviews.llvm.org/D140745



More information about the cfe-commits mailing list