[PATCH] D82386: [clangd] Config: Fragments and parsing from YAML

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jun 25 01:57:02 PDT 2020


sammccall added a comment.

Thanks!

> Apart from the comments(IIRC which are mostly minor), it felt like there were a few lines that aren't clang-format'd. Please make sure to run clang-format on the files at some point.

git-clang-format thinks it's clean, so please point out any examples you see!



================
Comment at: clang-tools-extra/clangd/ConfigFragment.h:9
+//
+// Various clangd features have configurable behaviour (or can be disabled).
+// The configuration system allows users to control this:
----------------
kadircet wrote:
> i think this paragraph belongs to `Config.h`
This first sentence does, and I'll put it there too, but I think one redundant sentence to establish the context is reasonable.

The next three lines belong in this file because they describe how users control config (which they do through ConfigFragment, not through Config).


================
Comment at: clang-tools-extra/clangd/ConfigFragment.h:16
+// configuration as obtained from a source like a file.
+// This is distinct from how the config is interpreted (CompiledFragment),
+// combined (ConfigProvider) and exposed to the rest of clangd (Config).
----------------
kadircet wrote:
> again i don't think these are relevant here.
> 
> maybe provide more details about any new config option should start propagating from here, as this is the closest layer to user written config,
> again i don't think these are relevant here.

I think how this code relates to the rest of the system is a pretty important thing for documentation to cover! I wish docs in general did a better job of this.
Relying on xrefs in the code is pretty noisy and low-level, and guessing at filenames doesn't scale well.

> maybe provide more details about any new config option should start propagating from here, as this is the closest layer to user written config,

Yeah - it depends whether you're looking at "here is a feature I'd like to make configurable" vs "here is some configuration syntax I want to support".
I've moved it here, but I'm not 100% sure it's the right call.


================
Comment at: clang-tools-extra/clangd/ConfigFragment.h:61
+  /// BufferName is used for the SourceMgr and diagnostics.
+  static std::vector<Fragment> parseYAML(llvm::StringRef YAML,
+                                         llvm::StringRef BufferName,
----------------
kadircet wrote:
> what about `fromYAML` and returning `vector<const Fragment>` ?
> what about fromYAML

I can live with it if you like, but I prefer parse:
 - Fragment::fromYAML reads funny as it returns multiple fragments in general
 - parse describes the action more clearly and so hints toward thinking about error handling etc

> returning vector<const Fragment> ?
why? the resulting vector is slower (has to copy instead of move on reallocation).
It prevents us from moving from the fragment when we compile it (which we do in some places)


================
Comment at: clang-tools-extra/clangd/ConfigFragment.h:65
+
+  struct SourceInfo {
+    /// Retains a buffer of the original source this fragment was parsed from.
----------------
kadircet wrote:
> why the bundling?
Oops, this is so I could give them a collective comment, and I forgot to do it...

The idea is everything else represents the directly-user-specified "body" of the configuration, but these fields are different. (The associated directory also goes here, though wasn't needed in this patch).
I wanted to wall these off a bit as the outer struct will gain many fields and so this may become less obvious. (Also makes it easier to skip over if we're generating docs etc)


================
Comment at: clang-tools-extra/clangd/ConfigFragment.h:69
+    /// Shared because multiple fragments are often parsed from one (YAML) file.
+    /// May be null, then all locations are ignored.
+    std::shared_ptr<llvm::SourceMgr> Manager;
----------------
kadircet wrote:
> maybe `invalid/absent` rather than `ignored` (or change it to say should be ignored) ? I am assuming this is referring to locations of config parameters like `Add`s and conditions.
Changed to "should be ignored". guaranteeing that the SMRanges is stronger but not actually useful.


================
Comment at: clang-tools-extra/clangd/ConfigFragment.h:73
+    /// Only valid if SourceManager is set.
+    llvm::SMLoc Location;
+  };
----------------
kadircet wrote:
> what is the use of this ?
There may be multiple fragments in a file.

When tracking how fragments are used in the logs, there should be enough information to identify the particular fragment, without having to guess.
 - during parsing, we can log in some source-specific way
 - in the Fragment form, we can log this location to identify the fragment
 - during compilation, we log this location and the pointer address of the new CompiledFragment
 - subsequently we can just log the CompiledFragment address


================
Comment at: clang-tools-extra/clangd/ConfigFragment.h:75
+  };
+  SourceInfo Source;
+
----------------
kadircet wrote:
> why not make this const ? i don't think it makes sense to modify these after creation.
const members are a pain, as they prevent moving the object (and vectors of them are doing copies, etc). For that reason I generally don't find "I don't semantically need to modify this" a good enough reason to mark a member const.

Adding a constructor muddles the idea of a dumb struct and presents an inconsistent API - the other fields *also* don't really change during the lifetime (parseYAML() is basically a big constructor)


================
Comment at: clang-tools-extra/clangd/ConfigFragment.h:77
+
+  struct ConditionFragment {
+    std::vector<Located<std::string>> PathMatch;
----------------
kadircet wrote:
> comments?
Whoops, done. There's an argument to be made that we shouldn't maintain docs in two places, and the user docs should be canonical.
Something to think about later, maybe we can generate them.

I'm always unsure whether to put the doc for "blocks" like this on the struct or the field...


================
Comment at: clang-tools-extra/clangd/ConfigFragment.h:81
+    /// The condition will evaluate to false.
+    bool UnrecognizedCondition;
+  };
----------------
kadircet wrote:
> `HasUnrecognizedCondition` ?
> 
> also default init to `true` maybe?
Renamed. I guess you meant `false` here? (an empty condition should not have this set)


================
Comment at: clang-tools-extra/clangd/ConfigFragment.h:87
+    std::vector<Located<std::string>> Add;
+  } CompileFlags;
+};
----------------
kadircet wrote:
> maybe make this an llvm:Optional too. Even though emptiness of `Add` would be OK in this case, for non-container "blocks" we might need to introduce an optional, and homogeneity would be nice when it happens.
I'd rather push this the other way: the presence of an empty block like this shouldn't have any semantic effect - I think we can be consistent about this and it helps the schema feel regular/extensible/predictable.

Condition was the only tricky case, but I don't think we need it, so I've removed the Optional on the condition instead. it's an AND, so an empty set of conditions evaluates to "unconditionally true" anyway.


================
Comment at: clang-tools-extra/clangd/ConfigYAML.cpp:29
+  llvm::SourceMgr &SM;
+  llvm::SmallString<256> Buf;
+
----------------
kadircet wrote:
> a comment for this buf, especially the reason why it's a member rather than a function-local when needed. (I suppose to get rid of const/dest penalties?)
Moved it inline instead.


================
Comment at: clang-tools-extra/clangd/ConfigYAML.cpp:40
+    Dict.handle("If", [&](Node &N) {
+      F.Condition.emplace();
+      return parse(*F.Condition, N);
----------------
kadircet wrote:
> It is a little bit subtle that "at most once" execution of this handler is assured by DictParser. Can we put some comments explaining it to either DictParser itself or to `handle` member ?
Ooh, hadn't noticed the subtlety with emplace :-)
Emplace is gone since condition is no longer optional.
Added a comment too.


================
Comment at: clang-tools-extra/clangd/ConfigYAML.cpp:74
+    llvm::StringRef Description;
+    std::vector<std::pair<llvm::StringRef, std::function<bool(Node &)>>> Keys;
+    std::function<void(llvm::StringRef)> Unknown;
----------------
kadircet wrote:
> maybe a comment for these two, suggesting the latter is used for unknown keys.
> 
> and also mention it is handlers responsibility emit any diagnostics in case of failures?
> maybe a comment for these two, suggesting the latter is used for unknown keys.

These are private fields, the public trivial setters are commented, I don't think commenting them again adds value.

> and also mention it is handlers responsibility emit any diagnostics in case of failures?

Done.


================
Comment at: clang-tools-extra/clangd/ConfigYAML.cpp:136
+    else if (auto *BS = llvm::dyn_cast<BlockScalarNode>(&N))
+      return Located<std::string>(S->getValue(Buf).str(), N.getSourceRange());
+    warning(Desc + " should be scalar", N);
----------------
kadircet wrote:
> s/S->getValue(Buf)/BS->getValue()/  (or change `auto *BS` to `auto *S`)
> 
> and a test
Nice catch. This would be a good clang-tidy check I think... variable initialized in an if and then used in an else.


================
Comment at: clang-tools-extra/clangd/ConfigYAML.cpp:150
+      for (auto &Child : *S) {
+        if (auto Value = scalarValue(Child, "List item"))
+          Result.push_back(std::move(*Value));
----------------
kadircet wrote:
> shouldn't we error/warn out if not ?
we do: scalarValue warns if it returns None and is documented to do so.


================
Comment at: clang-tools-extra/clangd/ConfigYAML.cpp:161
+  // Report a "hard" error, reflecting a config file that can never be valid.
+  bool error(const llvm::Twine &Msg, const Node &N) {
+    SM.PrintMessage(N.getSourceRange().Start, llvm::SourceMgr::DK_Error, Msg,
----------------
kadircet wrote:
> do we really want to return a boolean here (and not at the warning ?)
> 
> I can see that it's nice to `return error("xxx")` but it looks surprising as we are not returning the error but just false.
> I would make this void, up to you though.
there's just one callsite, made void for now, can revisit with more callsites.


================
Comment at: clang-tools-extra/clangd/ConfigYAML.cpp:202
+  // SM has two entries: "main" non-owning buffer, and ignored owning buffer.
+  SM->AddNewSourceBuffer(std::move(Buf), llvm::SMLoc());
+  return Result;
----------------
kadircet wrote:
> *sigh*
> 
> what a mess, Scanner in YamlParser already adds this buffer. It is unclear why though, as it adds a non-owning buffer and that buffer is never accessed.  (well apart from SMLoc to Buffer conversions when emitting diags)
> It rather makes use of pointers to the underlying data, obtained before adding that buffer. So it is the caller keeping the data alive anyways.
> 
> should we consider having an entry point that takes an SMLoc and a SourceMgr for `yaml::Stream` instead of this? That way we can add the owning entry to sourcemanager and start the stream at beginning of that buffer.
> 
> i am afraid this might bite us in the future, as SM is exposed publicly and one might try to iterate over buffers, which might come as a shock. we can postpone until that happens though, so feel free to leave it as it is.
I can't think of any reason we'd iterate over the buffers, as there's only ever one real buffer and it's identifiable as the "main file" in the usual way.
So I'd rather take the horrible API if/when it actually becomes a problem.


================
Comment at: clang-tools-extra/clangd/unittests/ConfigYAMLTests.cpp:58
+      Out.Pos.character = D.getColumnNo(); // Zero-based - bug in SourceMgr?
+      if (!D.getRanges().empty()) {
+        const auto &R = D.getRanges().front();
----------------
kadircet wrote:
> is there an explanation to what these ranges are ?
> 
> All I could find is somewhere in the implementation making sure these are relative to the `D.getLineNo` and nothing else :/
These are ranges to highlight (e.g. in console output). SourceMgr/SMDiag is modeled after clang's SourceManager/Diagnostic, where this stuff is a bit better documented.


================
Comment at: clang-tools-extra/clangd/unittests/ConfigYAMLTests.cpp:133
+  ASSERT_THAT(Diags.Diagnostics,
+              ElementsAre(DiagMessage("Unknown Condition key UnknownCondition"),
+                          DiagMessage("Unexpected token. Expected Key, Flow "
----------------
kadircet wrote:
> i think having a full matcher(i.e with kind, pos and ranges) might be more expressive here, up to you though
Done, though rather separate matchers as otherwise it's too much work to get decent error messages.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82386/new/

https://reviews.llvm.org/D82386





More information about the cfe-commits mailing list