[PATCH] D77811: [clangd] Implement semanticTokens modifiers

Tue Feb 9 07:31:08 PST 2021

sammccall marked 2 inline comments as done.
sammccall added a comment.

Thanks for the excellent comments, and sorry for the radio silence. (Kids got quarantined recently so work time has been... scarce).

I think there are improvements to be had here but it's probably time to land this...

In D77811#2544328 <https://reviews.llvm.org/D77811#2544328>, @nridge wrote:

> I am curious what you think of the extension idea for readonly <https://reviews.llvm.org/D77811?id=320218#inline-896431>, though obviously it wouldn't be something to do in this patch.

Longer comment below, but:

- I love it
- it's not trivial to implement, I think
- not obvious whether this should replace the simple `readonly` given here, augment it, or be a different attribute - I'm probably happy with any of these.

================
Comment at: clang-tools-extra/clangd/SemanticHighlighting.cpp:136
+// Whether D is const in a loose sense (should it be highlighted as such?)
+bool isConst(const Decl *D) {
+  if (llvm::isa<EnumConstantDecl>(D) || llvm::isa<NonTypeTemplateParmDecl>(D))
----------------
nridge wrote:
> Do you think in the future it might make sense to have the `readonly` modifier reflect, at least for variables, whether //the particular reference// to the variable is a readonly reference (e.g. binding the variable to a `const X&` vs. an `X&`)?
Do I understand correctly?

```
std::vector<int> X; // X is not readonly
X.push_back(42); // X is not readonly
X.size(); // X is readonly
```

Distinguishing potentially-mutating from non-mutating uses seems really useful to me.
My only question is whether mapping this concept onto `readonly` is the right thing to do:
 - there are really three sets: const access, mutable access, and non-access (e.g. declaration, or on RHS of `using a=b`). And I think maybe the most useful thing to distinguish is mutable access vs everything else, which is awkward with an attribute called "readonly".
 - this also seems likely to diverge from how other languages use the attribute (most don't have this concept)
 - on the other hand, standard attribute names will be better supported by clients

This also might be somewhat complicated to implement :-)
I'd like to leave in this simple decl-based thing as a placeholder, and either we can replace it and add an additional "mutation" attribute later (once we work out how to implement it!) I've left a comment about this...

(Related: a while ago Google's style guide dropped its requirement to use pointers rather than references for mutable function params. Having `&x` at the callsite rather than just `x` is a useful hint, but obviously diverging from common practice has a cost. We discussed how we could use semantic highlighting to highlight where a param was being passed by mutable reference, though didn't have client-side support for it yet)

================
Comment at: clang-tools-extra/clangd/SemanticHighlighting.cpp:476
+      if (!Kind || (Result && Kind != Result))
+        continue;
+      Result = Kind;
----------------
nridge wrote:
> nridge wrote:
> > This is a change of behaviour from before, in the case where the `ReferenceLoc` has multiple targets.
> > 
> > Before, we would produce at most one highlighting token for the `ReferenceLoc`. In the case where different target decls had different highlighting kinds, we wouldn't produce any.
> > 
> > Now, it looks like we produce a separate token for every target whose kind matches the kind of the first target (and skip targets with a conflicting kind).
> > 
> > Is that the intention?
> > 
> > It seems a bit strange: if we allow multiple tokens, why couldn't they have different kinds?
> Thinking more about this, the behaviour may actually be reasonable as written.
> 
>   * Tokens with different kinds would get discarded via `resolveConflict()`.
>   * Multiple tokens with the same kind are potentially useful because they may have different modifiers, and the modifiers are then merged in `resolveConflict()`.
Oops, I don't think the part where we treat the first kind specially was intended.
I think the simplest thing is to emit all the tokens and let conflict resolution sort them out.
In practice this means:
 - if there's multiple kinds, discard all
 - if there's different sets of modifiers, take the union
Which seems reasonable to me until proven otherwise

(This part of the patch is really old so I can't be 100% sure what my intention was...)

================
Comment at: clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp:264
+        $Class[[D]] $Field_decl[[E]];
+        static double $StaticField_decl_static[[S]];
+        static void $StaticMethod_decl_static[[bar]]() {}
----------------
nridge wrote:
> sammccall wrote:
> > nridge wrote:
> > > Presumably, the highlighting kinds `StaticField` and `StaticMethod` are going to be collapsed into `Field` and `Method` in a future change (after the removal of TheiaSemanticHighlighting, I guess)?
> > Yeah, merging any kinds that are exported with the same name should be NFC at that point.
> > 
> > Hmm, though currently StaticField --> Variable, not Field (similarly StaticMethod --> Method).
> > So we can have static fields be Variable+Static+ClassScope or Field+Static+ClassScope.
> > 
> > I can see arguments for either...
> I don't have a strong opinion on this one.
> 
> Maybe Field+Static+ClassScope, that way clients that use a generic theme (and thus do not recognize ClassScope) will color it as Field rather than Variable?
Field+Static is probably better than Variable, yeah.

In the back of my head I wonder if there will be significant clients that handle type but not modifiers, and if Variable is a better first-order description than Field. (I was surprised to find out that static members in clang are VarDecls not FieldDecls, but now I got used to the idea...)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77811/new/

https://reviews.llvm.org/D77811