[PATCH] D52273: [clangd] Initial implementation of expected types

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 20 16:27:57 PDT 2018


sammccall added a comment.

This seems very clever, but extremely complicated - you've implemented much of C++'s conversion logic, it's not clear to me which parts are actually necessary to completion quality.
(Honestly this applies to expected types overall - it seems intuitively likely that it's a good signal, it seems less obvious that it pulls its weight if it can't be made simple).

>From the outside it seems much of it is YAGNI, and if we do then we need to build it up slowly with an eye for maintainability.
Can we start with expected type boosting (no conversions) as previously discussed, and later measure which other parts make a difference? (I think we'll need/want the simple model anyway, for this to work with Dex and other token-based indexes).



================
Comment at: clangd/ExpectedTypes.h:66
+using SHA1Array = std::array<uint8_t, 20>;
+SHA1Array computeSHA1(llvm::StringRef Input);
+
----------------
While a hash of a string might be a reasonable choice in the long term, I worry about debuggability. (With SymbolID we can just look up the symbol).

You could make the hashing an implementation detail of the index, and have the APIs speak in terms of opaque strings. But that forces the index to be able to report the full opaque string of each returned symbol (for scoring), so the index now has to have a lookup table... messy.

Another fun thing about this representation is that you're storing 20 bytes of data (+ overhead) for common types like "void" where we could get away with one.


================
Comment at: clangd/ExpectedTypes.h:66
+using SHA1Array = std::array<uint8_t, 20>;
+SHA1Array computeSHA1(llvm::StringRef Input);
+
----------------
sammccall wrote:
> While a hash of a string might be a reasonable choice in the long term, I worry about debuggability. (With SymbolID we can just look up the symbol).
> 
> You could make the hashing an implementation detail of the index, and have the APIs speak in terms of opaque strings. But that forces the index to be able to report the full opaque string of each returned symbol (for scoring), so the index now has to have a lookup table... messy.
> 
> Another fun thing about this representation is that you're storing 20 bytes of data (+ overhead) for common types like "void" where we could get away with one.
in the *short run* I'd suggest just printing the type name and using that as the representation.
I'm happy to (eventually) learn about the semantics of USRs in types, but not today :-)


================
Comment at: clangd/ExpectedTypes.h:68
+
+/// Represents a type of partially applied conversion. Should be treated as an
+/// opaque value and can only be used to check whether the types are converible
----------------
this represents a type (in the c++ sense), not a conversion, right?


================
Comment at: clangd/ExpectedTypes.h:69
+/// Represents a type of partially applied conversion. Should be treated as an
+/// opaque value and can only be used to check whether the types are converible
+/// between each other (by using the equality operator).
----------------
"convertible (using equality)" is confusing.

It sounds like "this is actually an equivalence class of types" but I think that's not true, because it's not symmetric.

Isn't the model here just "SType is a serializable token representing a type. They can be compared for equality."


================
Comment at: clangd/ExpectedTypes.h:72
+/// Representation is fixed-size, small and cheap to copy.
+class SType {
+public:
----------------
Is this a placeholder name? It's not clear what it means.

Suggest OpaqueType or ExpectedType


================
Comment at: clangd/ExpectedTypes.h:81
+  /// implementation attempts to store as little types as possible.
+  static llvm::SmallVector<SType, 2>
+  fromCompletionResult(ASTContext &Ctx, const CodeCompletionResult &R);
----------------
can we separate "get the representative set of types for R" from "encode them as SType"?
Seems like the APIs would be easier to test and understand.

(I think at least the former should be a non-member function BTW, to keep clear that SType itself isn't aware of any clever folding or whatnot)


================
Comment at: clangd/ExpectedTypes.h:82
+  static llvm::SmallVector<SType, 2>
+  fromCompletionResult(ASTContext &Ctx, const CodeCompletionResult &R);
+
----------------
coupling to CompletionResult seems premature here, can we stick to passing getExpectedType() until we know that abstraction needs to be broken?


================
Comment at: clangd/ExpectedTypes.h:91
+  ///
+  /// The result is a map from a type to a multiplier (>= 1) that denotes the
+  /// quality of conversion that had to be applied (better conversion receive
----------------
I don't understand the scale here. If better conversions get higher numbers, what number does "no conversion" get?
The code looks like worse conversions get higher numbers.
I'd suggest using an additive penalty to avoid confusion with scores, but really...

this all seems like YAGNI. Will a set do for now?


================
Comment at: clangd/ExpectedTypes.h:213
+
+void collectConvertibleFrom(ASTContext &Ctx, MockExpr Source,
+                            llvm::function_ref<void(PartialConv)> OutF);
----------------
why is implementing one of these directions not enough?

It should probably be:
As far as I can tell, derived-to-base is the tricky one here: it's an important conversion (albeit one we should leave out of the first patch), and you can't ask "what's convertible to base" since the answer is an open set you can't see.

So it seems the minimal set you need for handling pointer to base is `Type getRepresentative(Type)` and `set<Type> getRepresentativesAfterConversion(Type)` or so...


================
Comment at: clangd/ExpectedTypes.h:213
+
+void collectConvertibleFrom(ASTContext &Ctx, MockExpr Source,
+                            llvm::function_ref<void(PartialConv)> OutF);
----------------
sammccall wrote:
> why is implementing one of these directions not enough?
> 
> It should probably be:
> As far as I can tell, derived-to-base is the tricky one here: it's an important conversion (albeit one we should leave out of the first patch), and you can't ask "what's convertible to base" since the answer is an open set you can't see.
> 
> So it seems the minimal set you need for handling pointer to base is `Type getRepresentative(Type)` and `set<Type> getRepresentativesAfterConversion(Type)` or so...
names are unclear: is `collectConvertibleFrom(T)` the convertible-from types for T (i.e the types T is convertible from), or the types that are convertible from T?


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D52273





More information about the cfe-commits mailing list