[PATCH] D52273: [clangd] Initial implementation of expected types

Thu Nov 15 08:13:22 PST 2018

ilya-biryukov marked 2 inline comments as done.
ilya-biryukov added inline comments.

================
Comment at: clangd/ExpectedTypes.cpp:12
+
+static llvm::Optional<QualType> toEquivClass(ASTContext &Ctx, QualType T) {
+  if (T.isNull() || T->isDependentType())
----------------
sammccall wrote:
> returning QualType vs Type*? It seems we strip all qualifiers, seems clearest for the return type to reflect that.
Done. That produces a bit more trouble at the callsites, so not sure if it's an improvement overall.

================
Comment at: clangd/ExpectedTypes.cpp:15
+    return llvm::None;
+  T = T.getCanonicalType().getNonReferenceType().getUnqualifiedType();
+  if (T->isIntegerType() && !T->isEnumeralType())
----------------
sammccall wrote:
> Maybe we want Ctx.getUnqualifiedArrayType here or (more likely?) do array-to-pointer decay?
Added array-to-pointer decays, they should improve ranking when assigning from an array to a pointer, which is nice.
Also added a FIXME that we should drop qualifiers from inner types of the pointers (since we do this for arrays). I think it's fine to leave it for the later improvements.

================
Comment at: clangd/ExpectedTypes.cpp:16
+  T = T.getCanonicalType().getNonReferenceType().getUnqualifiedType();
+  if (T->isIntegerType() && !T->isEnumeralType())
+    return QualType(Ctx.IntTy);
----------------
sammccall wrote:
> wow, "enumeral" might be my favorite c++-made-up word, displacing "emplace"...
 ¯\_(ツ)_/¯

================
Comment at: clangd/ExpectedTypes.cpp:30
+    return llvm::None;
+  QualType T = VD->getType().getCanonicalType().getNonReferenceType();
+  if (!T->isFunctionType())
----------------
sammccall wrote:
> nit: is canonicalization necessary here? you do it in toEquivClass
> (I guess dropping references is, for the function type check)
It was not important, removed it.

================
Comment at: clangd/ExpectedTypes.cpp:46
+  llvm::SmallString<128> Out;
+  if (index::generateUSRForType(T, Ctx, Out))
+    return llvm::None;
----------------
sammccall wrote:
> I think ultimately we may want to replace this with a custom walker:
>  - we may want to ignore attributes (e.g. const) or bail out in some cases
>  - generateUSRForType may not have the exact semantics we want for other random reasons
>  - we can do tricks with hash_combine to avoid actually building huge strings we don't care about
> 
> not something for this patch, but maybe a FIXME?
> 
USRs actually seems like a pretty good fit here. I'm not sure dropping attributes for internal types would make a big difference in the scoring and not sure how big of a problem the strings are, would be nice to actually learn it's a problem (in memory consumption, memory alloc rates, etc) before changing this.

It's definitely possible to do that, of course, we have a room to change the encoding whenever we want, but would avoid adding a FIXME and committing to this approach in the initial patch.

================
Comment at: clangd/ExpectedTypes.h:10
+// A simplified model of C++ types that can be used to check whether they are
+// convertible between each other without looking at the ASTs.
+// Used for code completion ranking.
----------------
ilya-biryukov wrote:
> ioeric wrote:
> > We might want to formalize what "convertible" means here. E.g. does it cover conversion between base and derived class? Does it cover double <-> int conversion?
> I want to leave it vague for now. Convertible means whatever we think is good for code completion ranking.
> Formalizing means we'll either dig into the C++ encoding or be imprecise. 
> 
> Happy to add the docs, but they'll probably get outdated on every change. Reading the code is actually simpler to get what's going on at this point.
Added a clarification that we want "convertible for the purpose of code completion".

================
Comment at: clangd/ExpectedTypes.h:29
+namespace clangd {
+/// An opaque representation of a type that can be computed based on clang AST
+/// and compared for equality. The encoding is stable between different ASTs,
----------------
ioeric wrote:
> The name seems opaque ;) Why is it `opaque`? 
Removed the "opaque" from the comment, hopefully this causes less confusion.
The idea is that users shouldn't rely on this representation in any other way than comparing it for equality.

================
Comment at: clangd/ExpectedTypes.h:32
+/// this allows the representation to be stored in the index and compared with
+/// types coming from a different AST later.
+class OpaqueType {
----------------
sammccall wrote:
> Does this need to be a separate class rather than using `std::string`?
> There are echoes of `SymbolID` here, but there were some factors that don't apply here:
>  - it was fixed-width
>  - memory layout was important as we stored lots of these in memory
>  - we hashed them a lot and wanted a specific hash function
> 
> I suspect at least initially producing a somewhat readable std::string a la USRGeneration would be enough.
Would still want to keep it as a marker type just for the sake of indicating what we return and documentation purposes.
It also adds some type safety (granted, not much) for some use-cases.

There's still an option to go strings with `rawStr()` if needed.

================
Comment at: clangd/ExpectedTypes.h:40
+
+  /// Get the raw byte representation of the type. Bitwise equality of the raw
+  /// data is equivalent to equality operators of SType itself. The raw
----------------
ilya-biryukov wrote:
> ioeric wrote:
> > What is the raw representation? A hash or the type name or USR?
> A string representation of the usr, but users shouldn't rely on it.
> The contract is: you can use it to compare for equality and nothing else, so the comment is actually accurate :-)
Clarified that we leave a room for ourselves to change the encoding we use.

================
Comment at: unittests/clangd/ExpectedTypeTest.cpp:79
+         << (Matched ? "matched" : "did not match")
+         << "\n\tTarget type: " << To.getAsString()
+         << "\n\tSource value type: " << V->getType().getAsString();
----------------
sammccall wrote:
> note that if you think it's useful you can To.dump(*L->stream())
> Maybe this is more interesting if/when we have a custom visitor.
>From the personal experience, looking at the string representation is usually to figure out what's wrong and dumping wouldn't actually help.
Will probably punt on this for now, happy to reconsider when we'll have a use-case for this.

================
Comment at: unittests/clangd/ExpectedTypeTest.cpp:92
+
+TEST_F(ExpectedTypeConversionTest, BasicTypes) {
+  build(R"cpp(
----------------
sammccall wrote:
> I really like the declarative equivalence-class setup of the tests.
> 
> A couple of suggestions:
>  - maybe store the equivalence classes as groups of strings rather than decls, and lazily grab the decls. It's easier to tersely represent them...
>  - I think the "convertibleTo" DSL obscures/abstracts the actual APIs you're testing - they build opaque types, and you're asserting equality.
>  - pairwise assertion messages may not give enough context: if you expect a == b == c, and a != b, then whether a == c and b == c are probably relevant
> 
> I'd consider actually building up the equivalence classes `map<OpaqueType, set</*decl*/string>>` and writing a `MATCHER_P2(ClassesAre, /*vector<set<string>>*/Classes, /*ParsedAST*/AST, "classes are " + testing::PrintToString(Classes))`
> 
> That way the actual and expected equivalence classes will be dumped on failure, and you can still grab the decls/types from the AST to dump their details.
Thanks, this approach works most of the time.
The 'FunctionReturns'  test actually relies on the asymmetrical nature of the API, so I had to leave the old API too, but it actually looks much nicer there.

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D52273