[PATCH] D133468: [clang] Implement divergence for TypedefType and UsingType

Thu Sep 29 02:00:39 PDT 2022

sammccall accepted this revision.
sammccall added a comment.

Sorry about the delay, I've been juggling too many things :-(

TL;DR: yes, I think it's reasonable. I think the implementation complexity is justified by what we get. Thanks for explaining!

I think to minimize interface complexity, we should basically consider this to just weaken the invariants of TypedefType etc, and have comments reflecting that. Accordingly like to see `inDivergent()` inverted and named something like `typeMatchesDecl()` if it needs to be exposed at all.

But since this is just naming nits and I'm having trouble staying responsive, LGTM from my side once you feel it's clear enough.

---

In D133468#3804290 <https://reviews.llvm.org/D133468#3804290>, @mizvekov wrote:

>   template <class T> struct A { using type1 = T; };
>   using Int = int;
>   
>   using type2 = A<Int>::type1;
>
> [...] We want the underlying type of `type2` to have that `Int` sugar. [...] The problem here is that the TypedefType is currently limited to having the declaration's underlying type, which is just from an instantiation of `A<int>`, and knows nothing about camel case `Int`.

Thanks, this is compelling! Couple of observations, not objections:

- applications often print the fully-sugared type (`type2`) and/or the canonical type (`int`) with nothing in between. This might limit the benefits - I've often wished we had better & sharable heuristics for how much desugaring users want.
- in this simple example we could naively solve the problem by having the class instantiation that it was instantiated with `Int` sugar, and making the original underlying type of A<int>::type1 as `Int`. (There are downsides here, you can obviously get the "wrong" sugar if there are multiple instantiation sites).

Anyway, I don't really want to argue scope/design, you've clearly thought way more about this, just want to give you more to think about. Question for this patch is just cost vs benefits.

> This is similar to the infamous problem with `std::string` turning into `std::basic_string` in templates.

Yes! Though half of my `basic_string` sugar problems are things like `substr`'s return type. This seems harder to solve in a principled way, though maybe heuristically doable when the written return type uses the injected-class-name? Anyway, again offtopic for this patch...

> We could emit a new TypedefDecl with the underlying type that we want, but creating declarations is expensive, so we want to avoid that.
> We could create a new AST node that represented more accurately what was happening. But the question is, is this worth the cost?
> Do you see use cases for this yourself?

>From the API here, "divergent" exposed as a property etc, I assumed you expected some clients to handle it. And I also expected that it would end up being in more nodes than TypedefType and UsingType - it's hard to see why it stops there.
In that case an additional node could improve code complexity: isolates+centralizes the code handling this case, makes it harder to miss one kind of divergent node, keeps the rest of the AST simpler for code that doesn't care about resugaring.

But it sounds more like we expect ~no programmatic clients to care, and that this change is mostly:

- drop the assumption that decl & type have the same sugar, because of resugaring
- add isDivergent() just to enable the text dump to provide hints to human analysis

> But the question is, is this worth the cost?

If you're talking about *runtime* cost - it's not obvious that it's significantly more expensive? IIUC wherever types are resugared we're having to duplicate the chain of sugar (e.g. a function that returns T** is going to be instantiated as functiontype(pointertype(pointertype(typedeftype(int)))) and then resugared as functiontype(pointertype(pointertype(/*divergent*/typedeftype(typedeftype(int))))). The difference between enlarging typedeftype with trailing objects vs adding an extra node feels like a drop in the bucket. That said, I can't think of a way to do this that's both compact *and* much simpler, and simplicity would be the point.

(I'm not really asking you to measure this: I know it's a bunch of work. Just want to make sure that if we're deciding about cheap vs expensive we're probing our intuition a bit)

> We want resugaring to be cheap and introduce as little changes in the AST as possible, to get a better chance of affording to have it to be on all the time.

Yes. In fact I've been scared by the discussion of "maybe we should have flags to ramp up/down sugar" as it's hard enough to write AST-wrangling code correctly against a single configuration :-)

> It does not seem too far fetched that the UsingType could point to a resugared version of the declaration's underlying type, without having to create a new declaration.

OK, I think I'm sold on this.
I think it's conceptually simpler to say "this may be the case" (weaken invariants) than say "there are two flavors of TypedefType: strong-invarant and weak-invariant". Divergence is then just an implementation detail (IOW non-divergence is a storage optimization).
I don't have a strong opinion on whether this *only* means change the emphasis of the documentation, or we should also remove the `isDivergent()` accessor. If there's another way to achieve the debugging goals I'd lean toward the latter.

================
Comment at: clang/include/clang/AST/Type.h:4492
   UsingShadowDecl *getFoundDecl() const { return Found; }
+  bool isDivergent() const { return UsingBits.isDivergent; }
   QualType getUnderlyingType() const;
----------------
IIUC clients are *not* expected to care about this.
This isn't obvious, I think we should do one of:
 - remove the accessor
 - make it private and friend the things that need it
 - add a comment that makes it clear this is a detail (e.g. describe it as a storage optimization for simple cases)

I think it's worth inverting the boolean sense, e.g. call it `UsesDeclaredType`: if clients should assume in general that these have different sugar, it's when they *match* that's the special case.

================
Comment at: clang/include/clang/AST/Type.h:4493
+  bool isDivergent() const { return UsingBits.isDivergent; }
   QualType getUnderlyingType() const;

----------------
I think getUnderlyingType() is the right place to document this new wrinkle in the AST.

Something like:
```
The returned type might have different sugar than the `getFoundDecl()`.
This occurs e.g. when we "resugar" types accessed through template instantiations.
```

================
Comment at: clang/lib/AST/ASTContext.cpp:4656
+  }
+  if (Underlying.isNull() || Decl->getUnderlyingType() == Underlying)
+    return QualType(Decl->TypeForDecl, 0);
----------------
Probably worth a comment somewhere in this function: we can have different underlying sugar types for the same decl, TypeForDecl points to the one whose sugar matches the decl.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133468/new/

https://reviews.llvm.org/D133468