[PATCH] D133468: [clang] Implement divergence for TypedefType and UsingType

Thu Sep 29 04:19:36 PDT 2022

mizvekov added a comment.

In D133468#3823143 <https://reviews.llvm.org/D133468#3823143>, @sammccall wrote:

> Sorry about the delay, I've been juggling too many things :-(

No worries, thanks!

> TL;DR: yes, I think it's reasonable. I think the implementation complexity is justified by what we get. Thanks for explaining!
>
> I think to minimize interface complexity, we should basically consider this to just weaken the invariants of TypedefType etc, and have comments reflecting that. Accordingly like to see `inDivergent()` inverted and named something like `typeMatchesDecl()` if it needs to be exposed at all.

Yeah, I think it's better to just not expose it in general.
One minor problem with inverting it is that we would make the most common case the positive case. Though if we want to avoid the churn and noise, on the TextNode dumper we will probably want to add a flag only in the negative case.

> Thanks, this is compelling! Couple of observations, not objections:
>
> - applications often print the fully-sugared type (`type2`) and/or the canonical type (`int`) with nothing in between. This might limit the benefits - I've often wished we had better & sharable heuristics for how much desugaring users want.

Yeah I am aware, the `aka` handling will step over typedefs, we will have to work on a better solution for that separate problem.

> - in this simple example we could naively solve the problem by having the class instantiation that it was instantiated with `Int` sugar, and making the original underlying type of A<int>::type1 as `Int`. (There are downsides here, you can obviously get the "wrong" sugar if there are multiple instantiation sites).

Yeah, I think it could cause too many surprises if we instantiated only one time per canonical type, but then used the sugar for the first instantiation:

- It would cause surprising behavior when instantiating with attributes which have effects beyond meaning for the user. Think over-aligned typedefs and such.
- Unless resugaring is on and always able to correct it when printing a diagnostic, it could produce references to types which are distant from the context the problem occurred.

> Yes! Though half of my `basic_string` sugar problems are things like `substr`'s return type. This seems harder to solve in a principled way, though maybe heuristically doable when the written return type uses the injected-class-name? Anyway, again offtopic for this patch...

Yeah, that for now is out of scope of the whole project even, except that maybe we could make this work for templated explicit object parameter.

The injected class name approach feels like it would even come short of solving it, even ignoring false positives. I think this would be a problem we would want to solve even for methods of regular (non-template) classes.

But it does not seem like it would justify the cost of templating a parameter just for the return type sugar.

> And I also expected that it would end up being in more nodes than TypedefType and UsingType - it's hard to see why it stops there.

It won't stop here, we can do this for any type node which gets its sugared type from some other object which is expensive to rebuild (declarations, expressions, etc)

TypedefType is the most critical here, and UsingType is it's secret twin, so might as well do both in the same patch.

> But it sounds more like we expect ~no programmatic clients to care, and that this change is mostly:
>
> - drop the assumption that decl & type have the same sugar, because of resugaring
> - add isDivergent() just to enable the text dump to provide hints to human analysis

Yeah that was the point of my question, I don't see why any user would care, and just rather not add complexity here. Glad that you agree :)

> If you're talking about *runtime* cost - it's not obvious that it's significantly more expensive? IIUC wherever types are resugared we're having to duplicate the chain of sugar (e.g. a function that returns T** is going to be instantiated as functiontype(pointertype(pointertype(typedeftype(int)))) and then resugared as functiontype(pointertype(pointertype(/*divergent*/typedeftype(typedeftype(int))))). The difference between enlarging typedeftype with trailing objects vs adding an extra node feels like a drop in the bucket. That said, I can't think of a way to do this that's both compact *and* much simpler, and simplicity would be the point.

No, I am not worried about the difference in reusing TypedefType + extra bit, versus making it a derived class or some such.
I was more worried about the cost of of adding AST nodes which explain how a type was resugared, for example providing a pointer to the naming context used to resugar. That seems too expensive, for no utility.

> Yes. In fact I've been scared by the discussion of "maybe we should have flags to ramp up/down sugar" as it's hard enough to write AST-wrangling code correctly against a single configuration :-)

Yeah, I see the point, I would like to avoid flags as well. Even a flag for resugaring on-off seems problematic, as resugaring introduces a bunch of novel sugaring configurations.

> OK, I think I'm sold on this.
> I think it's conceptually simpler to say "this may be the case" (weaken invariants) than say "there are two flavors of TypedefType: strong-invarant and weak-invariant". Divergence is then just an implementation detail (IOW non-divergence is a storage optimization).

Yeah exactly, I don't see a point in exposing the flag to the user except as a debugging aid.
The difference here could potentially break weird/dubious code, ie code that wants to analyze the type but traverses TypedefTypes through the declaration, instead of the more straightforward `desugar` route.

> I don't have a strong opinion on whether this *only* means change the emphasis of the documentation, or we should also remove the `isDivergent()` accessor. If there's another way to achieve the debugging goals I'd lean toward the latter.

The debugging goal can be easily achieved by comparing the underlying type against the declaration's type, so it's not critical at all, there is just the advantage of having a single implementation to query this.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133468/new/

https://reviews.llvm.org/D133468