[clang-tools-extra] [clangd] [HeuristicResolver] Protect against infinite recursion on DependentNameTypes (PR #83542)

Sat Mar 2 23:50:59 PST 2024

HighCommander4 wrote:

> here is my analysis:
> 
> (Let's use the example from your test case)
> 
> ```c++
> template <int N>
> struct waldo {
>   using type = waldo<N - 1>::type::next;
> };
> ```
> 
> So, we're somehow attempting to resolve the type/decl for `next`, which is of `DependentNameType`. To achieve that, we want to inspect the type of its prefix i.e. the `waldo<N - 1>::type`, which is another `NestedNameSpecifier` and thus we shall find out the type of `waldo<N - 1>` first - which is of `ClassTemplateSpecializationType`. Then, we would perform a name lookup for `type` within the context thereof, which results in the solely `TypedefType` we're currently in.
> 
> Then we move on and try to find out what the `next` is with the `TypedefType` as its prefix. This would happen in `resolveDependentMember`, with `T` as `TypedefType` and `Name` as `next`. We expect that prefix to be of a `RecordType` so that we can perform a name lookup. That is `resolveTypeToRecordDecl`, and the type alias would get unwrapped and become a `DependentNameType` in the end.
> 
> And now we're successfully getting into an infinite loop: we want to know the type of `type`, which ends up a type alias `type` that would boil down to itself!

Thanks for the detailed writeup of how the issue arises! This matches my understanding.

Another way to think about it is that `waldo` is like a recursive compile-time function (`waldo<N>::type` is a function of `waldo<N - 1>::type`). In a real-world scenario, the recursion would be terminated by a base case expressed as a template specialization (for example `waldo<0>`). However, `HeuristicResolver` always looks at the primary template (that is the [heuristic aspect](https://searchfox.org/llvm/rev/bf08d0286825eb3e482bcfdc1cc7c19a28441ae7/clang-tools-extra/clangd/HeuristicResolver.h#43-45) of it), it will never look at a specialization, so its analysis of such a recursive construct would never terminate even if the code does in fact contain a base case.

> The PR resolves it by tracking every seen DependentNameType with a set. And since we don't want to turn the stateless `HeuristicResolver` to something stateful (e.g. adding extra data members), we have to add another layer to hide our implementation - this is what you've done.
> 
> However, I think we could track these `DependentNameType`s by adding some extra parameters. Specifically, we can pass through the `DNT` from `resolveDependentNameType` to `resolveDependentMember` and `resolveTypeToRecordDecl`, then inside `resolveTypeToRecordDecl`, we bail out with null if the unwrapped `DependentNameType` from `T` is the same as the `DNT` - I did an experiment and it turns out to be working.

Thanks for the alternative suggestion!

I've considered this, but it's slightly less general than my fix. It handles the case where the we arrive back to the same DependentNameType in a **single** level of recursion, but we can also construct cases where it takes two or more levels.

Here is an example with two levels:

```c++
template <int N>
struct odd;

template <int N>
struct even {
  using type = typename odd<N - 1>::type::next;
};

template <int N>
struct odd {
  using type = typename even<N - 1>::type::next;
};
```

To handle such cases, we need to keep track of a set of seen DependentNameTypes (hence the `llvm::SmallSet<DependentNameType*>`), rather than just one.

We could also consider a variation of your approach where we have a `SmallSet<DependentNameType*>`, but we pass it as an argument to `resolveDependentMember` etc., rather than storing it as a member variable.

On the whole, I prefer the member variable approach for a couple of reasons:
 * The call chain `resolveDependentNameType --> resolveDependentMember --> resolveTypeToRecordDecl --> resolveDependentNameType` is just one way that `resolveDependentNameType` can call itself recursively. Another possible call chain is `resolveDependentNameType --> resolveNestedNameSpecifierToType --> resolveDependentMember --> resolveTypeToRecordDecl --> resolveDependentNameType`, which would require propagating the `SmallSet<DependentNameType*>` parameter to additional functions. By making it a member variable, we handle all call chains, including ones potentially introduced in the future.
 * In the future we may run into scenarios where we get into a recursion on some other AST node type (e.g. a `Decl` or some such). The refactoring in this patch makes it easy to add protection for that by e.g. adding a `SmallSet<Decl*> SeenDecls` member as well (vs. having to add that as an additional parameter to various methods).

https://github.com/llvm/llvm-project/pull/83542