[Lldb-commits] [clang] [clang-tools-extra] [lldb] [clang] Reland: Instantiate alias templates with sugar (PR #101858)

Dmitri Gribenko via lldb-commits lldb-commits at lists.llvm.org
Wed Aug 7 07:18:22 PDT 2024


gribozavr wrote:

## On resugaring bodies of function template instantiations

> I just don't think that can be something we can always enable, as that would make some examples of source code take exponentially more time and memory to compile.

We keep track of propagated and inferred nullability annotations in a [side data structure](https://github.com/google/crubit/blob/main/nullability/type_nullability.h#L185). Our approach allows us to save on memory by not actually re-instantiating everything with new sugar. 

But I do have a question - how is the built-in resugarer going to do this without maintaining a side data structure like we do? Clang currently does not have a concept of non-canonical instantiations. `std::vector<int*>`, `std::vector<absl::Nullable<int*>>`, `std::vector<absl::Nonnull<int*>>` are the same canonical type, so they must share one instantiation. Are you also proposing to change that?

> Right now the priority is to implement the stuff that won't take a big hit on perf. But that is just priority.

But wouldn't it be better if out-of-tree users like us were not blocked on the priorities of your work? With the `Subst*` nodes in the AST the information is available for out-of-tree users to use in any way they like. Are you sure you can predict all use cases? Do you really want to be in the business of satisfying them?

---

## On (in)completeness of the out-of-tree resugarer

> I don't think, from looking at the resugarer implemented in type_nullability.cc, it's anywhere close to being able to resugar through a complex type like an STL vector, as shown in the example.

Actually, resugaring in nullability analysis already works for `std::vector::push_back()`:

```c++
#include <vector>
#include "absl/base/nullability.h"

void Test() {
  int *p = nullptr;
  std::vector<absl::Nonnull<int*>> xs;
  xs.push_back(p);
}
```

```
$ clang-tidy example.cc
example.cc:7:16: warning: expected parameter '__x' of 'std::vector<int *>::push_back' to be nonnull, but a nullable argument was used [google-runtime-pointer-nullability]
    7 |   xs.push_back(p);
      |                ^
```

> as for example both libc++ and libstdc++ vector implementations derive the element type not through the first template argument, but through the allocator instead.

As far as I see, libc++ does not actually use the allocator to define `const_reference`, `value_type` and so on:

```c++
template <class _Tp, class _Allocator /* = allocator<_Tp> */>
class _LIBCPP_TEMPLATE_VIS vector {
  typedef _Tp value_type;
  typedef value_type& reference;
  typedef const value_type& const_reference;

  /* ... */ void push_back(const_reference __x);
```

(see https://github.com/llvm/llvm-project/blob/main/libcxx/include/vector)

> I see the implementation of the resugarer in `type_nullability.cc`, and it's much more incomplete that the one I am proposing to upstream. So I think if you get it to work through upstream transforms, you will get there faster.

I'm not sure what you mean by "get there faster" - our implementation already works, is already deployed internally, and covers a lot of cases that are relevant in practice. We are still tweaking things of course, and we are working on inference tooling (that infers nullability contracts based on the implementation and suggests edits to the source code accordingly).

That is, the difficult part and the focus at the moment is not resugaring in the analysis. I do admit that our implementation of resugaring is probably not as comprehensive as the one that you're working on, but it works for the vast majority of real-world code that people write in our monorepo - we tweaked it based on the cases that we actually observed.

---

## On upstreaming the work

> As far as I can see, you don't need to keep the Subst* nodes after resugaring if you implement:

We are in the process of adding Clang's `_Nullable` and `_Nonnull` attributes to the implementation of `absl::Nullable` and `absl::Nonnull`. This work is taking time because we need to adjust our internal codebase to conform to Clang's existing expectations, behaviors, and warnings that get triggered by these attributes. These attributes will get picked up by the in-tree resugaring implementation because the source of the annotation is physically spelled in the code. Good.

However, a part of the nullability language extension is the [pragma that changes the default meaning of pointers](https://github.com/google/crubit/blob/main/nullability/test/pragma_nonnull.h#L6) from "unknown" to "nonnull". That is, we interpret the following two files in an identical way:

```c++
#include "absl/base/nullability.h"

#pragma nullability file_default nonnull

void TakePointer(int *p);
```

```c++
#include "absl/base/nullability.h"

void TakePointer(absl::Nonnull<int *> p);
```

Unfortunately we expect this pragma to be a controversial upstream because Clang already has a pragma for this exact purpose (`#pragma clang assume_nonnull begin/end`), but it was introduced by Apple in context of Objective-C and some of its behavior is incompatible with C++. It is long story and the issues are subtle, this is not the right place to discuss it.

In the end, we need a separate pragma with slightly different semantics because Clang's pragma was not suitable for C++. Getting these differences reconciled can take months in discussions alone, and we don't want to spend time on this when we are still not 100% sure that the model is going to work for us in the long run.

---

Fundamentally, your proposal to reuse the built-in resugarer depends on upstreaming an incomplete language extension, which cuts against Clang's extension policy, https://clang.llvm.org/get_involved.html:

> Evidence of a significant user community: This is based on a number of factors, including an existing user community

I don't know how to demonstrate a significant user community if the extension must live in the Clang core. Of course we could fork Clang, but we would need to maintain this fork for years. We have been working on nullability since April 2020 - that's 4+ years already. The cost involved in maintaining a Clang fork for 4+ years, compared to an out-of-tree ClangTidy check, is just unpalatable.

We're lucky that Clang upstream already has nullability annotations that happen to be a mostly good match (modulo the pragma and some other issues). But what about our work on lifetimes?

---

## On separating resugaring improvements and `Subst*` removal

Let me raise the level of the discussion here. We do appreciate your work on improving resugaring. However, simultaneously removing the `SubstTemplateTypeParmType` is breaking one out-of-tree language extension that is already deployed (nullability) and is blocking research and prototyping on another out-of-tree extension (lifetimes).

I do want to emphasize that nullability is already deployed, we depend on nullability warnings internally, and developers benefit from them greatly. We are still tweaking the semantics though (e.g., we are in the process of introducing the pragma throughout the codebase etc.), and that's why it is not the right time to upstream things yet.

For lifetimes work, the language extension is not yet deployed, but further experimentation and research would be blocked if `Subst*` nodes are removed.

Is it possible to decouple landing the improvements in resugaring and removal of `SubstTemplateTypeParmType`? The first you can land immediately, but the second blocks our work. Is it separable?

---

## On `Subst*` removal and AST fidelity

Could you provide a more detailed rationale for removing `SubstTemplateTypeParmType`?

Generally Clang tries to preserve information in the AST as much as possible. Isn't removing this node going against this principle?

https://github.com/llvm/llvm-project/pull/101858


More information about the lldb-commits mailing list