[PATCH] D116203: [clang] adds unary type transformations as compiler built-ins

Mon Apr 25 17:05:57 PDT 2022

rsmith added inline comments.

================
Comment at: clang/include/clang/AST/TransformTypeTraits.def:27
+TRANSFORM_TYPE_TRAIT_DEF(RemoveVolatile, remove_volatile)
+TRANSFORM_TYPE_TRAIT_DEF(EnumUnderlyingType, underlying_type)
----------------
This file should undef the `TRANSFORM_TYPE_TRAIT_DEF` macro at the end, like our other `.def` files do. Then you can remove all of the `#undef`s elsewhere in this patch.

================
Comment at: clang/include/clang/AST/Type.h:4567-4582
+    EnumUnderlyingType,
+    AddLvalueReference,
+    AddPointer,
+    AddRvalueReference,
+    Decay,
+    MakeSigned,
+    MakeUnsigned,
----------------
Consider using the `.def` file here.

================
Comment at: clang/include/clang/AST/Type.h:6528
+  (void)IsCPlusPlus;
+  assert(IsCPlusPlus && "Referenceable types only make sense in C++");
+  const Type &Self = **this;
----------------
I think the existence of the `IsCPlusPlus` flag may have the opposite effect of the one that's intended:

* Without this flag, looking only at the declaration of this function, I would assume you're not supposed to call this function except in C++ mode.
* With this flag, looking only at the declaration of this function, I would assume that passing in `IsCPlusPlus = false` would cause the function to always return `false` (because there are no reference types).

It might be better to define "referenceable" as "function type with no ref-qualifier nor cv-qualifier, object type, or reference type", and just accept calls to `isReferenceable` across all language modes.

================
Comment at: clang/include/clang/AST/Type.h:6532-6537
+  if (!Self.isFunctionType())
+    return false;
+
+  const auto *F = Self.getAs<FunctionProtoType>();
+  assert(F != nullptr);
+  return F->getMethodQuals().empty() && F->getRefQualifier() == RQ_None;
----------------
It's better not to test whether the type is a function type (walking past sugar etc) more than once.

================
Comment at: clang/include/clang/Sema/DeclSpec.h:340
   /*TSS*/unsigned TypeSpecSign : 2;
-  /*TST*/unsigned TypeSpecType : 6;
+  /*TST*/ unsigned TypeSpecType : 7;
   unsigned TypeAltiVecVector : 1;
----------------
Please keep the formatting consistent here, even if you need to override clang-format.

================
Comment at: clang/include/clang/Sema/DeclSpec.h:407
   static bool isTypeRep(TST T) {
-    return (T == TST_typename || T == TST_typeofType ||
-            T == TST_underlyingType || T == TST_atomic);
+    static const llvm::SmallVector<TST, 19> validTSTs = {
+        TST_atomic,
----------------
aaron.ballman wrote:
> Should we find a way we can drop the `19` here so that this doesn't become incorrect when someone adds anything to `TransformTypeTraits.def`?
It'd be nice to just compare `T` against the least and greatest trait value -- this linear scan seems more expensive than we might want. (And sure, we don't care about efficiency given that this function is only used by asserts, but I don't think we should assume it'll remain that way.)

Eg, something like:

```
constexpr TST Traits[] = {
#define TRANSFORM_TYPE_TRAIT_DEF(_, Trait) TST_##Trait,
#include "clang/AST/TransformTypeTraits.def"
#undef TRANSFORM_TYPE_TRAIT_DEF
};
static_assert(std::is_sorted(std::begin(Traits), std::end(Traits)));
return T == TST_atomic || T == TST_typename || T == TST_typeofType || (T >= Traits[0] && T <= std::end(Traits)[-1]);
```

================
Comment at: clang/lib/AST/ItaniumMangle.cpp:3962
   if (T->isDependentType()) {
-    Out << 'U';
+    Out << "Uu";

----------------
This should just be `u`, not `Uu`.

================
Comment at: clang/lib/AST/ItaniumMangle.cpp:3976

   mangleType(T->getBaseType());
 }
----------------
You're missing the `I`...`E` around this argument.

================
Comment at: clang/lib/Parse/ParseDecl.cpp:3419-3422
     case tok::kw___super:
     case tok::kw_decltype:
     case tok::identifier: {
+    ParseIdentifier:
----------------
Our normal convention is to put the `{` after the last label in a sequence of labels.

================
Comment at: clang/lib/Parse/ParseDecl.cpp:4155
+#undef TRANSFORM_TYPE_TRAIT_DEF
+      if (!ParseTypeTransformTypeSpecifier(DS))
+        goto ParseIdentifier;
----------------
The name signature you're using for this function is inconsistent with the conventions in the rest of the parser: when a `Parser::Parse...` function with a `bool` return type returns `true`, it means "I have failed and issued a diagnostic". For "parse this if possible and return whether you did", we usually use `Parser::MaybeParse...` (eg, `MaybeParseAttributes`).

Alternatively you could do the one-token lookahead here. If the `setKind` call is here, it'll be a lot clearer why it makes sense to `goto ParseIdentifier;`.

Also, for workarounds for a standard library issue, we normally include a `// HACK:` comment explaining what we're doing and why.

================
Comment at: clang/lib/Parse/ParseExpr.cpp:1041-1044
   case tok::identifier: {      // primary-expression: identifier
                                // unqualified-id: identifier
                                // constant: enumeration-constant
+  ParseIdentifier:
----------------
Our normal convention is to put the `{` after the last label in a sequence of labels.

================
Comment at: clang/lib/Parse/ParseExpr.cpp:1750
+#undef TRANSFORM_TYPE_TRAIT_DEF
+    if (NextToken().is(tok::less)) {
+      Tok.setKind(tok::identifier);
----------------
Here you're checking for `trait<` and treating it as an identifier; elsewhere you're checking for `trait(` and treating it as a builtin. Is there a reason to treat `trait` followed by a token other than `(` or `<` inconsistently?

================
Comment at: clang/lib/Parse/ParseStmt.cpp:185-186

   case tok::identifier: {
+  ParseIdentifier:
     Token Next = NextToken();
----------------
As before, `{` should go after the last label.

================
Comment at: clang/lib/Sema/SemaType.cpp:6049-6050
+      // Make sure it is a unary transform type.
+      assert(DS.getTypeSpecType() >= DeclSpec::TST_add_lvalue_reference &&
+             DS.getTypeSpecType() <= DeclSpec::TST_underlying_type);
       TL.setKWLoc(DS.getTypeSpecTypeLoc());
----------------
This assert will start failing if someone adds another trait that isn't in this range. Can we do something more robust?

One possibility would be to extend the `.def` file with macros for selecting the first and last trait only. You could also use that in `DeclSpec::isTypeRep()`. Another would be to factor out the part of `isTypeRep()` that deals with this so you can call it from here.

================
Comment at: clang/lib/Sema/SemaType.cpp:9154
+  constexpr auto UKind = UTTKind::RemovePointer;
+  if (!BaseType->isPointerType() && !BaseType->isObjCObjectPointerType())
+    return Context.getUnaryTransformType(BaseType, BaseType, UKind);
----------------
This is `!BaseType->isAnyPointerType()`.

What about block pointers? I think this patch is doing the right thing here, by saying that this only applies to pointers spelled with `*` (plain and obj-c pointers) and not to pointers spelled with `^`, but it seems worth calling out to ensure that I'm not the only one who's thought about it :)

================
Comment at: clang/lib/Sema/SemaType.cpp:9174
+  Qualifiers Quals = Underlying.getQualifiers();
+  Quals.removeCVRQualifiers();
+  return Context.getUnaryTransformType(
----------------
This seems inconsistent: the specification says that `std::decay_t` produces `std::remove_cv_t` in this case, but your implementation of `std::remove_cv_t` doesn't remove `restrict` and this does. If this is intentional, I think it warrants a comment.

================
Comment at: clang/lib/Sema/SemaType.cpp:9182-9192
+QualType Sema::BuiltinAddReference(QualType BaseType, UTTKind UKind,
+                                   SourceLocation Loc) {
+  DeclarationName EntityName(BaseType.getBaseTypeIdentifier());
+  QualType PointerToT =
+      QualType(BaseType).isReferenceable(LangOpts.CPlusPlus)
+          ? BuildReferenceType(BaseType,
+                               UKind == UnaryTransformType::AddLvalueReference,
----------------
Should we even expose this trait in non-C++ languages? Making this a C++-only keyword seems better than making it a no-op in C.

================
Comment at: clang/lib/Sema/SemaType.cpp:9196-9205
+  if (!BaseType->isArrayType())
+    return Context.getUnaryTransformType(BaseType, BaseType, UKind);
+  const ArrayType *ArrayType = BaseType->getAsArrayTypeUnsafe();
+  QualType ElementType =
+      UKind == UnaryTransformType::RemoveAllExtents
+          ? QualType(ArrayType->getBaseElementTypeUnsafe(), {})
+          : ArrayType->getElementType();
----------------
This looks like it'll lose all qualifiers other than CVR qualifiers. Presumably that's not the intent. (This will also unnecessarily remove type sugar in some cases, but that's not super important here.) Please also don't use copy-list-initialization to form a `QualType`; see https://llvm.org/docs/CodingStandards.html#do-not-use-braced-initializer-lists-to-call-a-constructor

See the suggested edits for an alternative, simpler formulation (that preserves as much type sugar as possible).

================
Comment at: clang/lib/Sema/SemaType.cpp:9240
+
+  QualType Underlying(Split.Ty, Quals.getAsOpaqueValue());
+  return Context.getUnaryTransformType(BaseType, Underlying, UKind);
----------------
It doesn't look like this will properly remove qualifiers from within an array type in general, eg:
```
using T = const volatile int;
using U = T[3];
using V = __remove_cv(U);
```

Also, you can't use the opaque value of a `Qualifiers` object like this -- it's opaque, and is not always just a CVR mask, which is what the `QualType` constructor wants. This will assert or otherwise misbehave if there are non-CVR qualifiers on the type.

The simple thing to do would be:

```
Qualifiers Quals;
QualType Unqual = Context.getUnqualifiedArrayType(BaseType, Quals);
// ... remove stuff from Quals ...
QualType Result = Context.getQualifiedType(BaseType, Quals);
```

It'd be nice to leave the type sugar alone to the extent that we can, but it doesn't really matter too much given that this will usually be wrapped inside a type-sugar-erasing template anyway.

================
Comment at: clang/lib/Sema/SemaType.cpp:9246-9247
                                        SourceLocation Loc) {
-  switch (UKind) {
-  case UnaryTransformType::EnumUnderlyingType:
-    if (!BaseType->isDependentType() && !BaseType->isEnumeralType()) {
-      Diag(Loc, diag::err_only_enums_have_underlying_types);
-      return QualType();
-    } else {
-      QualType Underlying = BaseType;
-      if (!BaseType->isDependentType()) {
-        // The enum could be incomplete if we're parsing its definition or
-        // recovering from an error.
-        NamedDecl *FwdDecl = nullptr;
-        if (BaseType->isIncompleteType(&FwdDecl)) {
-          Diag(Loc, diag::err_underlying_type_of_incomplete_enum) << BaseType;
-          Diag(FwdDecl->getLocation(), diag::note_forward_declaration) << FwdDecl;
-          return QualType();
-        }
+  if (BaseType->isDependentType())
+    return Context.getUnaryTransformType(BaseType, BaseType, UKind);
+  bool IsMakeSigned = UKind == UnaryTransformType::MakeSigned;
----------------
This check seems unnecessary: we shouldn't call any of these functions for a dependent type, and it looks like we don't.

================
Comment at: clang/lib/Sema/SemaType.cpp:9251
+      BaseType->isBooleanType()) {
+    Diag(Loc, diag::err_make_signed_integral_only) << IsMakeSigned << BaseType;
+    return QualType();
----------------
Should we support vector types here?

================
Comment at: clang/lib/Sema/SemaType.cpp:9257-9258

-        EnumDecl *ED = BaseType->castAs<EnumType>()->getDecl();
-        assert(ED && "EnumType has no EnumDecl");
+  // Clang doesn't seem to have a way to distinguish `char` from `signed char`
+  // and `unsigned char`
+  if (IsMakeSigned == IsSigned && !IsNonCharIntegral)
----------------
I don't understand this comment. `char`, `signed char`, and `unsigned char` are three different types, and you can certainly tell them apart if you want to, for example using `BaseType->isSpecificBuiltinType(BuiltinType::SChar)` or `Context.hasSameType(BaseType, Context.SignedCharTy)` to detect `signed char`.

================
Comment at: clang/lib/Sema/SemaType.cpp:9265-9266
+          ? Context.getBitIntType(!IsMakeSigned, Context.getIntWidth(BaseType))
+          : Context.getIntTypeForBitwidth(Context.getIntWidth(BaseType),
+                                          IsMakeSigned);
+  Qualifiers Quals = Underlying.getQualifiers();
----------------
This is wrong: if, say, `int` and `long` are the same bit-width, this will give the same type for `__make_unsigned(int)` and `__make_unsigned(long)`, where they are required to produce `unsigned int` and `unsigned long` respectively.

Please look at `Context.getCorrespondingSignedType` and `Context.getCorrespondingUnsignedType` to see how to do this properly; you may be able to call those functions directly in some cases, but be careful about the cases where we're required to return the lowest-rank int type of the given size. Note that `getIntTypeForBitwidth` does *not* do that; rather, it produces the *preferred* type of the given width, and for WebAssembly and AVR it produces something other than the lowest-rank type in practice in some cases.

================
Comment at: clang/lib/Sema/SemaType.cpp:9267-9270
+  Qualifiers Quals = Underlying.getQualifiers();
+  Quals.setCVRQualifiers(BaseType.getCVRQualifiers());
+  Underlying = QualType(Underlying.getSplitUnqualifiedType().Ty,
+                        Quals.getAsOpaqueValue());
----------------
As before, the opaque value is, well, opaque, and you shouldn't be using it in this way. Also, you just created `Underlying` so you know it's unqualified, so there's no need to ask for its qualifiers.

See suggested edit. Alternatively (#1), if you want to preserve all qualifiers:

```
Underlying = Context.getQualifiedType(BaseType.getQualifiers());
```

Alternatively (#2) we could strictly follow the standard wording and preserve only cv-qualifiers and not `restrict`. I think preserving all qualifiers is more in line with the intent, but preserving only `const` and `volatile` is probably the best match for what a C++ implementation of the type trait would do. *shrug*

================
Comment at: clang/lib/Sema/SemaType.cpp:9136
+  Qualifiers Quals = Underlying.getQualifiers();
+  Quals.removeCVRQualifiers();
+  return Context.getUnaryTransformType(
----------------
cjdb wrote:
> aaron.ballman wrote:
> > What about things like type attributes (are those lost during decay)?
> According to https://eel.is/c++draft/tab:meta.trans.other, no.
Type attributes are non-standard, so we can't answer this question by looking at the standard. Nonetheless, doing what `getDecayedType` does seems like the best choice.

================
Comment at: clang/test/CodeGenCXX/mangle.cpp:54
 // CHECK-LABEL: define{{.*}} void @_Z1fno
-void f(__int128_t, __uint128_t) { } 
+void f(__int128_t, __uint128_t) {}

----------------
If it's easy, can you undo the unrelated formatting changes in this file? That'll make future archaeology a little easier and help reduce the chance of merge conflicts. (Not a bit deal though.)

================
Comment at: clang/test/CodeGenCXX/mangle.cpp:1113
 template void fn<E>(E, __underlying_type(E));
-// CHECK-LABEL: @_ZN6test552fnINS_1EEEEvT_U3eutS2_
+// CHECK-LABEL: @_ZN6test552fnINS_1EEEEvT_Uu17__underlying_typeS2_
 }
----------------
Do we have a corresponding test for the MS ABI?

================
Comment at: clang/test/CodeGenCXX/mangle.cpp:1159
+
+namespace test61 {
+template <typename T> void f(T, __add_lvalue_reference(T)) {}
----------------
Let's keep all the tests for this together -- please move these to `namespace test55`.

================
Comment at: clang/test/SemaCXX/remove_pointer.mm:7
+
+template <class T> using remove_pointer_t = __remove_pointer(T);
+
----------------
Is there a reason for this wrapper? You're directly using `__is_same` below, why not directly use `__remove_pointer` too?

================
Comment at: clang/test/SemaCXX/remove_pointer.mm:12-46
+void remove_pointer() {
+  { int a[T(__is_same(remove_pointer_t<void>, void))]; }
+  { int a[T(__is_same(remove_pointer_t<const void>, const void))]; }
+  { int a[T(__is_same(remove_pointer_t<volatile void>, volatile void))]; }
+  { int a[T(__is_same(remove_pointer_t<const volatile void>, const volatile void))]; }
+  { int a[T(__is_same(remove_pointer_t<int>, int))]; }
+  { int a[T(__is_same(remove_pointer_t<const int>, const int))]; }
----------------
I expected to see some tests for ObjC pointers in here, but I don't see any. I'd like to see `__remove_pointer(id)` and `@class X; static_assert(__is_same(__remove_pointer(X*, X)));`

================
Comment at: clang/test/SemaCXX/remove_pointer.mm:13
+void remove_pointer() {
+  { int a[T(__is_same(remove_pointer_t<void>, void))]; }
+  { int a[T(__is_same(remove_pointer_t<const void>, const void))]; }
----------------
Please use `static_assert` instead of faking it with an array.

================
Comment at: clang/test/SemaCXX/transform-type-traits.cpp:5
+
+// libstdc++ uses __remove_cv as an alias, so we make our transform type traits alias-revertible
+template <class T, class U>
----------------
We generally name these tests for libstdc++ workarounds as `libstdcxx_..._hack.cpp`

================
Comment at: clang/test/SemaCXX/type-traits.cpp:2862
+struct S {};
+template <class T> using remove_const_t = __remove_const(T);
+
----------------
Consider using the trait directly instead of wrapping it in an alias.

================
Comment at: clang/test/SemaCXX/type-traits.cpp:2865
+void check_remove_const() {
+  { int a[T(__is_same(remove_const_t<void>, void))]; }
+  { int a[T(__is_same(remove_const_t<const void>, void))]; }
----------------
Please use `static_assert`.

================
Comment at: clang/test/SemaCXX/type-traits.cpp:2986
+
+void check_remove_cv() {
+  { int a[T(__is_same(remove_cv_t<void>, void))]; }
----------------
This is should have some test coverage for `restrict` and for at least one non-standard qualifier. (Similarly for the `remove_cvref` and `decay` tests.)

================
Comment at: clang/test/SemaCXX/type-traits.cpp:3071
+  { int a[T(__is_same(remove_pointer_t<int>, int))]; }
+  { int a[T(__is_same(remove_pointer_t<const int>, const int))]; }
+  { int a[T(__is_same(remove_pointer_t<volatile int>, volatile int))]; }
----------------
You seem to be missing coverage that `__remove_pointer(const int*)` preserves the `const`.
You also seem to be missing coverage that this works on a cv-qualified pointer, eg `__remove_pointer(int *const)` -> `int`.

================
Comment at: clang/test/SemaCXX/type-traits.cpp:3099-3100
+
+  { int a[T(__is_same(remove_pointer_t<int __attribute__((address_space(1))) *>, int))]; }
+  { int a[T(__is_same(remove_pointer_t<S __attribute__((address_space(2))) *>, S))]; }
+}
----------------
It seems strange to remove the address space qualifier here, given that this trait doesn't remove cv-qualifiers.

================
Comment at: clang/test/SemaCXX/type-traits.cpp:3269
+  { int a[T(__is_same(remove_cvref_t<int S::*const volatile>, int S::*))]; }
+  { int a[T(__is_same(remove_cvref_t<int (S::*const volatile)()>, int(S::*)()))]; }
+  { int a[T(__is_same(remove_cvref_t<int (S::*const volatile)() &>, int(S::*)() &))]; }
----------------
The tests you have here for references to pointers-to-members seem excessive (throughout); I don't think there's any reason to think that a reference to pointer-to-member would be so different from a reference to any other type that would justify this level of testing of the former case.

The cases that seem interesting here are the ones for non-referenceable types, but those are covered separately below.

================
Comment at: clang/test/SemaCXX/type-traits.cpp:3391
+  static void checks() {
+    { int a[T(__is_same(add_lvalue_reference_t<M>, M))]; }
+    { int a[T(__is_same(add_pointer_t<M>, M))]; }
----------------
It's weird that we permit an abominable function type to show up in the argument of a trait during template instantiation but not when written directly. That behavior should really be consistent, and I think we should consistently allow it. That is, we should allow:

`static_assert(__is_same(__add_pointer(int () &), int () &));`

... just like we allow it if you hide the usage of the traits inside a template. That'd make this easier to test, too: you can then put these tests with the other tests for the corresponding traits. (This should only require passing `DeclaratorContext::TemplateArg` to `ParseTypeName` instead of the default of `DeclaratorContext::TypeName` in the places where we parse type traits, reflecting that we want to use the rules for template arguments in these contexts.) This would also fix an incompatibility between GCC and Clang: https://godbolt.org/z/GdxThdvjz

That said, I don't think we need to block this patch on this change, so if you'd prefer to not do this now, that's fine :)

================
Comment at: clang/test/SemaCXX/type-traits.cpp:3470
+
+  { int a[T(__is_same(make_signed_t<unsigned long long>, long))]; }
+  { int a[T(__is_same(make_signed_t<const unsigned long long>, const long))]; }
----------------
This is the wrong result type, it should be `long long`.

================
Comment at: clang/test/SemaCXX/type-traits.cpp:3495-3496
+
+  { int a[T(__is_same(make_signed_t<Enum>, int))]; }
+  { int a[T(__is_same(make_signed_t<SignedEnum>, int))]; }
+  { int a[T(__is_same(make_signed_t<const SignedEnum>, const int))]; }
----------------
It'd be useful to test enums with different underlying types. However, this test is not portable: if `short` and `int` are the same size, this is required to produce `short`, not `int`. It'd be good to have some test coverage of that quirk too. Perhaps this is easiest to see with a test like:

```
enum E : long long {};
static_assert(__is_same(__make_signed(E), long));
```

... which should hold in cases where `long` and `long long` are the same size and are larger than `int`.

================
Comment at: clang/test/SemaCXX/type-traits.cpp:3822-3825
+  { int x[T(__is_same(remove_all_extents_t<int[][2]>, int))]; }
+  { int x[T(__is_same(remove_all_extents_t<const int[]>, const int))]; }
+  { int x[T(__is_same(remove_all_extents_t<const int[1]>, const int))]; }
+  { int x[T(__is_same(remove_all_extents_t<const int[1][2]>, const int))]; }
----------------
Please also test the case where the qualifiers are attached outside the array type:

```
using T = int[1][2];
static_assert(__is_same(__remove_all_extents(const T), const int), "");
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116203/new/

https://reviews.llvm.org/D116203