[PATCH] D116203: [clang] adds unary type transformations as compiler built-ins

Tue Jun 14 13:58:27 PDT 2022

rsmith added inline comments.

================
Comment at: clang/lib/AST/ASTContext.cpp:5863
   // FIXME: derive from "Target" ?
-  return WCharTy;
+  return IntTy;
 }
----------------
This change seems surprising. Can you explain what's going on here?

================
Comment at: clang/lib/AST/ASTContext.cpp:10712-10713
 QualType ASTContext::getCorrespondingUnsignedType(QualType T) const {
-  assert((T->hasSignedIntegerRepresentation() || T->isSignedFixedPointType()) &&
+  assert((T->hasSignedIntegerRepresentation() || T->isIntegerType() ||
+          T->isEnumeralType() || T->isSignedFixedPointType()) &&
          "Unexpected type");
----------------
If we now allow unsigned types to be passed in here, can we do so consistently?

This seems to do the wrong thing for a vector of scoped enumeration type. Is that a problem for the builtins?

================
Comment at: clang/lib/AST/ASTContext.cpp:10731-10759
+  case BuiltinType::Char_U:
   case BuiltinType::Char_S:
   case BuiltinType::SChar:
+  case BuiltinType::UChar:
+  case BuiltinType::Char8:
     return UnsignedCharTy;
   case BuiltinType::Short:
----------------
(Continuation of suggested edits in comment a few lines below.)

================
Comment at: clang/lib/AST/ASTContext.cpp:10785-10786
     return SatUnsignedLongFractTy;
   default:
-    llvm_unreachable("Unexpected signed integer or fixed point type");
+    llvm_unreachable("Unexpected integer or signed fixed point type");
   }
----------------
Perhaps we can handle all the plain unsigned types (other than `wchar_t` and `char`, which are special) here? This would also cover the fixed-point types.

================
Comment at: clang/lib/AST/ASTContext.cpp:10791
 QualType ASTContext::getCorrespondingSignedType(QualType T) const {
-  assert((T->hasUnsignedIntegerRepresentation() ||
-          T->isUnsignedFixedPointType()) &&
+  assert((T->hasUnsignedIntegerRepresentation() || T->isIntegerType() ||
+          T->isEnumeralType() || T->isUnsignedFixedPointType()) &&
----------------
Similar comments for this function as previous one.

================
Comment at: clang/lib/AST/TypePrinter.cpp:1158
                                            raw_ostream &OS) {
   IncludeStrongLifetimeRAII Strong(Policy);
 }
----------------
Remove this line too. No point building an RAII scope with nothing in it.

================
Comment at: clang/lib/Parse/ParseExpr.cpp:1750
+#undef TRANSFORM_TYPE_TRAIT_DEF
+    if (NextToken().is(tok::less)) {
+      Tok.setKind(tok::identifier);
----------------
cjdb wrote:
> rsmith wrote:
> > Here you're checking for `trait<` and treating it as an identifier; elsewhere you're checking for `trait(` and treating it as a builtin. Is there a reason to treat `trait` followed by a token other than `(` or `<` inconsistently?
> The parser needs to treat `trait<` as an identifier to accommodate libstdc++'s usage of a few of these as alias templates. I've added a comment to explain why in the code.
I agree we need to treat `trait<` as an identifier and `trait(` as the builtin. My question is, why do we have inconsistent treatment of the remaining cases (`trait` followed by any other token)? For example, `MaybeParseTypeTransformTypeSpecifier` treats `trait + 1` as an identifier. But this code treats it as the builtin name.

I think there are two choices that make the most sense, if they work:

1) `trait(` is the builtin and any other utterance is an identifier, or
2) `trait<` is an identifier, `using trait =` is an identifier, and any other utterance is the builtin.

I think I prefer option (2), given that it's defining the narrower special case. But all I'm really looking for here is a consistent story one way or the other, if it's feasible to have one.

================
Comment at: clang/lib/Sema/SemaType.cpp:9160-9164
+  Qualifiers Quals = Pointee.getQualifiers();
+  Quals.removeAddressSpace();
+  return Context.getUnaryTransformType(
+      BaseType,
+      QualType(Pointee.getSplitUnqualifiedType().Ty, Quals.getAsOpaqueValue()),
----------------
You're splitting the type into unqualified type and quailfiers twice here, once in the call to `getQualifiers` and again in `getSplitUnqualifiedType`. It'd be better to only split it once.

================
Comment at: clang/lib/Sema/SemaType.cpp:9161
+  Qualifiers Quals = Pointee.getQualifiers();
+  Quals.removeAddressSpace();
+  return Context.getUnaryTransformType(
----------------
Do we really want to remove the address space here? The libc++ and libstdc++ implementations of the trait don't do that (https://godbolt.org/z/jqafYP6er) and it makes the behavior of `__remove_pointer` inconsistent with the behavior of `__add_pointer`. Can we just produce the pointee type intact, including all of its qualifiers?

================
Comment at: clang/lib/Sema/SemaType.cpp:9164
+      BaseType,
+      QualType(Pointee.getSplitUnqualifiedType().Ty, Quals.getAsOpaqueValue()),
+      UKind);
----------------
It's not correct to treat a `Qualifiers` opaque value as if it has meaning for anything other than converting back from an opaque value. The second argument here is a cvr mask, which is not the same thing.

Use `Context.getQualifiedType` to combine a type with some qualifiers.

================
Comment at: clang/lib/Sema/SemaType.cpp:9175-9183
+  Qualifiers Quals = Underlying.getQualifiers();
+  // std::decay is supposed to produce 'std::remove_cv', but since 'restrict' is
+  // in the same group of qualifiers as 'const' and 'volatile', we're extending
+  // '__decay(T)' so that it also removes '__restrict' in C++.
+  Quals.removeCVRQualifiers();
+  return Context.getUnaryTransformType(
+      BaseType,
----------------
As above, you can't use `getAsOpaqueValue` like this, and should avoid splitting the type twice.

Note that the pattern used here is only correct because we know that `Underlying` cannot be an array type.

================
Comment at: clang/lib/Sema/SemaType.cpp:9178
+  // in the same group of qualifiers as 'const' and 'volatile', we're extending
+  // '__decay(T)' so that it also removes '__restrict' in C++.
+  Quals.removeCVRQualifiers();
----------------
Why "in C++"? It looks like it does this in C too, which is probably appropriate. However, this is a divergence from what `std::decay_t` does in libc++ and libstdc++, where it does not remove `__restrict`.

================
Comment at: clang/lib/Sema/SemaType.cpp:9225-9236
+  assert(LangOpts.CPlusPlus);
+  QualType Underlying = BaseType.getNonReferenceType();
+  Qualifiers Quals = Underlying.getQualifiers();
+  if (UKind == UnaryTransformType::RemoveCVRef) {
+    Quals.removeConst();
+    Quals.removeVolatile();
+  }
----------------
As before, the way you were splitting and reconstructing types wasn't correct.

================
Comment at: clang/lib/Sema/SemaType.cpp:9241-9245
+  Qualifiers Quals;
+  QualType Unqual = Context.getUnqualifiedArrayType(BaseType, Quals);
+  if ((BaseType->isReferenceType() && UKind != UTTKind::RemoveRestrict) ||
+      BaseType->isFunctionType())
+    return Context.getUnaryTransformType(BaseType, BaseType, UKind);
----------------
LLVM style is to restrict the scopes of variables as much as possible, and there's no point splitting the type if we're not going to use the result.

================
Comment at: clang/lib/Sema/SemaType.cpp:9267-9274
+  QualType Underlying =
+      BaseType->isBitIntType()
+          ? Context.getBitIntType(!IsMakeSigned, Context.getIntWidth(BaseType))
+      : BaseType->isEnumeralType()
+          ? Context.getIntTypeForBitwidth(Context.getIntWidth(BaseType),
+                                          IsMakeSigned)
+      : IsMakeSigned ? Context.getCorrespondingSignedType(BaseType)
----------------
With the changes suggested for `getCorresponding*Type`, I think we can remove the `BitInt` special case here.

================
Comment at: clang/lib/Sema/SemaType.cpp:9196-9205
+  if (!BaseType->isArrayType())
+    return Context.getUnaryTransformType(BaseType, BaseType, UKind);
+  const ArrayType *ArrayType = BaseType->getAsArrayTypeUnsafe();
+  QualType ElementType =
+      UKind == UnaryTransformType::RemoveAllExtents
+          ? QualType(ArrayType->getBaseElementTypeUnsafe(), {})
+          : ArrayType->getElementType();
----------------
rsmith wrote:
> This looks like it'll lose all qualifiers other than CVR qualifiers. Presumably that's not the intent. (This will also unnecessarily remove type sugar in some cases, but that's not super important here.) Please also don't use copy-list-initialization to form a `QualType`; see https://llvm.org/docs/CodingStandards.html#do-not-use-braced-initializer-lists-to-call-a-constructor
> 
> See the suggested edits for an alternative, simpler formulation (that preserves as much type sugar as possible).
> 
Looks like you forgot to remove the old code when adding the new code?

================
Comment at: clang/lib/Sema/SemaType.cpp:9251
+      BaseType->isBooleanType()) {
+    Diag(Loc, diag::err_make_signed_integral_only) << IsMakeSigned << BaseType;
+    return QualType();
----------------
cjdb wrote:
> cjdb wrote:
> > rsmith wrote:
> > > Should we support vector types here?
> > Is it a conforming extension for `make_signed_t<int __attribute__((ext_vector_type(2)))>` to work?
> Gentle ping.
Yes, it's a conforming extension compared to C++20; the standard places no restrictions on what a program that uses a vendor extension type such as a vector type does.

On reflection, I'm not sure it would be a conforming extension to OpenCL (in which such vector types exist as part of the language standard -- in particular, `ext_vector_type` is intended to follow the OpenCL rules). So let's not support such things until / unless we get some direction from the OpenCL folks.

================
Comment at: clang/lib/Sema/SemaType.cpp:9265-9266
+          ? Context.getBitIntType(!IsMakeSigned, Context.getIntWidth(BaseType))
+          : Context.getIntTypeForBitwidth(Context.getIntWidth(BaseType),
+                                          IsMakeSigned);
+  Qualifiers Quals = Underlying.getQualifiers();
----------------
cjdb wrote:
> rsmith wrote:
> > This is wrong: if, say, `int` and `long` are the same bit-width, this will give the same type for `__make_unsigned(int)` and `__make_unsigned(long)`, where they are required to produce `unsigned int` and `unsigned long` respectively.
> > 
> > Please look at `Context.getCorrespondingSignedType` and `Context.getCorrespondingUnsignedType` to see how to do this properly; you may be able to call those functions directly in some cases, but be careful about the cases where we're required to return the lowest-rank int type of the given size. Note that `getIntTypeForBitwidth` does *not* do that; rather, it produces the *preferred* type of the given width, and for WebAssembly and AVR it produces something other than the lowest-rank type in practice in some cases.
> This makes me very happy.
The comment on this producing the wrong underlying type for enums in some cases doesn't seem to be addressed. You need to produce the lowest-rank type of the given size, which is not what `getIntTypeForBitwidth` does.

================
Comment at: clang/lib/Sema/SemaType.cpp:6023
     void VisitUnaryTransformTypeLoc(UnaryTransformTypeLoc TL) {
-      // FIXME: This holds only because we only have one unary transform.
-      assert(DS.getTypeSpecType() == DeclSpec::TST_underlyingType);
+      // Make sure it is a unary transform type
+      assert(DS.getTypeSpecType() >= DeclSpec::TST_underlyingType &&
----------------
aaron.ballman wrote:
> 
I don't think this comment pulls its weight any more, now that you're calling a named function for this check. Maybe just remove it?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116203/new/

https://reviews.llvm.org/D116203