[clang] [clang/AST] Make it possible to use SwiftAttr in type context (PR #108631)

Thu Nov 7 05:58:04 PST 2024

bevin-hansson wrote:

Hi @xedin! We've observed a difference downstream caused by this patch and are curious whether it was intentional. It seems that the change to how AttributedType is keyed (it now includes the attribute) causes some type duplication when attributes are involved. For example, build this (reduced) program with `clang -target x86_64 -fsanitize=undefined`:
```
void a() {
  for (unsigned int b;; *(const unsigned int __attribute__((noderef)) *)*(const unsigned int __attribute__((noderef)) *)b)
    ;
}

```
(Ignore that these are dereferences of `noderef` pointers. The original repro used `address_space`, but upstream doesn't emit sanitizer checks for such pointers.)

Before this patch, the resulting assembly contained only a single type descriptor (struct plus type-name string). With the patch, there are now two identical ones:
```diff
 	.type	.L__unnamed_3,@object           # @0
 	.section	.rodata,"a",@progbits
 	.p2align	4, 0x0
 .L__unnamed_3:
 	.short	0                               # 0x0
 	.short	10                              # 0xa
 	.asciz	"'unsigned int const __attribute__((noderef))'"
 	.size	.L__unnamed_3, 50

 	.type	.L__unnamed_1,@object           # @1
 	.data
 	.p2align	4, 0x0
 .L__unnamed_1:
 	.quad	.L.src
 	.long	4                               # 0x4
 	.long	73                              # 0x49
 	.quad	.L__unnamed_3
 	.byte	2                               # 0x2
 	.byte	0                               # 0x0
 	.zero	6
 	.size	.L__unnamed_1, 32

-	.type	.L__unnamed_2,@object           # @2
+	.type	.L__unnamed_4,@object           # @2
+	.section	.rodata,"a",@progbits
+	.p2align	4, 0x0
+.L__unnamed_4:
+	.short	0                               # 0x0
+	.short	10                              # 0xa
+	.asciz	"'unsigned int const __attribute__((noderef))'"
+	.size	.L__unnamed_4, 50
+
+	.type	.L__unnamed_2,@object           # @3
+	.data
 	.p2align	4, 0x0
 .L__unnamed_2:
 	.quad	.L.src
 	.long	4                               # 0x4
 	.long	25                              # 0x19
-	.quad	.L__unnamed_3
+	.quad	.L__unnamed_4
 	.byte	2                               # 0x2
 	.byte	0                               # 0x0
 	.zero	6
 	.size	.L__unnamed_2, 32
```
For sanitizer emission, this is probably caused by the following code in `CodeGenFunction::EmitCheckTypeDescriptor`:
```cpp
  // Only emit each type's descriptor once.
  if (llvm::Constant *C = CGM.getTypeDescriptorFromMap(T))
    return C;
```
The two types are distinct for the purposes of the map lookup and type creation (because they carry different syntactic `Attr`s), even though they are really the same type.

I guess this is pretty rare, but it could cause some hefty duplication depending on what types are used and how. There might be other effects I don't know of either, but this was the noticeable one for us.

https://github.com/llvm/llvm-project/pull/108631