[clang] [ItaniumCXXABI] Mark RTTI type name as global unnamed_addr (PR #111343)

Thu Oct 10 06:06:48 PDT 2024

luxufan wrote:

Thanks for your detailed explanation! I have updated the PR.

> > But I have a question: given that the uniqueness of the typeinfo's address is ensured, why does the implementation still test type equality by comparing the addresses of the type name strings?
> 
> The uniqueness of the address of the `type_info` object itself is not guaranteed. The reason is that we sometimes [generate `type_info` objects for pointers to incomplete types](https://itanium-cxx-abi.github.io/cxx-abi/abi.html#:~:text=When%20it%20is,the%20type_info%20addresses.) as part of exception handling, so we can end up with multiple `type_info` objects for the same type.
> 
> > For symbols with internal linkage or hidden visibility, I don't think there would be problems if they were allowed to be merged. For example, consider two translation units, 0.cpp and 1.cpp, each defining an internal `class A {}`. Since both are internal symbols, I think comparing `0.cpp::A` with `1.cpp::A` is undefined behavior, because at least one symbol would be referenced outside of its visibility scope.
> 
> Such comparisons can happen in valid C++ code. For example:
> 
> ```c++
> // a.cc
> namespace { class A {}; }
> void throwA() { throw A(); }
> ```
> 
> ```c++
> // b.cc
> namespace { class A {}; }
> void throwA();
> int main() {
>   try { throwA(); }
>   catch (A) { return 1; }
>   catch (...) { return 2; }
> }
> ```
> 
> This program is valid and `main` is required to return 2 -- the `A` in `a.cc` and the `A` in `b.cc` are two different types. During exception handling, we will compare them by comparing their `type_info`, which means we will compare the addresses of the name strings, so we need two different addresses.
> 
> > For dynamic loading, if there are two same symbols in different DSOs, the symbol would be interposed.
> 
> Not in the cases I mentioned. (I think we can _probably_ get away with marking type name strings as `unnamed_addr` if the type has external linkage, because we don't expect `unnamed_addr` to have any effect after static linking. But it's not clear to me that that's actually guaranteed by the LLVM semantics, or whether it would be permissible to make use of `unnamed_addr` in some cross-DSO LTO situation.)
> 
> You _can_ still apply `unnamed_addr` in the cases where the target ABI rule is that `type_info` comparisons will always use or fall back to a string comparison. Per the libc++ implementation, that's the case on Apple arm64 targets. You can detect this using `classifyRTTIUniqueness`.
> 
> I think it's also correct and safe to apply `local_unnamed_addr` to these type name strings in all cases. Merging with another string literal from the same compilation should always be OK.

I think it is also safe to apply `local_unnamed_addr` to the `type_info` global variable itself. What do you think?
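
For reference, here is a minimal sketch of why the address of the name string matters, as I understand the explanation quoted above. It is only a model: `FakeTypeInfo`, `equalByAddress`, and `equalByString` are hypothetical names, not the libc++ or libstdc++ implementation, and the mangled name is illustrative.

```c++
#include <cassert>
#include <cstring>

// Hypothetical stand-in for std::type_info; it models only the name pointer.
struct FakeTypeInfo {
  const char *name;
};

// Models an ABI where type names are assumed unique: two types are equal
// only if their name strings live at the same address.
inline bool equalByAddress(const FakeTypeInfo &a, const FakeTypeInfo &b) {
  return a.name == b.name;
}

// Models an ABI that falls back to comparing the string contents.
inline bool equalByString(const FakeTypeInfo &a, const FakeTypeInfo &b) {
  return a.name == b.name || std::strcmp(a.name, b.name) == 0;
}

// Two distinct objects with identical contents, standing in for the name
// strings of the anonymous-namespace `A` in a.cc and in b.cc.
static const char nameInA[] = "N12_GLOBAL__N_11AE";
static const char nameInB[] = "N12_GLOBAL__N_11AE";

// Checks the behavior described above for both comparison modes.
inline void demoTypeNameComparison() {
  FakeTypeInfo fromA{nameInA};
  FakeTypeInfo fromB{nameInB};
  // Address comparison keeps the two types distinct, as the exception
  // handling example requires.
  assert(!equalByAddress(fromA, fromB));
  // String comparison cannot tell them apart, so on such a target merging
  // the two strings would not change the observable result.
  assert(equalByString(fromA, fromB));
}
```

If the two arrays were merged into one (which is what `unnamed_addr` would permit), `equalByAddress` would start returning true and the `catch (A)` handler in the quoted example would wrongly match.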

https://github.com/llvm/llvm-project/pull/111343