[PATCH] D45438: [CodeView] Enable debugging of captured variables within C++ lambdas

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 25 12:18:53 PST 2022


dblaikie added inline comments.


================
Comment at: lib/CodeGen/CGDebugInfo.cpp:812-814
+  // CodeView types with C++ mangling need a type identifier.
+  if (CGM.getCodeGenOpts().EmitCodeView)
+    return true;
----------------
rnk wrote:
> dblaikie wrote:
> > bwyma wrote:
> > > dblaikie wrote:
> > > > rnk wrote:
> > > > > rnk wrote:
> > > > > > dblaikie wrote:
> > > > > > > Just came across this code today - it's /probably/ a problem for LTO. LLVM IR linking depends on the identifier to determine if two structs are the same for linkage purposes - so if an identifier is added for a non-linkage (local/not externally visible) type, LLVM will consider them to be the same even though they're unrelated:
> > > > > > > ```
> > > > > > > namespace { struct t1 { int i; int j; }; }
> > > > > > > t1 v1;
> > > > > > > void *v3 = &v1;
> > > > > > > ```
> > > > > > > ```
> > > > > > > namespace { struct t1 { int i; }; }
> > > > > > > t1 v1;
> > > > > > > void *v2 = &v1;
> > > > > > > ```
> > > > > > > ```
> > > > > > > $ clang++-tot -emit-llvm -S a.cpp b.cpp -g -gcodeview  && ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll && ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep "\"t1\""
> > > > > > > !8 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t1", scope: !9, file: !3, line: 1, size: 64, flags: DIFlagTypePassByValue, elements: !10, identifier: "_ZTSN12_GLOBAL__N_12t1E")
> > > > > > > $ clang++-tot -emit-llvm -S a.cpp b.cpp -g && ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll && ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep "\"t1\""
> > > > > > > !8 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t1", scope: !9, file: !3, line: 1, size: 64, flags: DIFlagTypePassByValue, elements: !10)
> > > > > > > !21 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t1", scope: !9, file: !17, line: 1, size: 32, flags: DIFlagTypePassByValue, elements: !22)
> > > > > > > ```
> > > > > > > 
> > > > > > > So in linking, now both `a.cpp` and `b.cpp` refer to a single `t1` type (in this case, it looks like the first one - the 64 bit wide one).
> > > > > > > 
> > > > > > > If CodeView actually can't represent these two distinct types with the same name in the same object file, so be it? But this looks like it's likely to cause problems for consumers/users.
> > > > > > If you use the MSVC ABI, we will assign unique identifiers to these two structs:
> > > > > > ```
> > > > > > !9 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t1", scope: !10, file: !7, line: 1, size: 64, flags: DIFlagTypePassByValue, elements: !11, identifier: ".?AUt1@?A0xF964240C@@")
> > > > > > ```
> > > > > > 
> > > > > > The `A0xF964240C` is set up here, and it is based on the source file path (the hash will only be unique when source file paths are unique across the build):
> > > > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/MicrosoftMangle.cpp#L464
> > > > > > ```
> > > > > >   // It's important to make the mangled names unique because, when CodeView
> > > > > >   // debug info is in use, the debugger uses mangled type names to distinguish
> > > > > >   // between otherwise identically named types in anonymous namespaces.
> > > > > > ```
> > > > > > 
> > > > > > Maybe this should use a "is MSVC ABI" check instead, since that is what controls whether the identifier will be unique, and the unique-ness is what matters for LTO and PDBs.
> > > > > After thinking about it some more, see the comment I added here: rG59320fc9d78237a5f9fdd6a5030d6554bb6976ce
> > > > > 
> > > > > ```
> > > > > // If the type is not externally visible, it should be unique to the current TU,
> > > > > // and should not need an identifier to participate in type deduplication.
> > > > > // However, when emitting CodeView, the format internally uses these
> > > > > // unique type name identifers for references between debug info. For example,
> > > > > // the method of a class in an anonymous namespace uses the identifer to refer
> > > > > // to its parent class. The Microsoft C++ ABI attempts to provide unique names
> > > > > // for such types, so when emitting CodeView, always use identifiers for C++
> > > > > // types. This may create problems when attempting to emit CodeView when the MS
> > > > > // C++ ABI is not in use.
> > > > > ```
> > > > > 
> > > > > I think type identifiers are pretty crucial for CodeView functionality. The debugger won't be able to link a method to its class without them. Therefore, I think it's better to leave this as it is. The bad experience of duplicate type identifiers is better than the lack of functionality from not emitting identifiers at all for non-externally visible types.
> > > > Yeah, thanks for explaining/adding the comment - that about covers it.
> > > > 
> > > > 
> > > > Hmm, do these identifiers just need to only be unique within a single object file to provide the name referencing mechanics? Perhaps we could generate them later to ensure they're unique - rather than risk collapsing them while linking?
> > > > 
> > > > (at least here's an obscure case where they seem to collide: Compiling the same file with different `-D` values, I guess the hash is only for the filename, not for other properties that might impact the output - so this ends up with two types with the same mangled name even on MSVC ABI:
> > > > ```
> > > > namespace { struct t1 { int i;
> > > > #ifdef SECOND
> > > >   int j;
> > > > #endif
> > > > }; }
> > > > t1 v1;
> > > > void*
> > > > #ifdef SECOND
> > > > v2
> > > > #else
> > > > v3
> > > > #endif
> > > > = &v1;
> > > > ```
> > > > ```
> > > > clang++-tot -emit-llvm -S a.cpp -g -gcodeview  -target x86_64-windows && clang++-tot -emit-llvm -S a.cpp -DSECOND -g -gcodeview -o b.ll -target x86_64-windows && ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll && ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep "\"t1\""
> > > > ```
> > > > )
> > > As Reid already noted, the unique id is required in CodeView to match type definitions to forward references. Unfortunately, the unique id must align across translation units. Compiling the same file multiple times with preprocessor conditionals will result in two different types with the same unique id. You'll see the same behavior with Visual Studio 2019.
> > I think what I was getting at is that in the case of mangled names for internal linkage things - presumably there's no ABI compatibility concern? It's not like something compiled with MSVC or another version of Clang has to use the same naming scheme for these local entities, so long as the naming scheme doesn't collide?
> > 
> > So at least for LTO if we generated the name later (in the CodeView backend) it could be more unique - an LTO build wouldn't collapse the two types together. It could still have an identifier (generated by the CodeView backend) that was used to reference decls and defs within that file.
> Yes, we could calculate the unique identifier during LTO. However, that seems like a very marginal improvement in the final user experience since most users don't use LTO for incremental development, which is when type information is most useful.
> 
> On the other hand, if we want to establish the invariant that all DICompositeType identifiers are unique within a compile unit, we could probably go ahead and do that without impacting the user experience. We already have renaming logic like this for `llvm::GlobalValue`s, and frontends are check for existing values of that name if they don't want renaming to occur.
>  if we want to establish the invariant that all DICompositeType identifiers are unique within a compile unit, we could probably go ahead and do that without impacting the user experience. We already have renaming logic like this for llvm::GlobalValues, and frontends are check for existing values of that name if they don't want renaming to occur.

That invariant exists - that's the original purpose of the identifiers - LLVM IR linking deduplicates on the basis of the identifier. So we can't/don't rename them (because the point is to deduplicate based on them) - we keep the first one we see and drop the rest.

Any idea what happens when non-LTO with MSVC and you end up with duplicate identifiers? Are both versions kept, but references end up ambiguous? Is the second version seen dropped as being a duplicate/unneeded?

If non-LTO MSVC behavior drops later instances of the identifier as duplicate, then at least the current LTO behavior is no worse than that - though there'd be some opportunity to make at least LTO better, but yeah, marginally beneficial.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D45438/new/

https://reviews.llvm.org/D45438



More information about the llvm-commits mailing list