[PATCH] D45438: [CodeView] Enable debugging of captured variables within C++ lambdas

David Blaikie via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jan 25 09:22:43 PST 2022


dblaikie added inline comments.


================
Comment at: lib/CodeGen/CGDebugInfo.cpp:812-814
+  // CodeView types with C++ mangling need a type identifier.
+  if (CGM.getCodeGenOpts().EmitCodeView)
+    return true;
----------------
bwyma wrote:
> dblaikie wrote:
> > rnk wrote:
> > > rnk wrote:
> > > > dblaikie wrote:
> > > > > Just came across this code today - it's /probably/ a problem for LTO. LLVM IR linking depends on the identifier to determine if two structs are the same for linkage purposes - so if an identifier is added for a non-linkage (local/not externally visible) type, LLVM will consider them to be the same even though they're unrelated:
> > > > > ```
> > > > > namespace { struct t1 { int i; int j; }; }
> > > > > t1 v1;
> > > > > void *v3 = &v1;
> > > > > ```
> > > > > ```
> > > > > namespace { struct t1 { int i; }; }
> > > > > t1 v1;
> > > > > void *v2 = &v1;
> > > > > ```
> > > > > ```
> > > > > $ clang++-tot -emit-llvm -S a.cpp b.cpp -g -gcodeview  && ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll && ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep "\"t1\""
> > > > > !8 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t1", scope: !9, file: !3, line: 1, size: 64, flags: DIFlagTypePassByValue, elements: !10, identifier: "_ZTSN12_GLOBAL__N_12t1E")
> > > > > $ clang++-tot -emit-llvm -S a.cpp b.cpp -g && ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll && ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep "\"t1\""
> > > > > !8 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t1", scope: !9, file: !3, line: 1, size: 64, flags: DIFlagTypePassByValue, elements: !10)
> > > > > !21 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t1", scope: !9, file: !17, line: 1, size: 32, flags: DIFlagTypePassByValue, elements: !22)
> > > > > ```
> > > > > 
> > > > > So in linking, now both `a.cpp` and `b.cpp` refer to a single `t1` type (in this case, it looks like the first one - the 64 bit wide one).
> > > > > 
> > > > > If CodeView actually can't represent these two distinct types with the same name in the same object file, so be it? But this looks like it's likely to cause problems for consumers/users.
> > > > If you use the MSVC ABI, we will assign unique identifiers to these two structs:
> > > > ```
> > > > !9 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "t1", scope: !10, file: !7, line: 1, size: 64, flags: DIFlagTypePassByValue, elements: !11, identifier: ".?AUt1@?A0xF964240C@@")
> > > > ```
> > > > 
> > > > The `A0xF964240C` is set up here, and it is based on the source file path (the hash will only be unique when source file paths are unique across the build):
> > > > https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/MicrosoftMangle.cpp#L464
> > > > ```
> > > >   // It's important to make the mangled names unique because, when CodeView
> > > >   // debug info is in use, the debugger uses mangled type names to distinguish
> > > >   // between otherwise identically named types in anonymous namespaces.
> > > > ```
> > > > 
> > > > Maybe this should use a "is MSVC ABI" check instead, since that is what controls whether the identifier will be unique, and the unique-ness is what matters for LTO and PDBs.
> > > After thinking about it some more, see the comment I added here: rG59320fc9d78237a5f9fdd6a5030d6554bb6976ce
> > > 
> > > ```
> > > // If the type is not externally visible, it should be unique to the current TU,
> > > // and should not need an identifier to participate in type deduplication.
> > > // However, when emitting CodeView, the format internally uses these
> > > // unique type name identifers for references between debug info. For example,
> > > // the method of a class in an anonymous namespace uses the identifer to refer
> > > // to its parent class. The Microsoft C++ ABI attempts to provide unique names
> > > // for such types, so when emitting CodeView, always use identifiers for C++
> > > // types. This may create problems when attempting to emit CodeView when the MS
> > > // C++ ABI is not in use.
> > > ```
> > > 
> > > I think type identifiers are pretty crucial for CodeView functionality. The debugger won't be able to link a method to its class without them. Therefore, I think it's better to leave this as it is. The bad experience of duplicate type identifiers is better than the lack of functionality from not emitting identifiers at all for non-externally visible types.
> > Yeah, thanks for explaining/adding the comment - that about covers it.
> > 
> > 
> > Hmm, do these identifiers just need to only be unique within a single object file to provide the name referencing mechanics? Perhaps we could generate them later to ensure they're unique - rather than risk collapsing them while linking?
> > 
> > (at least here's an obscure case where they seem to collide: Compiling the same file with different `-D` values, I guess the hash is only for the filename, not for other properties that might impact the output - so this ends up with two types with the same mangled name even on MSVC ABI:
> > ```
> > namespace { struct t1 { int i;
> > #ifdef SECOND
> >   int j;
> > #endif
> > }; }
> > t1 v1;
> > void*
> > #ifdef SECOND
> > v2
> > #else
> > v3
> > #endif
> > = &v1;
> > ```
> > ```
> > clang++-tot -emit-llvm -S a.cpp -g -gcodeview  -target x86_64-windows && clang++-tot -emit-llvm -S a.cpp -DSECOND -g -gcodeview -o b.ll -target x86_64-windows && ~/dev/llvm/build/default/bin/llvm-link -o ab.bc a.ll b.ll && ~/dev/llvm/build/default/bin/llvm-dis ab.bc && cat ab.ll | grep "\"t1\""
> > ```
> > )
> As Reid already noted, the unique id is required in CodeView to match type definitions to forward references. Unfortunately, the unique id must align across translation units. Compiling the same file multiple times with preprocessor conditionals will result in two different types with the same unique id. You'll see the same behavior with Visual Studio 2019.
I think what I was getting at is that in the case of mangled names for internal linkage things - presumably there's no ABI compatibility concern? It's not like something compiled with MSVC or another version of Clang has to use the same naming scheme for these local entities, so long as the naming scheme doesn't collide?

So at least for LTO if we generated the name later (in the CodeView backend) it could be more unique - an LTO build wouldn't collapse the two types together. It could still have an identifier (generated by the CodeView backend) that was used to reference decls and defs within that file.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D45438/new/

https://reviews.llvm.org/D45438



More information about the cfe-commits mailing list