[PATCH] D123538: [symbolizer] Parse DW_TAG_variable DIs to show line info for globals

Wed Apr 20 10:51:21 PDT 2022

dblaikie added inline comments.

================
Comment at: llvm/include/llvm/DebugInfo/DWARF/DWARFUnit.h:248
+  std::unordered_map<uint64_t, DWARFDie> VariableDieMap;
+  std::unordered_set<uint64_t> RootsParsedForVariables;
+
----------------
hctim wrote:
> tschuett wrote:
> > Are you sure about your map and set types?
> > https://llvm.org/docs/ProgrammersManual.html#other-set-like-container-options
> thanks, changed to set/map given that bionic's `libc.so` has 10.5k DW_TAG_variables, so DenseSet/DenseMap doesn't seem like the right structure either.
I don't believe the number of elements is an argument against the dense data structures - could you help explain/understand the connection there?

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:1267-1277
+  if (!CU) {
+    // Fallback to walking the .debug_info, looking for DW_TAG_variables.
+    for (const auto &CU : normal_units()) {
+      DWARFDie Die = CU->getVariableForAddress(Address.Address);
+      if (Die.isValid()) {
+        Result.FileName = Die.getDeclFile(FileLineInfoKind::AbsoluteFilePath);
+        Result.Line = Die.getDeclLine();
----------------
Should this logic live in `getCompileUnitForAddress` insetad of here? (at least given the name, it sounds like it's something `getCompileUnitForAddress` should handle - but perhaps it should be renamed to be only about function addresses, and the variable walk could/should remain here?)

(awkwardly: LLVM's debug_aranges (Clang doesn't emit these by default anyway, so it's not likely to be much of an issue in the wild) would be sufficient to answer this query, but GCC does not include globals in debug_aranges, so they probably can't be relied upon unfortunately (unfortunate that Clang pays the cost when it emits them, but consumers can't rely on them) - and personally I'd rather we (DWARF in general) move away from aranges anyway, since (apart from this unreliable global variable case) they're covered by CU level high/low/ranges anyway)

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:1270-1271
+    for (const auto &CU : normal_units()) {
+      DWARFDie Die = CU->getVariableForAddress(Address.Address);
+      if (Die.isValid()) {
+        Result.FileName = Die.getDeclFile(FileLineInfoKind::AbsoluteFilePath);
----------------

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:743

+void DWARFUnit::updateVariableDieMap(DWARFDie Die) {
+  for (DWARFDie Child = Die.getFirstChild(); Child; Child = Child.getSibling())
----------------
 Walking the entire DIE tree seems expensive - could this prune any branches? (such as when descending into a type DIE? (DW_TAG_*_type))

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:744
+void DWARFUnit::updateVariableDieMap(DWARFDie Die) {
+  for (DWARFDie Child = Die.getFirstChild(); Child; Child = Child.getSibling())
+    updateVariableDieMap(Child);
----------------
maybe DWARFDie supports range-based for loop now?

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:779-781
+  if (RootsParsedForVariables.count(RootDie.getOffset()) == 0) {
+    updateVariableDieMap(RootDie);
+    RootsParsedForVariables.insert(RootDie.getOffset());
----------------
Rather than doing `count` + `insert` this should probably call `insert` up-front and inspect the return value to see if the insert happened, or if the value was already present - then if it happened, do the update.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:784-786
+  if (!VariableDieMap.count(Address))
+    return DWARFDie();
+  return VariableDieMap[Address];
----------------
This should probably use `find` rather than `count+[]` to avoid doing two lookups.

================
Comment at: llvm/test/tools/llvm-symbolizer/data-location.s:16
+################################################################################
+## File below was generated using:
+##   $ clang -g -fuse-ld=lld -shared /tmp/file.c -o out.so && obj2yaml out.so
----------------
jhenderson wrote:
> hctim wrote:
> > dblaikie wrote:
> > > hctim wrote:
> > > > jhenderson wrote:
> > > > > One of the purposes of the cross-project-tests project was to make it  easier to test llvm-symbolizer without having to resort to canned binaries or canned YAML blobs like below. By writing the test there, you can use clang and lld directly and consequently start from a much more understandable .c file.
> > > > > 
> > > > > That being said, I wonder if you'd be better off hand-crafting this YAML. It doesn't seem likely to be that complicated to write some .debug_info that has the property you want using yaml2obj's DWARF support, if I follow the required behaviour correctly. There are already some good examples of similar bits. The advantage with using yaml like that is that you can keep it tightly focused on what is important, omitting the various components of the object that are unrelated to the test (e.g. .eh_frame).
> > > > I had a quick look at cross-project-tests, and didn't find any examples where llvm-symbolizer was currently being tests there, and I think it's out of scope for this patch.
> > > > 
> > > > The right way to test this is to invoke clang and run llvm-symbolizer as the output.
> > > > 
> > > > My problem with hand-crafting is that it's:
> > > > 
> > > >  a) Very time consuming, looking into it further I'd have to read a lot more about the ELF spec for .debug_abbrev, .debug_info, and .debug_line to understand what headers are needed.
> > > > 
> > > >  b) Hard to change. If someone introduces some new feature, then they have to go through the same "understand nuance of ELF" to just modify my test so that it passes. It's much easier for someone to go "hey, just need to recompile this example file and make sure the new golden file passes the existing test".
> > > (re: cross-project-tests: They must not be used as a replacement for individual project testing - but for added feature/cross-project coverage if there's important interactions between components. But an LLVM patch should still have an LLVM test, a Clang patch with a Clang test, etc - and optionally some additional cross-project feature testing)
> > > 
> > > re: hand crafting: Yeah, I'm not a super fan of doing that, until/unless yaml2obj can really simplify writing DWARF down to not needing to write abbreviations, multiple sections, attribute forms, etc.
> > > 
> > > Though is there any reason this test needs a function in it? & any reason it needs two integers?
> > I updated the test a little and added some more of what I was hoping for.
> > 
> > One global in data, one in bss, one as a function-static global and a string (this patch now depends on D123534).
> > (re: cross-project-tests: They must not be used as a replacement for individual project testing - but for added feature/cross-project coverage if there's important interactions between components. But an LLVM patch should still have an LLVM test, a Clang patch with a Clang test, etc - and optionally some additional cross-project feature testing)
> 
> That is one purpose for why I created this testsuite, but the other was for using tools to generate test inputs at runtime, such as clang or lld. In the original thread, I even discussed llvm-symbolizer as a concrete example (using LLD in that particular case, in the same manner as one might use llvm-mc for generating test input at test time, because creating canned test inputs means they're not easily maintainable). There are no current examples of this yet though.
(this might be shorter/simpler to read as assembly, rather than yaml?)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123538/new/

https://reviews.llvm.org/D123538