[PATCH] D42021: [DWARF] v5 implementation of string offsets tables - producer side

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 15 11:41:39 PST 2018


dblaikie added inline comments.


================
Comment at: lib/CodeGen/AsmPrinter/DIE.cpp:608
+    llvm_unreachable("Expected valid string form");
+    return 0;
+  }
----------------
Remove the return after unreachable.


================
Comment at: lib/CodeGen/AsmPrinter/DwarfDebug.cpp:604-605
+  // Create the symbol that designates the start of the unit's contribution
+  // to the string offsets table. In a split DWARF scenario, only the skeleton
+  // unit has the DW_AT_str_offsets_base attribute (and hence needs the symbol).
+  if (useSegmentedStringOffsetsTable()) {
----------------
This looks like it's inconsistent with the code.

The code looks like it sets the symbol for both the skeleton and non-skeleton holders in the case of split DWARF, doesn't it?

Perhaps it should be:

  if (useSplitDwarf)
    SkeletonHolder...
  else
    InfoHolder...

?

(or that could be rewritten as:

  (useSplitDwarf ? SkeletonHolder : InfoHolder).setStringOffsetsStartSym(...);


================
Comment at: lib/CodeGen/AsmPrinter/DwarfDebug.cpp:2060-2062
+  if (useSegmentedStringOffsetsTable())
+    NewCU.addStringOffsetsStart();
+
----------------
This is duplicated in a couple of places - could it be reasonably moved into some common place, like the DwarfUnit ctor? (not sure if that has some common setup code for adding attributes common to all units - maybe there is no such common place? not sure)


================
Comment at: lib/CodeGen/AsmPrinter/DwarfDebug.h:265
+  /// Whether to use the DWARF v5 string offsets table.
+  bool UseSegmentedStringOffsetsTable;
+
----------------
What's "segmented" mean/imply/add/clarify in this context?


================
Comment at: lib/CodeGen/AsmPrinter/DwarfFile.cpp:44
+  // referenced by most unit headers via DW_AT_str_offsets_base.
+  // Split compile units and split type units do not use the attribute.
+  Asm->OutStreamer->EmitLabel(StringOffsetsStartSym);
----------------
maybe just say "split units" 


================
Comment at: lib/CodeGen/AsmPrinter/DwarfUnit.cpp:251
+    unsigned Index = StringPoolEntry.getIndex();
+    if ((Index & 0xff) == Index)
+      IxForm = dwarf::DW_FORM_strx1;
----------------
Would these comparisons be better written as "Index >= 0xff", etc?


================
Comment at: test/DebugInfo/Generic/string-offsets-form.ll:8-13
+; struct S0 { int i;};
+; struct S1 { S0 s0; };
+; struct S2 { S1 s1; };
+; ...
+; struct S125 { S124 s124; };
+; S125 s125;
----------------
To get a really large string index that doesn't need lots of IR to be cleaned up, maybe just use one giant enum with lots of enumerators? Tie the enum to the IR with a single global, same as the struct here.


================
Comment at: test/DebugInfo/Generic/string-offsets-multiple-cus.ll:17-38
+; int aglob;
+; struct A {
+;   int i;
+;   float f;
+; };
+; 
+; void afunc(int i)
----------------
Could these be simplified to all use the same construct for the strings. Otherwise it makes the test a bit more confusing/cryptic by implying that functions, global variahbles, and structs, might be meaningfully different for string emission - which shouldn't be true.

(eg: all use global variables, all use enums (I'd say enums are probably a good call), or whatever)

Ah, I see, type units are relevant - you could probably make this work by having named/external linkage enums (which will go in type units) and anonymous/internal linkage enums (which won't go in type units - but can still be emitted by using an enumerator to initialize a global/external linkage variable, I think? That should cause the enumeration to go into the enum list on the CU)


================
Comment at: test/DebugInfo/Generic/string-offsets-multiple-cus.ll:49
+; BOTH-NOT:    {{DW_TAG|NULL}}
+; BOTH:        DW_AT_str_offsets_base [DW_FORM_sec_offset] (0x[[CU1_STROFF:[0-9a-f]+]])
+;
----------------
Maybe rather than naming it CU1_STROFF, name it STROFFBASE or something, since it's meant to be the same for all the units?


================
Comment at: test/DebugInfo/Generic/string-offsets-multiple-cus.ll:71-72
+;
+; Verify that the type unit has the proper DW_AT_str_offsets_base attribute and 
+; a segment in the .debug_str_offsets section.
+;
----------------
Is each type unit getting its own string offset? I'd expect type units to use the string offset table of the CU they were attached to, so this should be the same as CU1_STROFF?

I mean it's a tradeoff - using separate string offset sections per TU means they can be dropped if this copy of the TU is dropped. But it also means more duplication - the same string offset having to be duplicated in the CU and every TU if the same string appears in all/several of them.


================
Comment at: test/DebugInfo/Generic/string-offsets-multiple-cus.ll:110-111
+; TYPEUNITS-NEXT: {{.*:}}
+; The string with index 6 should be "bglob"
+; TYPEUNITS-NEXT: {{.*:}} [[BGLOBOFF]]
+
----------------
Would it be worth printing out the index number in the section dumping to make it easier to look them up? Then the matching wouldn't need 5 trivial matches, you could match the index from the unit dumping to the string here. (& similarly in the non-TU case above)


================
Comment at: test/DebugInfo/Generic/string-offsets-table.ll:13-21
+; struct A {
+;   int i;
+;   float f;
+; };
+; 
+; void afunc(int i)
+; {
----------------
Some comments about which parts of this code are important/what situation is being created/tested for (& ensuring the code is as simple as possible for that goal) might be handy.


https://reviews.llvm.org/D42021





More information about the llvm-commits mailing list