[PATCH] D96035: [WIP][dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

Fri Feb 5 07:38:05 PST 2021

avl added a comment.

In D96035#2543434 <https://reviews.llvm.org/D96035#2543434>, @clayborg wrote:

>> Handling inter-CU references: inter-CU references are hard to process using only one pass. f.e. if CU1 references CU100 and CU100 references CU1, we could not finish handling of CU1 until we finished CU100. Thus we either need to load all CUs into the memory, either load CUs several times.
>
> What kind of inter-CU references do you see? There really shouldn't be any unless they are all in the same .o file and were produced by an LTO pass. Are there any inter CU references that I am forgetting about?

yes. That is the case: they are all in the same .o file and were produced by an LTO pass.

>> a) No common abbreviation table. Each compile unit has its own abbreviation table. Generating common abbreviation table slowdowns parallel execution (This is a resource that is accessed many times from many threads). Abbreviation table does not take a lot of space, so it looks cheap to have separate abbreviations tables. Later, it might be optimized a bit(by removing equal abbreviations tables).
>
> Can't the linking process create on large uniqued abbrev table as it links each .o file's DWARF into the final output? This info isn't that large that is true, but it'd be nice to have just one table as long as we are optimizing the DWARF.

If we try to generate it while cloning DIEs it leads to slow execution, because threads are waiting for each other.
(All threads need access to the global abbreviation table for every die. Thus execution time spent on waiting).

If we try to implement it as some post-process step(when all compile units are processed) - then it would also require too much additional time. For each DIE we need to add abbreviations into the global abbreviation table. This might change the abbreviation code. Since the abbreviation code is ULEB128 - The size of DIE might change. That means we need to update all reference attributes. It would take a significant amount of time. Technically it might be done, but it looks like it would not be practical.

Probably, it might be easily done for "num-threads 1" case only.

>> b) Removed check for ambiguous type declarations. Existed check does not solve the problem in general (current check does not work if there are ambiguous  types in different compile units).
>
> I am assuming you didn't pull out ODR type uniquing all together right? That is very important and one of the main reasons for using dsymutil.

Right. This patch does mostly the same ODR deduplication as the current dsymutil. 
There are small differences:

1. In multi-thread mode, some canonical dies might become not seen by some threads.

  If the request for setting canonical dies come from several threads at the same time - some threads might receive information that there is not canonical die(though it is actually, it would be set by another thread). As the result, some type would not be de-duplicated. But the number of such cases is small. For the clang, there is no noticeable difference in the size of the binary.

2. Ambiguous types. That is what currently done by this code: https://github.com/llvm/llvm-project/blob/main/llvm/lib/DWARFLinker/DWARFLinkerDeclContext.cpp#L19

  i.e. when dsymutil can not distinguish function type because of parameterized arguments. It marks the declaration context as invalid. The current workaround does not solve the problem fully. A similar approach might be implemented(with the same limitations) by this patch. But, probably, we might want to think of a better solution.

> There are many things that can make this ODR type uniquing fail to work correctly, but those are bugs that should be fixed. One of the main things is if a class or struct has a forward declaration to another class inside of it. If this is the case the llvm dsymutil doesn't try and see if everyone has the same full class definition with a forward decl on it, it just punts and emits another copy of the type. I saw this when I would have a "FILE" type from the unix fopen(...) function. If multiple files had this type in a C++ source file, they wouldn't get uniqued because one of the members had a forward declaration. The reason it doesn't get uniqued is dsymutil worries someone might have a "FILE" type info in the DWARF that _does_ have a full definition for that forward declaration. A lot of stuff the compiler emits these days also causes many of the types to not be correctly removed even though they aren't used.
>
> So there are many C++ compiler things that seem to be keeping things alive in the DWARF output by dsymutil these days that used to not be there before. When ODR works correctly, the first instance of a type gets emitted from the first object file file that has it, and then all other .o files that have this same type remove it and have all CU references within changed to a DW_FORM_ref_addr inter CU reference. We should make sure that this isn't happening to many clang types and causing us to not unique types.
>
> When the type uniquing works correctly it really reduces the output DWARF size. Back at Apple the WebCore.framework DWARF debug info went from 1.8GB down to 450MB.
>
> As an example of C++ compiler type bloat, you can see this if you do "#include <cstdio>". If you compile this with a recent clang:
>
>   #include <stdio.h>
>   
>   int main() {
>       printf("hello\n");
>       return 0;
>   }
>
> The DWARF looks very small:
>
>   0x0000000b: DW_TAG_compile_unit
>                 DW_AT_producer	("Apple clang version 12.0.0 (clang-1200.0.32.29)")
>                 DW_AT_language	(DW_LANG_C_plus_plus_11)
>                 DW_AT_name	("main.cpp")
>                 DW_AT_LLVM_sysroot	("/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk")
>                 DW_AT_APPLE_sdk	("MacOSX.sdk")
>                 DW_AT_stmt_list	(0x00000000)
>                 DW_AT_comp_dir	("/Users/gclayton/Documents/src/args")
>                 DW_AT_low_pc	(0x0000000100003f60)
>                 DW_AT_high_pc	(0x0000000100003f8a)
>   
>   0x00000032:   DW_TAG_subprogram
>                   DW_AT_low_pc	(0x0000000100003f60)
>                   DW_AT_high_pc	(0x0000000100003f8a)
>                   DW_AT_frame_base	(DW_OP_reg6 RBP)
>                   DW_AT_name	("main")
>                   DW_AT_decl_file	("/Users/gclayton/Documents/src/args/main.cpp")
>                   DW_AT_decl_line	(4)
>                   DW_AT_type	(0x000000000000004b "int")
>                   DW_AT_external	(true)
>   
>   0x0000004b:   DW_TAG_base_type
>                   DW_AT_name	("int")
>                   DW_AT_encoding	(DW_ATE_signed)
>                   DW_AT_byte_size	(0x04)
>   
>   0x00000052:   NULL
>
> But if you include the "cstdio" file instead:
>
>   #include <cstdio>
>   
>   int main() {
>       printf("hello\n");
>       return 0;
>   }
>
> This ends up having tons of extra type stuff that really bloats the DWARF. The C++ header ends up with full definitions for every function in stdio.h and data types (like "FILE"!) for functions that are not used in the .o file. In this case the C++ code add DW_TAG_imported_declaration to all these types and that ends up keeping all of these extra types alive since something refers to it and ends up with way toe much stuff in the final dSYM. We can see the DW_TAG_imported_declaration below:
>
>   0x00000032:   DW_TAG_namespace
>                   DW_AT_name	("std")
>   
>   0x00000037:     DW_TAG_namespace
>                     DW_AT_name	("__1")
>                     DW_AT_export_symbols	(true)
>   
>   0x0000003c:       DW_TAG_imported_declaration
>                       DW_AT_decl_file	("/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/cstdio")
>                       DW_AT_decl_line	(109)
>                       DW_AT_import	(0x0000000000000195)
>   `
>
> And later we see:
>
>   0x00000193:       NULL
>   
>   0x00000194:     NULL
>   
>   0x00000195:   DW_TAG_typedef
>                   DW_AT_type	(0x00000000000001a0 "__sFILE")
>                   DW_AT_name	("FILE")
>                   DW_AT_decl_file	("/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/_stdio.h")
>                   DW_AT_decl_line	(157)
>   
>   0x000001a0:   DW_TAG_structure_type
>                   DW_AT_calling_convention	(DW_CC_pass_by_value)
>                   DW_AT_name	("__sFILE")
>                   DW_AT_byte_size	(0x98)
>                   DW_AT_decl_file	("/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/_stdio.h")
>                   DW_AT_decl_line	(126)
>   
>   0x000001a9:     DW_TAG_member
>                     DW_AT_name	("_p")
>                     DW_AT_type	(0x000000000000029a "unsigned char*")
>                     DW_AT_decl_file	("/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/_stdio.h")
>                     DW_AT_decl_line	(127)
>                     DW_AT_data_member_location	(0x00)
>   ....

According to the problem of imported declarations - this patch does the same as the current dsymutil. 
The solution for that problem would be nice to have. Though this is out of the scope of this patch.

>> c) .debug_frame. Already generated CIE records are not reused between object files
>
> Can't we unique these when the .o file gets linked on the main thread?

We can, but we need to re-parse generated debug_frame then.
Probably, If it would not take a lot of time, we might to try.

>> e) .debug_names is not supported by this patch.
>
> Any reason why this can't be done? I believe that dsymutil just generates the .debug_names (or Apple accelerator tables) after you have written all of the DWARF out by traversing it no? If this doesn't happen, it can be added with the --update option to dsymutil, but again, this takes time.

There is no fundamental reason. I think it might be supported.
I just need more time to check that code and understand how to implement it properly.
I would update this patch with implementation later.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96035/new/

https://reviews.llvm.org/D96035