<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Oct 25, 2020 at 9:31 AM Alexey Lapshin <<a href="mailto:avl.lapshin@gmail.com">avl.lapshin@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

  
  <div>

    <p><br>

    </p>

    <div>On 23.10.2020 19:43, David Blaikie

      wrote:<br>

    </div>

    <blockquote type="cite">

      
      <div dir="ltr">

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <blockquote type="cite">

              <div dir="ltr">

                <div class="gmail_quote">

                  <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                    <div><br>

                      <br>

                    </div>

                  </blockquote>

                  <div><br>

                  </div>

                  <div>Ah, yeah - that seems like a missed opportunity -

                    duplicating the whole type DIE. LTO does this by

                    making monolithic types - merging all the members

                    from different definitions of the same type into

                    one, but that's maybe too expensive for dsymutil

                    (might still be interesting to know how much more

                    expensive, etc). But I think the other way to go

                    would be to produce a declaration of the type, with

                    the relevant members - and let the DWARF consumer

                    identify this declaration as matching up with the

                    earlier definition. That's the sort of DWARF you get

                    from the non-MachO default -fno-standalone-debug

                    anyway, so it's already pretty well tested/supported

                    (support in lldb's a bit younger/more

                    work-in-progress, admittedly). I wonder how much

                    dsym size there is that could be reduced by such an

                    implementation.</div>
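                  <div>
                  <p>A minimal sketch of the "declaration with the relevant members" idea, assuming DIEs are modelled as plain dictionaries. The function name <code>emit_type_for_cu</code> and the dictionary layout are hypothetical and are not dsymutil's data structures; only the DW_TAG/DW_AT names come from DWARF.</p>
<pre>
```python
def emit_type_for_cu(type_name, members_used, already_defined):
    """Return a DIE-like dict for `type_name` in the current CU.

    If the type was already defined in an earlier CU, emit only a declaration
    carrying the members this CU needs; a consumer is expected to match it to
    the earlier definition by name. The dict layout is purely illustrative.
    """
    die = {
        "tag": "DW_TAG_structure_type",
        "DW_AT_name": type_name,
        "children": sorted(members_used),
    }
    if type_name in already_defined:
        # Later CU: do not duplicate the whole type, mark it as a declaration.
        die["DW_AT_declaration"] = True
    else:
        # First CU that mentions the type keeps the definition.
        already_defined.add(type_name)
    return die
```
</pre>
                  <p>Calling it twice for "x" with the same <code>already_defined</code> set yields a definition the first time and a declaration holding only the newly needed members the second time.</p>
                  </div>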

                </div>

              </div>

            </blockquote>

            <p>I see. Yes, that could be done, and I think it would
              result in a noticeable size reduction (I do not know the
              exact numbers at the moment).</p>

            <p>I am working on a multi-threaded DWARFLinker now, and its
              first version will do exactly the same type processing as
              the current dsymutil.</p>

          </blockquote>

          <div>Yeah, best to keep the behavior the same through that</div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div>

              <p>The above scheme could be implemented as a next step, and
                it would result in a better size reduction than the
                current state.</p>

              <p>But I think an even better scheme is possible; it
                would result in an even bigger size reduction and in faster
                execution. This scheme is similar to what
                you've described above: "LTO does this by making monolithic
                types - merging all the members from different
                definitions of the same type into one".</p>

            </div>

          </blockquote>

          <div>I believe the reason that's probably not been done is

            that it can't be streamed - it'd lead to buffering more of

            the output </div>

        </div>

      </div>

    </blockquote>

    <p>Yes. The fact that DWARF has to be streamed into AsmPrinter
      complicates parallel DWARF generation. In my prototype, I generate
      <br>
      several output files (one per source compilation unit) and

      then sequentially glue them into the final resulting file.<br></p></div></blockquote><div>How does that help? Do you use relocations in those intermediate object files so the DWARF in them can refer across files? <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>

    </p>

    <p><br>

    </p>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_quote">

          <div>(if two of these expandable types were in one CU - the

            start of the second type couldn't be known until the end

            because it might keep getting pushed later due to expansion

            of the first type) and/or having to revisit all the type

            references (the offset to the second type wouldn't be known

            until the end - so writing the offsets to refer to the type

            would have to be deferred until then).<br>

          </div>

        </div>

      </div>

    </blockquote>

    <p>That is the second problem: offsets are not known until the end
      of the file.<br>
      dsymutil already has that situation for inter-CU references, so it
      has an extra pass to<br>

      fix up offsets. </p></div></blockquote><div>Oh, it does? I figured it was one-pass, and that it only ever refers back to types in previous CUs? So it doesn't have to go back and do a second pass. But I guess if it sees a declaration of T1 in CU1, then later on sees a definition of T1 in CU2, does it somehow go back to CU1 and remove the declaration/make references refer to the definition in CU2? I figured it'd just leave the declaration and references to it as-is, then add the definition and use that from CU2 onwards? <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>With a multi-threaded implementation, such a situation
      would arise more often <br>
      for type references, so more offsets would have to be fixed during
      the additional pass.<br>

    </p>
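    <div>
    <p>A rough sketch of what such a fixup pass amounts to, under simplifying assumptions: units are opaque byte payloads, every reference is a fixed 4-byte absolute offset, and the <code>emit_units</code> helper with its tuple layout is hypothetical rather than dsymutil's actual code. Backward references are written immediately; forward references get a placeholder that is patched once all offsets are final.</p>
<pre>
```python
import struct

REF_SIZE = 4  # assumed fixed-width absolute reference (illustrative only)


def emit_units(units):
    """Stream units in order and patch forward references afterwards.

    `units` is a list of (name, payload_bytes, referenced_unit_names) tuples;
    this shape is hypothetical and only stands in for real compile units.
    """
    out = bytearray()
    unit_offset = {}  # unit name to final absolute offset, known once emitted
    pending = []      # (patch position, target name) for forward references

    for name, payload, refs in units:
        unit_offset[name] = len(out)
        out += payload
        for target in refs:
            if target in unit_offset:
                # Backward reference: the final offset is already known.
                out += struct.pack("=I", unit_offset[target])
            else:
                # Forward reference: write a placeholder and remember it.
                pending.append((len(out), target))
                out += b"\x00" * REF_SIZE

    # Fixup pass: every unit has been emitted, so all offsets are final.
    for position, target in pending:
        struct.pack_into("=I", out, position, unit_offset[target])

    return bytes(out), unit_offset, len(pending)
```
</pre>
    <p>The more references point forward (as with units produced in parallel), the longer the <code>pending</code> list grows, which is the cost being discussed here.</p>
    </div>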

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div>

              <p>DWARFLinker could create an additional artificial compile
                unit and put all merged types there, then later patch all
                type references to point into this additional
                compilation unit. No bits would be duplicated in
                that case. The performance improvement would come from
                copying less DWARF and from the
                fact that type references could be updated while DWARF is
                being cloned (no additional pass is needed for that).<br>

              </p>

            </div>

          </blockquote>
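          <div>
          <p>To make the "artificial compile unit with merged types" idea concrete, here is a small sketch. It assumes each CU is reduced to a mapping from type name to member names; the <code>merge_types</code> function and that input shape are hypothetical and only illustrate the bookkeeping, not the DWARFLinker implementation.</p>
<pre>
```python
def merge_types(compile_units):
    """Merge per-CU definitions of the same type into one artificial unit.

    `compile_units` maps a CU name to {type name: [member names]}; this shape
    is hypothetical. Returns the merged type table and, for each CU, the
    index of every referenced type inside the artificial unit.
    """
    merged = {}      # type name to ordered list of unique members
    type_index = {}  # type name to position inside the artificial unit
    references = {}  # CU name to {type name: index in the artificial unit}

    for cu_name, types in compile_units.items():
        references[cu_name] = {}
        for type_name, members in types.items():
            if type_name not in merged:
                type_index[type_name] = len(merged)
                merged[type_name] = []
            for member in members:
                # Keep each member once, however many CUs define it.
                if member not in merged[type_name]:
                    merged[type_name].append(member)
            references[cu_name][type_name] = type_index[type_name]

    return merged, references
```
</pre>
          <p>In this toy model the index of a type is fixed the first time it is seen, so a reference can be recorded while the CU is being processed; real DWARF offsets would additionally depend on the final size of each merged type.</p>
          </div>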

          <div>"later patch all type references to point into this

            additional compilation unit" - that's the additional pass

            that people are probably talking/concerned about. Rewalking

            all the DWARF. The current dsymutil approach, as far as I

            know, is single pass - it knows the final, absolute offset

            to the type from the moment it emits that type/needs to

            refer to it. <br>

          </div>

        </div>

      </div>

    </blockquote>

    <p>Right. The current dsymutil approach is single pass. And from that
      point of view, the solution <br>
      which you've described (to produce a declaration of the type, with
      the relevant members) <br>
      allows keeping that single-pass implementation.<br>
      <br>
      But there is a restriction in the current dsymutil approach: to
      process inter-CU references <br>
      it needs to load all DWARF into memory (while it analyzes which
      part of DWARF is live, <br>

      it needs to have all CUs loaded into the memory).</p></div></blockquote><div>All DWARF for a single file (which for dsymutil is mostly a single CU, except with LTO I guess?), not all DWARF for all inputs in memory at once, yeah? <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p> That leads to

      huge memory usage. <br>

      This is less important when the source is a set of object files (as in
      the dsymutil case), but it <br>
      becomes a real problem for the llvm-dwarfutil utility when the source is a
      single file (with the current <br>

      implementation it needs 30G of memory for compiling clang binary).<br></p></div></blockquote><div>Yeah, that's where I think you'd need a fixup pass one way or another - because cross-CU references can mean that when you figure out a new layout for CU5 (because it has a duplicate type definition of something in CU1) then you might have to touch CU4 that had an absolute/cross-CU forward reference to CU5. Once you've got such a fixup pass (if dsymutil already has one? Which, like I said, I'm confused why it would have one/that doesn't match my very vague understanding) then I think you could make dsymutil work on a per-CU basis streaming things out, then fixing up a few offsets.<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>Without loading all CUs into memory, it would require a two-pass
      solution: a first pass to analyze <br>
      which part of DWARF relates to live code, and then a second pass to

      generate the result. <br></p></div></blockquote><div>Not sure it'd require any more second pass than a "fixup" pass, which it sounds like you're saying it already has? </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><p>

      If we had a two-pass solution, then we could create a
      compilation unit with all <br>
      types in the first pass, and in the second pass we could generate
      the result with correct offsets (no <br>
      need to fix them up, as is currently required by dsymutil for
      forward inter-CU references).<br>
      The open question currently is how expensive this two-pass
      approach is.<br>

    </p>
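    <div>
    <p>The two-pass variant can be sketched as follows, assuming the first pass already knows the final size of each output unit once liveness has been decided. The <code>plan_layout</code> and <code>emit</code> names and the tuple layout are hypothetical. Because all offsets are computed before anything is written, the second pass needs no placeholders and no fixups.</p>
<pre>
```python
def plan_layout(units):
    """First pass: compute the final offset of every unit without emitting it.

    `units` is a list of (name, size_in_bytes, referenced_unit_names) tuples;
    the sizes are assumed to be known after the liveness analysis.
    """
    offsets = {}
    cursor = 0
    for name, size, _refs in units:
        offsets[name] = cursor
        cursor += size
    return offsets


def emit(units, offsets):
    """Second pass: emit each unit with reference offsets that are final."""
    return [
        (name, offsets[name], [offsets[target] for target in refs])
        for name, _size, refs in units
    ]
```
</pre>
    <p>The price is that every input has to be visited twice, which is exactly the open cost question raised above.</p>
    </div>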

    <p>Thank you, Alexey.<br>

    </p>

    <blockquote type="cite">

      <div dir="ltr">

        <div class="gmail_quote">

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div>

              <p> </p>

              <p>Anyway, that might be the next step once the multi-threaded
                DWARFLinker is ready.<br>

              </p>

            </div>

          </blockquote>

          <div>Yep, be interesting to see how it all goes! </div>

          <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

            <div>

              <p> </p>

              <blockquote type="cite">

                <div dir="ltr">

                  <div class="gmail_quote">

                    <div> </div>

                    <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

                      <div> <br>

                        Do you suggest that 0x0000011b should be
                        transformed into something like this:<br>

                        <br>

                        0x000000fc: DW_TAG_compile_unit<br>

                                      DW_AT_language   

                        (DW_LANG_C_plus_plus)<br>

                                      DW_AT_name        ("templ.cpp")<br>

                                      DW_AT_stmt_list   (0x00000090)<br>

                                      DW_AT_low_pc     

                        (0x0000000100000fa0)<br>

                                      DW_AT_high_pc    

                        (0x0000000100000fab)<br>

                        <br>

                        0x0000011b:   DW_TAG_structure_type<br>

                                        DW_AT_specification (0x0000002a

                        "x")<br>

                        <br>

                        0x00000124:     DW_TAG_subprogram<br>

                                          DW_AT_linkage_name   

                        ("_ZN1x2f3IiEEiv")<br>

                                          DW_AT_name   

                        ("f3<int>")<br>

                                          DW_AT_type   

                        (0x000000000000005e "int")<br>

                                          DW_AT_declaration     (true)<br>

                                          DW_AT_external        (true)<br>

                                          DW_AT_APPLE_optimized (true)<br>

                        0x00000138:       NULL<br>

                        0x00000139:     NULL<br>

                        <br>

                        0x00000140:   DW_TAG_subprogram<br>

                                        DW_AT_low_pc   

                        (0x0000000100000fa0)<br>

                                        DW_AT_high_pc  

                        (0x0000000100000fab)<br>

                                        DW_AT_specification    

                        (0x0000000000000124 "_ZN1x2f3IiEEiv")<br>

                        0x00000155:     NULL<br>

                        <br>

                        Did I get the idea correctly?<br>

                      </div>

                    </blockquote>

                    <div><br>

                    </div>

                    <div>Yep, more or less. It'd be "safer" if 11b
                      didn't use DW_AT_specification to refer to 2a, but
                      instead was only a completely independent
                      declaration of "x" - that path is already well
                      supported/tested (well, it's the work-in-progress
                      stuff for lldb to support -fno-standalone-debug,
                      but gdb's been consuming DWARF like this for
                      years, and Clang and GCC have both produced DWARF
                      like this for years: if the type is "homed" in
                      another file, then Clang/GCC emit a declaration
                      with just the members needed to define any member
                      functions defined/inlined/referenced in this CU).<br>

                      <br>

                      But using DW_AT_specification, or maybe some other
                      extension attribute, might make the consumer's task
                      a bit easier (could do both - use an extension
                      attribute to tie them up, leave
                      DW_AT_declaration/DW_AT_name here for consumers
                      that don't understand the extension attribute) in
                      finding that they're all the same type/pieces of
                      the same type.</div>

                    <div> </div>

                  </div>

                </div>

              </blockquote>

              <p>Yes, I will try this solution.</p>

              <p>Thank you, Alexey.<br>

              </p>

              <br>

            </div>

          </blockquote>

        </div>

      </div>

    </blockquote>

  </div>


</blockquote></div></div>