<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 26.10.2020 22:38, David Blaikie
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAENS6EvyRRDXrYnU+eeQy7PxHYf3j6jFJbPox+x_b9bzKXbKuw@mail.gmail.com">
      <meta http-equiv="content-type" content="text/html; charset=UTF-8">
      <div dir="ltr">
        <div dir="ltr"><br>
        </div>
        <br>
        <div class="gmail_quote">
          <div dir="ltr" class="gmail_attr">On Sun, Oct 25, 2020 at 9:31
            AM Alexey Lapshin &lt;<a href="mailto:avl.lapshin@gmail.com"
              moz-do-not-send="true">avl.lapshin@gmail.com</a>&gt;
            wrote:<br>
          </div>
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p><br>
              </p>
              <div>On 23.10.2020 19:43, David Blaikie wrote:<br>
              </div>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <blockquote type="cite">
                        <div dir="ltr">
                          <div class="gmail_quote">
                            <blockquote class="gmail_quote"
                              style="margin:0px 0px 0px
                              0.8ex;border-left:1px solid
                              rgb(204,204,204);padding-left:1ex">
                              <div><br>
                                <br>
                              </div>
                            </blockquote>
                            <div><br>
                            </div>
                            <div>Ah, yeah - that seems like a missed
                              opportunity - duplicating the whole type
                              DIE. LTO does this by making monolithic
                              types - merging all the members from
                              different definitions of the same type
                              into one, but that's maybe too expensive
                              for dsymutil (might still be interesting
                              to know how much more expensive, etc). But
                              I think the other way to go would be to
                              produce a declaration of the type, with
                              the relevant members - and let the DWARF
                              consumer identify this declaration as
                              matching up with the earlier definition.
                              That's the sort of DWARF you get from the
                              non-MachO default -fno-standalone-debug
                              anyway, so it's already pretty well
                              tested/supported (support in lldb's a bit
                              younger/more work-in-progress,
                              admittedly). I wonder how much dsym size
                              there is that could be reduced by such an
                              implementation.</div>
                          </div>
                        </div>
                      </blockquote>
                      <p>I see. Yes, that could be done, and I think it
                        would result in a noticeable size reduction (I do
                        not know the exact numbers at the moment).</p>
                      <p>I am working on a multi-threaded DWARFLinker now,
                        and its first version will do exactly the same type
                        processing as the current dsymutil.</p>
                    </blockquote>
                    <div>Yeah, best to keep the behavior the same
                      through that</div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div>
                        <p>The above scheme could be implemented as a next
                          step, and it would give a better size
                          reduction than the current state.</p>
                        <p>But I think an even better scheme is possible,
                          one that would give a bigger size
                          reduction and faster execution. It
                          is similar to what you've described
                          above: "LTO does - making monolithic types -
                          merging all the members from different
                          definitions of the same type into one".</p>
                      </div>
                    </blockquote>
                    <div>I believe the reason that's probably not been
                      done is that it can't be streamed - it'd lead to
                      buffering more of the output </div>
                  </div>
                </div>
              </blockquote>
              <p>Yes. The fact that DWARF has to be streamed into
                AsmPrinter complicates parallel DWARF generation. In my
                prototype, I generate
                several output files (one per source compilation
                unit) and then sequentially glue them into the final
                output file.<br>
              </p>
            </div>
          </blockquote>
          <div>How does that help? Do you use relocations in those
            intermediate object files so the DWARF in them can refer
            across files? <br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>It does not help with references across files. It helps to
      parallelize the generation of CU bodies. <br>
      It is not possible to write two CUs into one AsmPrinter in
      parallel, so to make parallel generation possible I stream them
      into different AsmPrinters. (This comment is a reply to "I believe
      the reason that's probably not been done is that it can't be
      streamed", which was originally about references across the file,
      but it seems I took it in another direction.)<br>
    </p>
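    <p>A minimal Python sketch of that idea, not the actual DWARFLinker
      code: each CU body is generated into its own buffer in parallel,
      and the buffers are then glued together sequentially, which is the
      first point at which final CU offsets are known. The names
      emit_cu_body and link_in_parallel and the byte-string "bodies" are
      hypothetical stand-ins for per-CU AsmPrinter output.</p>
    <pre>
```python
from concurrent.futures import ThreadPoolExecutor


def emit_cu_body(cu_name):
    # Hypothetical stand-in for cloning one CU into its own output stream.
    return ("body of " + cu_name + ";").encode("ascii")


def link_in_parallel(cu_names):
    # Each CU gets a private buffer, so workers never share one stream.
    with ThreadPoolExecutor() as pool:
        bodies = list(pool.map(emit_cu_body, cu_names))  # keeps input order

    # Sequential glue step: only here does each CU get its final offset.
    offsets = {}
    out = bytearray()
    for name, body in zip(cu_names, bodies):
        offsets[name] = len(out)
        out += body
    return bytes(out), offsets


output, offsets = link_in_parallel(["CU1", "CU2", "CU3"])
print(offsets)
```
    </pre>
    <p>Because offsets only exist after the glue step, any reference
      from one CU into another cannot be written in its final form while
      the bodies are being generated.</p>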
    <blockquote type="cite"
cite="mid:CAENS6EvyRRDXrYnU+eeQy7PxHYf3j6jFJbPox+x_b9bzKXbKuw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p> </p>
              <p><br>
              </p>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <div>(if two of these expandable types were in one
                      CU - the start of the second type couldn't be
                      known until the end because it might keep getting
                      pushed later due to expansion of the first type)
                      and/or having to revisit all the type references
                      (the offset to the second type wouldn't be known
                      until the end - so writing the offsets to refer to
                      the type would have to be deferred until then).<br>
                    </div>
                  </div>
                </div>
              </blockquote>
              <p>That is the second problem: offsets are not known until
                the end of the file.<br>
                dsymutil already has that situation for inter-CU
                references, so it has an extra pass to
                fix up offsets. </p>
            </div>
          </blockquote>
          <div>Oh, it does? I figured it was one-pass, and that it only
            ever refers back to types in previous CUs? So it doesn't
            have to go back and do a second pass. But I guess if it sees a
            declaration of T1 in CU1, then later on sees a definition of
            T1 in CU2, does it somehow go back to CU1 and remove the
            declaration/make references refer to the definition in CU2?
            I figured it'd just leave the declaration and references to
            it as-is, then add the definition and use that from CU2
            onwards? <br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>For the processing of types, it does not go back. <br>
      Both "I figured it was one-pass, and that it only ever refers back
      to types in previous CUs"
      and "I figured it'd just leave the declaration and references
      to it as-is, then add the definition and use that from CU2
      onwards" are correct. <br>
      <br>
    </p>
    <blockquote type="cite"
cite="mid:CAENS6EvyRRDXrYnU+eeQy7PxHYf3j6jFJbPox+x_b9bzKXbKuw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p>With a multi-threaded implementation, that situation would
                arise more often
                for type references, so more offsets would have to be fixed
                up during the additional pass.<br>
              </p>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div>
                        <p>DWARFLinker could create an additional
                          artificial compile unit and put all merged
                          types there, then later patch all type references
                          to point into this additional compilation
                          unit. Nothing would be duplicated in that
                          case. Performance could
                          improve because less DWARF is copied
                          and because type references
                          could be updated while the DWARF is cloned (no
                          additional pass needed for that).<br>
                        </p>
                      </div>
                    </blockquote>
                    <div>"later patch all type references to point into
                      this additional compilation unit" - that's the
                      additional pass that people are probably
                      talking/concerned about. Rewalking all the DWARF.
                      The current dsymutil approach, as far as I know,
                      is single pass - it knows the final, absolute
                      offset to the type from the moment it emits that
                      type/needs to refer to it. <br>
                    </div>
                  </div>
                </div>
              </blockquote>
              <p>Right. The current dsymutil approach is single pass. And
                from that point of view, the solution
                you've described (producing a declaration of the
                type, with the relevant members)
                allows keeping that single-pass implementation.<br>
                <br>
                But the current dsymutil
                approach has a restriction: to process inter-CU references,
                it needs to load all DWARF into memory (while it
                analyzes which parts of the DWARF are live,
                it needs to have all CUs loaded into memory).</p>
            </div>
          </blockquote>
          <div>All DWARF for a single file (which for dsymutil is mostly
            a single CU, except with LTO I guess?), not all DWARF for
            all inputs in memory at once, yeah? <br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>Right. In the dsymutil case, it is all DWARF for a single file
      (not all DWARF for all inputs in memory at once).<br>
      But in the llvm-dwarfutil case, the single input file contains the
      DWARF for all original input object files, and all of it gets
      loaded into memory.<br>
    </p>
    <blockquote type="cite"
cite="mid:CAENS6EvyRRDXrYnU+eeQy7PxHYf3j6jFJbPox+x_b9bzKXbKuw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p>That leads to huge memory usage. <br>
                It matters less when the source is a set of object
                files (as in the dsymutil case), but it
                becomes a real problem for the llvm-dwarfutil utility, where
                the source is a single file (with the current
                implementation it needs 30G of memory for
                the clang binary).<br>
              </p>
            </div>
          </blockquote>
          <div>Yeah, that's where I think you'd need a fixup pass one
            way or another - because cross-CU references can mean that
            when you figure out a new layout for CU5 (because it has a
            duplicate type definition of something in CU1) then you
            might have to touch CU4 that had an absolute/cross-CU
            forward reference to CU5. Once you've got such a fixup pass
            (if dsymutil already has one? Which, like I said, I'm
            confused why it would have one/that doesn't match my very
            vague understanding) then I think you could make dsymutil
            work on a per-CU basis streaming things out, then fixing up
            a few offsets.<br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>When dsymutil deduplicates types, it changes a local CU reference
      into an inter-CU reference (so that CU2 (next) can reference the
      type definition from CU1 (prev)). This change currently does not
      need any fixups.<br>
      <br>
      When dsymutil meets an inter-CU reference that already exists in
      the input object file and points into a CU which has not been
      processed yet (so its offset is unknown), it marks it as a
      "forward reference" and patches it later, during the additional
      "fixup forward references" pass, at a time when the offsets are
      known. <br>
      <br>
      If CUs were processed in parallel, their offsets would not be
      known at the moment when a local type reference is changed
      into an inter-CU reference. So we would need to do the same fix-up
      processing for all references to types that we already do for
      other inter-CU references.<br>
      <br>
    </p>
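    <p>The distinction between backward and forward references can be
      sketched as follows. This is an illustrative Python model, not
      dsymutil's implementation: CUs are hypothetical (name, size,
      references) tuples, and a reference is a (target CU, offset inside
      target) pair. References into already emitted CUs are resolved
      immediately, while references into CUs not emitted yet are queued
      and patched in a fixup pass.</p>
    <pre>
```python
def link(cus):
    # cus: list of (name, size, refs); refs is a list of
    # (target_cu, offset_in_target) pairs found in that CU.
    start = {}      # final start offset of every CU emitted so far
    resolved = []   # (from_cu, absolute_offset) references in final form
    pending = []    # forward references waiting for the fixup pass
    pos = 0
    for name, size, refs in cus:
        start[name] = pos
        for target, rel in refs:
            if target in start:
                # Backward (or local) reference: offset already known.
                resolved.append((name, start[target] + rel))
            else:
                # Forward reference: the target CU is not emitted yet.
                pending.append((name, target, rel))
        pos += size

    # Fixup pass: every CU offset is known now.
    for name, target, rel in pending:
        resolved.append((name, start[target] + rel))
    return resolved, len(pending)


cus = [("CU1", 100, [("CU2", 8)]), ("CU2", 50, [("CU1", 16)])]
resolved, fixed_up = link(cus)
print(resolved, fixed_up)
```
    </pre>
    <p>With parallel processing, the "target in start" test would fail
      for most type references, so they would all end up in the pending
      list.</p>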
    <blockquote type="cite"
cite="mid:CAENS6EvyRRDXrYnU+eeQy7PxHYf3j6jFJbPox+x_b9bzKXbKuw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p>Without loading all CUs into memory, it would require
                a two-pass solution: a first pass to analyze
                which parts of the DWARF relate to live code, and a second
                pass to generate the result. <br>
              </p>
            </div>
          </blockquote>
          <div>Not sure it'd require any more second pass than a "fixup"
            pass, which it sounds like you're saying it already has? <br>
          </div>
        </div>
      </div>
    </blockquote>
    <p>It looks like it would need an additional pass to process
      inter-CU references (those existing in the incoming file) if we do
      not want to load all CUs into memory.<br>
      When the input file contains inter-CU references, DWARFLinker
      needs to follow them while doing liveness marking, i.e. if the
      original CU has a live part which references another CU, we need to
      follow the reference into that other CU and mark the referenced
      part as live. At present, while doing liveness analysis, we have
      all CUs in memory. That allows us to load all CUs once and analyze
      them all. In the llvm-dwarfutil case (which loads all DWARF for the
      input file) it leads to huge memory usage. <br>
      <br>
      Let's say CU1 references CU100, and CU100 references CU1. We cannot
      start generation for CU1 until we have analyzed CU100 and marked
      the corresponding part of CU1 as live. At the same time, we cannot
      load the DWARF for all CUs. Then the processing (in simplified
      form) could look like this:<br>
      <br>
      1: for (CU : CU1...CU100)<br>
        load CU, do liveness analysis, remember references, unload CU<br>
        <br>
      2: for (all references)<br>
        load CU, do liveness analysis, unload CU<br>
        <br>
      3: for (CU : CU1...CU100)<br>
        load CU, clone CU<br>
        <br>
      That is a simplified scheme, but I think it is enough to show the
      idea. In this scheme, steps 1 and 2 must be done before
      step 3. <br>
    </p>
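    <p>The three steps above can be modelled with a small Python sketch.
      It is only an illustration under simplifying assumptions: a CU is a
      hypothetical dictionary of DIEs, each with a "root" flag (directly
      live) and a list of (CU, DIE) references, and load_cu stands in for
      reading one CU from the input while dropping the result stands in
      for unloading it.</p>
    <pre>
```python
def mark_in_cu(cu_name, dies, seeds, live, cross_refs):
    # Propagate liveness inside one loaded CU; queue references leaving it.
    work = list(seeds)
    while work:
        die = work.pop()
        if (cu_name, die) in live:
            continue
        live.add((cu_name, die))
        for ref_cu, ref_die in dies[die]["refs"]:
            if ref_cu == cu_name:
                work.append(ref_die)
            else:
                cross_refs.append((ref_cu, ref_die))


def link(storage):
    def load_cu(name):
        # Hypothetical loader; only one CU is held at a time.
        return storage[name]

    live, cross_refs = set(), []
    # Step 1: per-CU liveness from roots, remembering inter-CU references.
    for name in storage:
        dies = load_cu(name)
        roots = [d for d in dies if dies[d]["root"]]
        mark_in_cu(name, dies, roots, live, cross_refs)
    # Step 2: follow remembered references, reloading only their targets.
    while cross_refs:
        name, die = cross_refs.pop()
        mark_in_cu(name, load_cu(name), [die], live, cross_refs)
    # Step 3: clone only the live DIEs of every CU.
    return {name: sorted(d for d in load_cu(name) if (name, d) in live)
            for name in storage}


storage = {
    "CU1": {
        "main": {"root": True, "refs": [("CU100", "T")]},
        "helper_type": {"root": False, "refs": []},
        "dead": {"root": False, "refs": []},
    },
    "CU100": {
        "T": {"root": False, "refs": [("CU1", "helper_type")]},
        "unused": {"root": False, "refs": []},
    },
}
cloned = link(storage)
print(cloned)
```
    </pre>
    <p>In this model, step 2 reloads a CU once per remembered reference;
      grouping the references by target CU would reduce the number of
      reloads.</p>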
    <p><br>
    </p>
    <p>Alexey.<br>
    </p>
    <blockquote type="cite"
cite="mid:CAENS6EvyRRDXrYnU+eeQy7PxHYf3j6jFJbPox+x_b9bzKXbKuw@mail.gmail.com">
      <div dir="ltr">
        <div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px
            0.8ex;border-left:1px solid
            rgb(204,204,204);padding-left:1ex">
            <div>
              <p>If we had a two-pass solution, we could
                create a compilation unit with all
                types in the first pass, and in the second pass we could
                generate the result with correct offsets (no
                need to fix them up, as dsymutil currently has to
                for forward inter-CU references).<br>
                The open question currently is how expensive this two-pass
                approach is.<br>
              </p>
              <p>Thank you, Alexey.<br>
              </p>
              <blockquote type="cite">
                <div dir="ltr">
                  <div class="gmail_quote">
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div>
                        <p> </p>
                        <p>Anyway, that might be the next step once the
                          multi-threaded DWARFLinker is ready.<br>
                        </p>
                      </div>
                    </blockquote>
                    <div>Yep, be interesting to see how it all goes! </div>
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px 0.8ex;border-left:1px solid
                      rgb(204,204,204);padding-left:1ex">
                      <div>
                        <p> </p>
                        <blockquote type="cite">
                          <div dir="ltr">
                            <div class="gmail_quote">
                              <div> </div>
                              <blockquote class="gmail_quote"
                                style="margin:0px 0px 0px
                                0.8ex;border-left:1px solid
                                rgb(204,204,204);padding-left:1ex">
                                <div> <br>
                                  Do you suggest that 0x0000011b should
                                  be transformed into something like
                                  that:<br>
                                  <br>
                                  0x000000fc: DW_TAG_compile_unit<br>
                                                DW_AT_language   
                                  (DW_LANG_C_plus_plus)<br>
                                                DW_AT_name       
                                  ("templ.cpp")<br>
                                                DW_AT_stmt_list  
                                  (0x00000090)<br>
                                                DW_AT_low_pc     
                                  (0x0000000100000fa0)<br>
                                                DW_AT_high_pc    
                                  (0x0000000100000fab)<br>
                                  <br>
                                  0x0000011b:   DW_TAG_structure_type<br>
                                                  DW_AT_specification
                                  (0x0000002a "x")<br>
                                  <br>
                                  0x00000124:     DW_TAG_subprogram<br>
                                                   
                                  DW_AT_linkage_name   
                                  ("_ZN1x2f3IiEEiv")<br>
                                                    DW_AT_name   
                                  ("f3&lt;int&gt;")<br>
                                                    DW_AT_type   
                                  (0x000000000000005e "int")<br>
                                                   
                                  DW_AT_declaration     (true)<br>
                                                   
                                  DW_AT_external        (true)<br>
                                                   
                                  DW_AT_APPLE_optimized (true)<br>
                                  0x00000138:       NULL<br>
                                  0x00000139:     NULL<br>
                                  <br>
                                  0x00000140:   DW_TAG_subprogram<br>
                                                  DW_AT_low_pc   
                                  (0x0000000100000fa0)<br>
                                                  DW_AT_high_pc  
                                  (0x0000000100000fab)<br>
                                                 
                                  DW_AT_specification    
                                  (0x0000000000000124 "_ZN1x2f3IiEEiv")<br>
                                  0x00000155:     NULL<br>
                                  <br>
                                  Did I correctly get the idea?<br>
                                </div>
                              </blockquote>
                              <div><br>
                              </div>
                              <div>Yep, more or less. It'd be "safer" if
                                11b didn't use DW_AT_specification to
                                refer to 2a, but instead was only a
                                completely independent declaration of
                                "x" - that path is already well
                                supported/tested (well, it's the
                                work-in-progress stuff for lldb to
                                support -fno-standalone-debug, but gdb's
                                been consuming DWARF like this for
                                years, Clang and GCC both produce DWARF
                                like this (if the type is "homed" in
                                another file, then Clang/GCC produce
                                DWARF that emits a declaration with just
                                the members needed to define any member
                                functions defined/inlined/referenced in
                                this CU)) for years.<br>
                                <br>
                                But using DW_AT_specification, or maybe
                                some other extension attribute, might
                                make the consumer's task a bit easier
                                (could do both - use an extension
                                attribute to tie them up, leave
                                DW_AT_declaration/DW_AT_name here for
                                consumers that don't understand the
                                extension attribute) in finding that
                                they're all the same type/pieces of the
                                same type.</div>
                              <div> </div>
                            </div>
                          </div>
                        </blockquote>
                        <p>Yes, I will try this solution.</p>
                        <p>Thank you, Alexey.<br>
                        </p>
                        <br>
                      </div>
                    </blockquote>
                  </div>
                </div>
              </blockquote>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
  </body>
</html>