<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - `llvm-objcopy --keep-symbols` is slower than GNU equivalent"
   href="https://bugs.llvm.org/show_bug.cgi?id=50404">50404</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>`llvm-objcopy --keep-symbols` is slower than GNU equivalent
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>tools
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Windows NT
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>llvm-objcopy/strip
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>brian.cain@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>alexander.v.shaposhnikov@gmail.com, jake.h.ehrlich@gmail.com, jh7370.2008@my.bristol.ac.uk, llvm-bugs@lists.llvm.org, rupprecht@google.com
          </td>
        </tr></table>
      <p>
        <div>
        <pre>While investigating an Android build performance issue, I found that
llvm-objcopy is far slower than GNU objcopy for "--keep-symbols" when the
keep-symbols list is very large.

As far as I can tell, llvm-objcopy builds a list of NameOrPattern objects to
represent the keep-symbols list, but it appears to compare every symbol in the
input file against every entry in that list. I'm not sure what GNU objcopy
does, but merging the non-pattern names into a sorted list would make lookups
much faster: a binary search per symbol instead of a linear scan.
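
A minimal sketch of the sorted-list idea, written in Python for brevity. It is
hypothetical and not llvm-objcopy's code: KeepSymbolMatcher is an invented
name, glob entries are approximated with fnmatch.fnmatchcase (which may not
match every detail of llvm-objcopy's --wildcard syntax), and regex entries are
not handled. Literal names are searched with bisect; only real patterns fall
back to a linear scan.

```python
# Hypothetical sketch of the suggested fix; not llvm-objcopy's implementation.
import bisect
import fnmatch


class KeepSymbolMatcher:
    def __init__(self, entries, wildcards=False):
        literals = []
        self.patterns = []
        for entry in entries:
            # Without wildcard matching, every entry is treated as a plain name.
            if wildcards and any(c in entry for c in "*?["):
                self.patterns.append(entry)
            else:
                literals.append(entry)
        # Deduplicated, sorted list of exact names for binary search.
        self.names = sorted(set(literals))

    def matches(self, symbol):
        # O(log K) lookup for exact names.
        i = bisect.bisect_left(self.names, symbol)
        if i != len(self.names) and self.names[i] == symbol:
            return True
        # Linear scan only over the (usually few) glob patterns.
        return any(fnmatch.fnmatchcase(symbol, p) for p in self.patterns)


m = KeepSymbolMatcher(["main", "foo", "bar_*"], wildcards=True)
print(m.matches("foo"))      # True
print(m.matches("bar_baz"))  # True
print(m.matches("qux"))      # False
```

A hash set would give the same speedup for exact names; the sorted list simply
follows the suggestion above.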

Below is a script to reproduce the problem, followed by its output. The test
case shows a significant performance difference. In the reported Android build
failure, the shared object had ~700k symbols, the keep list had ~300k symbols,
and llvm-objcopy took ~8 minutes to run.
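
A back-of-envelope count of why the exhaustive scan hurts at this scale. It
assumes the pairwise comparison described above (an upper bound, since a
symbol that matches early stops scanning) and uses the approximate sizes from
this report. It is an estimate, not a measurement.

```python
# Back-of-envelope comparison counts for the Android case in this report.
# Upper bound: assumes each input symbol is compared with every keep-list entry.
import math

num_symbols = 700_000     # symbols in the shared object (~700k)
keep_list_size = 300_000  # entries in the keep list (~300k)

# Exhaustive scan: up to K comparisons per input symbol.
exhaustive = num_symbols * keep_list_size

# Binary search over a sorted list: about ceil(log2(K)) comparisons per symbol.
binary_search = num_symbols * math.ceil(math.log2(keep_list_size))

print(f"exhaustive:    {exhaustive:.1e} comparisons")     # 2.1e+11
print(f"binary search: {binary_search:.1e} comparisons")  # 1.3e+07
```

That is a ratio of roughly 16,000, which points the same way as the timings
below without predicting them exactly.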

~~~

Here's the output I get when running it on a top-of-tree build from roughly the last week:

$ PATH=$PWD/bin:$PATH ../../tmp/qt66370/32/objcopy_perf.sh
creating init file
creating obj file
creating shared obj file
ld.lld: warning: lld uses blx instruction, no object with architecture supporting feature detected
performing llvm objcopy
12.34user 0.00system 0:12.35elapsed 99%CPU (0avgtext+0avgdata 20228maxresident)k
0inputs+2288outputs (0major+3815minor)pagefaults 0swaps
performing GNU objcopy
0.03user 0.00system 0:00.04elapsed 97%CPU (0avgtext+0avgdata 20396maxresident)k
0inputs+2288outputs (0major+5895minor)pagefaults 0swaps</pre>
        </div>
      </p>
    </body>
</html>