<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - For some bitcodes it can take 12 hours to read and compile"
   href="https://bugs.llvm.org/show_bug.cgi?id=47395">47395</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>For some bitcodes it can take 12 hours to read and compile
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Windows NT
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Bitcode Reader
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>scott.waye@hubse.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>I create bitcode using libLLVM for the corert compiler project
(<a href="https://github.com/dotnet/corert">https://github.com/dotnet/corert</a>).  It uses the C# bindings over libLLVM from
<a href="https://github.com/Microsoft/LLVMSharp">https://github.com/Microsoft/LLVMSharp</a>.

I have two bitcode files generated from mostly the same source code.  Each is
around 240 MB in size.  One compiles in 3 minutes, the other in 12 hours.  I
suspect the 12-hour compilation is either taking a non-optimal path or doing
something wrong.  I compile with Emscripten, which ultimately calls

E:/GitHub/llvm-project/build/release/bin/clang++.exe -target
wasm32-unknown-emscripten -D__EMSCRIPTEN_major__=1 -D__EMSCRIPTEN_minor__=39
-D__EMSCRIPTEN_tiny__=19 -D_LIBCPP_ABI_VERSION=2 -Dunix -D__unix -D__unix__
-Werror=implicit-function-declaration -Xclang -nostdsysteminc -Xclang
-isystemE:\GitHub\emsdk\upstream\emscripten\system\include\libcxx -Xclang
-isystemE:\GitHub\emsdk\upstream\emscripten\system\lib\libcxxabi\include
-Xclang
-isystemE:\GitHub\emsdk\upstream\emscripten\system\lib\libunwind\include
-Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include\compat
-Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include -Xclang
-isystemE:\GitHub\emsdk\upstream\emscripten\system\include\libc -Xclang
-isystemE:\GitHub\emsdk\upstream\emscripten\system\lib\libc\musl\arch\emscripten
-Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\local\include
-Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include\SSE -Xclang
-isystemE:\GitHub\emsdk\upstream\emscripten\cache\wasm\include -DEMSCRIPTEN
-fignore-exceptions -c -g
E:\GitHub\UnoCoreRt\UnoCoreRt.Wasm\bin\Debug\netstandard2.0\UnoCoreRt.Wasm.bc
-Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include\SDL -c -o
E:\GitHub\UnoCoreRt\UnoCoreRt.Wasm\bin\Debug\netstandard2.0\UnoCoreRt-release.o
-mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj
-mllvm -disable-lsr -g

What I've noticed is that, compared to the "fast" 3-minute compile, the "slow"
compile makes around 1 million calls to
<a href="https://github.com/llvm/llvm-project/blob/a6eb70c052da767aef6b041d0db20bdf3a9e06b5/llvm/lib/Bitcode/Reader/ValueList.cpp#L89">https://github.com/llvm/llvm-project/blob/a6eb70c052da767aef6b041d0db20bdf3a9e06b5/llvm/lib/Bitcode/Reader/ValueList.cpp#L89</a>
and so the ResolveConstants variable ends up with that many entries.
Resolving these constants then seems to take most of the time.  I think
the bitcode reader is identifying 1 million forward references, so the possible
causes of the slowness that come to mind are:

1.  Incorrect identification of forward references
2.  Incorrect writing from libLLVM that creates forward references
unnecessarily
3.  Slow algorithm to resolve correctly identified and written forward
references.
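
To make the difference between theories 2 and 3 more concrete, here is a toy
Python sketch.  It is not LLVM code: the record layout (a value ID plus a list
of operand IDs) and the name count_forward_refs are made up for illustration.
It only shows that the number of forward references a reader sees can depend
on the order in which the writer emits the same constants.

```python
# Toy model only: this is NOT LLVM's bitcode reader or its value list.
# Each record is (value_id, [operand_ids]); records are read in list order.

def count_forward_refs(records):
    """Count operand uses that name a value ID which has not been read yet."""
    defined = set()
    forward = 0
    for value_id, operands in records:
        for op in operands:
            if op not in defined:
                forward += 1  # a reader would need a placeholder here
        defined.add(value_id)
    return forward


# The same three constants in two different emission orders.
operands_first = [(0, []), (1, []), (2, [0, 1])]
users_first = [(2, [0, 1]), (0, []), (1, [])]

print(count_forward_refs(operands_first))  # 0
print(count_forward_refs(users_first))     # 2
```

If the real writer behaves anything like users_first for my slow file, that
would point at theory 2; if the count is legitimately high, theory 3 becomes
the more interesting one.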

A copy of the bitcode is at <a href="http://dev.hubse.com/UnoCoreRt.Wasm.bc.msi">http://dev.hubse.com/UnoCoreRt.Wasm.bc.msi</a>   (it's
not really an MSI; I just needed a binary extension that the web server would
serve).  The file is actually a 7z-compressed archive, so it needs renaming
from .msi to .7z
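
For convenience, a small Python sketch of the rename step.  The helper name
as_7z is made up, and the script assumes the download sits in the current
directory under its original name; extracting the archive afterwards still
needs a 7-Zip tool.

```python
from pathlib import Path


def as_7z(path):
    """Return the same path with its last suffix replaced by .7z."""
    return Path(path).with_suffix(".7z")


src = Path("UnoCoreRt.Wasm.bc.msi")  # assumed download location
if src.exists():
    src.rename(as_7z(src))  # becomes UnoCoreRt.Wasm.bc.7z
```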

I privately messaged @tlively on Discord, and I believe he confirmed that it
also takes a long time for him.

<a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Crash in BitcodeReader.cpp under LTO"
   href="show_bug.cgi?id=46750">https://bugs.llvm.org/show_bug.cgi?id=46750</a>  looks to be the same area of code,
but not the same problem.

I spent some time with clang++ in the debugger, but I'm not familiar with
it, so I couldn't reach any conclusion about my 3 theories
above.</pre>
        </div>
      </p>


    </body>
</html>