<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - lld-link /debug:ghash produces bad type stream with some masm object files"

   href="https://bugs.llvm.org/show_bug.cgi?id=40221">40221</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>lld-link /debug:ghash produces bad type stream with some masm object files

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Windows NT

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>DebugInfo

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>rnk@google.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>alexandre.ganea@ubisoft.com, jdevlieghere@apple.com, keith.walker@arm.com, llvm-bugs@lists.llvm.org, paul_robinson@playstation.sony.com, zturner@google.com

          </td>

        </tr></table>

      <p>

        <div>

        <pre>At a high-level, when enabling /debug:ghash in official chrome builds,

chrome.dll.pdb ends up containing a cycle in the TPI stream of the final PDB.

This causes various dumpers like cvdump and `llvm-pdbutil dump -types` to crash

while dumping the type stream.

The type stream merger has extra logic to deal with the case where type records

are not topologically sorted, as is the case in .debug$T sections in

masm-produced objects. The basic idea is that, if a type record contains

forward references, we don't insert it on the first pass. Then, we re-insert

all the records in subsequent passes until all the records are successfully

inserted. This relies on the idea that type record insertion is idempotent,

i.e. we can just insert all the records again, and if they were inserted on the

first pass, nothing happens. This is inefficient, but masm type streams are

small, i.e. < 10 elements.

However, it seems that we compute the wrong ghash when the type stream contains

forward references. Here's the ghash code in question:

<a href="https://github.com/llvm-git-prototype/llvm/blob/a845b0985672ee66c1cc8e070ca5d5ac6e89c0c9/llvm/lib/DebugInfo/CodeView/TypeHashing.cpp#L58">https://github.com/llvm-git-prototype/llvm/blob/a845b0985672ee66c1cc8e070ca5d5ac6e89c0c9/llvm/lib/DebugInfo/CodeView/TypeHashing.cpp#L58</a>

    for (TypeIndex TI : Indices) {

      ArrayRef<uint8_t> BytesToHash;

      if (TI.isSimple() || TI.isNoneType() || TI.toArrayIndex() >= Prev.size())

{

        const uint8_t *IndexBytes = reinterpret_cast<const uint8_t *>(&TI);

        BytesToHash = makeArrayRef(IndexBytes, sizeof(TypeIndex));

      } else {

        BytesToHash = Prev[TI.toArrayIndex()].Hash;

      }

      S.update(BytesToHash);

    }

In the case of forward references, i.e. `TI.toArrayIndex() >= Prev.size()`, we

just hash the type index as a plain type index, as if it were simple. This is

wrong, it will produce different hashes in different TUs.

I don't yet see how this leads to the off-by-one error and type graph cycle in

the result, but I think that's the cause of the bug.

We've been thinking about computing ghashes in parallel up front at the start

of the link, and I think that would be a great time to detect non-topo sorted

object files and sort them at that time. They are ultimately separable tasks,

and each obj can be handled in parallel, just like ghash computation.</pre>

        </div>

      </p>

      <hr>

      <span>You are receiving this mail because:</span>

      <ul>

          <li>You are on the CC list for the bug.</li>

      </ul>

    </body>

</html>