[llvm-bugs] [Bug 40221] New: lld-link /debug:ghash produces bad type stream with some masm object files

Thu Jan 3 13:34:19 PST 2019

https://bugs.llvm.org/show_bug.cgi?id=40221

            Bug ID: 40221
           Summary: lld-link /debug:ghash produces bad type stream with
                    some masm object files
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: DebugInfo
          Assignee: unassignedbugs at nondot.org
          Reporter: rnk at google.com
                CC: alexandre.ganea at ubisoft.com, jdevlieghere at apple.com,
                    keith.walker at arm.com, llvm-bugs at lists.llvm.org,
                    paul_robinson at playstation.sony.com, zturner at google.com

At a high-level, when enabling /debug:ghash in official chrome builds,
chrome.dll.pdb ends up containing a cycle in the TPI stream of the final PDB.
This causes various dumpers like cvdump and `llvm-pdbutil dump -types` to crash
while dumping the type stream.

The type stream merger has extra logic to deal with the case where type records
are not topologically sorted, as is the case in .debug$T sections in
masm-produced objects. The basic idea is that, if a type record contains
forward references, we don't insert it on the first pass. Then, we re-insert
all the records in subsequent passes until all the records are successfully
inserted. This relies on the idea that type record insertion is idempotent,
i.e. we can just insert all the records again, and if they were inserted on the
first pass, nothing happens. This is inefficient, but masm type streams are
small, i.e. < 10 elements.

However, it seems that we compute the wrong ghash when the type stream contains
forward references. Here's the ghash code in question:
https://github.com/llvm-git-prototype/llvm/blob/a845b0985672ee66c1cc8e070ca5d5ac6e89c0c9/llvm/lib/DebugInfo/CodeView/TypeHashing.cpp#L58

    for (TypeIndex TI : Indices) {
      ArrayRef<uint8_t> BytesToHash;
      if (TI.isSimple() || TI.isNoneType() || TI.toArrayIndex() >= Prev.size())
{
        const uint8_t *IndexBytes = reinterpret_cast<const uint8_t *>(&TI);
        BytesToHash = makeArrayRef(IndexBytes, sizeof(TypeIndex));
      } else {
        BytesToHash = Prev[TI.toArrayIndex()].Hash;
      }
      S.update(BytesToHash);
    }

In the case of forward references, i.e. `TI.toArrayIndex() >= Prev.size()`, we
just hash the type index as a plain type index, as if it were simple. This is
wrong, it will produce different hashes in different TUs.

I don't yet see how this leads to the off-by-one error and type graph cycle in
the result, but I think that's the cause of the bug.

We've been thinking about computing ghashes in parallel up front at the start
of the link, and I think that would be a great time to detect non-topo sorted
object files and sort them at that time. They are ultimately separable tasks,
and each obj can be handled in parallel, just like ghash computation.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20190103/731f38c7/attachment-0001.html>