[llvm-bugs] [Bug 43618] New: [llvm-profdata] Race condition in mergeInstrProfile() with NumThreads >1 causing corruption/crash

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Oct 9 07:02:53 PDT 2019


https://bugs.llvm.org/show_bug.cgi?id=43618

            Bug ID: 43618
           Summary: [llvm-profdata] Race condition in mergeInstrProfile()
                    with NumThreads >1 causing corruption/crash
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: release blocker
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: ua_llvm-bugs at binary-island.eu
                CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org

llvm-profdata.cpp - mergeInstrProfile():

  SmallVector<std::unique_ptr<WriterContext>, 4> Contexts;
  for (unsigned I = 0; I < NumThreads; ++I)
    Contexts.emplace_back(std::make_unique<WriterContext>(
        OutputSparse, ErrorLock, WriterErrorCodes));

  ....

    unsigned Ctx = 0;
    for (const auto &Input : Inputs) {
      Pool.async(loadInput, Input, Remapper, Contexts[Ctx].get());
      Ctx = (Ctx + 1) % NumThreads;
    }
    Pool.wait();


The reasoning here is flawed and leads to a nasty race condition: only
NumThreads Contexts are created, and they are reused for _all_ files being
processed (of which there are most likely far more than NumThreads for bigger
projects). Since every Task is queued with a preassigned Context, Tasks that
share a Context race with each other.

If Task A and Task D both have the same Context, while B and C have different
ones, and if Task A takes longer to complete than B and C, then Task D will
start as soon as a worker thread becomes free, even though Task A is still
using the very same Context. This will cause corruption and crashes.
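
To make the overlap concrete, here is a minimal sketch of the round-robin
assignment from the loop above. It is not the llvm-profdata code: the helper
name groupTasksByContext is made up for illustration, and it only computes
which task indices end up on which Context index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: replays "Ctx = (Ctx + 1) % NumThreads" from the quoted
// loop and records which task indices are handed the same context index.
std::vector<std::vector<std::size_t>>
groupTasksByContext(std::size_t NumInputs, unsigned NumThreads) {
  assert(NumThreads > 0 && "need at least one context");
  std::vector<std::vector<std::size_t>> Groups(NumThreads);
  unsigned Ctx = 0;
  for (std::size_t Task = 0; Task < NumInputs; ++Task) {
    Groups[Ctx].push_back(Task);
    Ctx = (Ctx + 1) % NumThreads;
  }
  return Groups;
}
```

With 4 inputs (A, B, C, D as indices 0 to 3) and NumThreads = 3, context 0 is
handed to tasks 0 and 3, i.e. A and D, which is exactly the pair described
above.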

I see this while compiling Firefox, where llvm-profdata crashes consistently
(as in: almost 100% of the time on my machine).

This bug goes back to when threading was introduced to llvm-profdata in July
2016, so it is present in all current major versions (trunk, 9.x, 8.x, ...).

A probable fix would be to assign each Task its own unique Context, which would
in turn increase the memory usage linearly with the number of files that need
to be processed. This would be the easiest fix to apply since it requires the
fewest changes overall.
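
A rough sketch of what "one Context per Task" could look like, under heavy
assumptions: Context and loadInput below are simplified stand-ins (not the real
WriterContext or loadInput), and std::async replaces the thread pool, so this
spawns one thread per input rather than reusing NumThreads workers.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <memory>
#include <vector>

// Hypothetical stand-in for WriterContext: it only accumulates a sum.
struct Context {
  long Total = 0;
};

// Hypothetical stand-in for loadInput: folds one input into its context.
void loadInput(const std::vector<int> &Input, Context *Ctx) {
  assert(Ctx && "every task needs a context");
  for (int Value : Input)
    Ctx->Total += Value;
}

long mergeWithContextPerTask(const std::vector<std::vector<int>> &Inputs) {
  // One context per input, so no two tasks ever touch the same state.
  // Memory grows linearly with the number of inputs.
  std::vector<std::unique_ptr<Context>> Contexts;
  for (std::size_t I = 0; I < Inputs.size(); ++I)
    Contexts.push_back(std::make_unique<Context>());

  // std::async stands in for the thread pool in this sketch.
  std::vector<std::future<void>> Tasks;
  for (std::size_t I = 0; I < Inputs.size(); ++I) {
    Context *Ctx = Contexts[I].get();
    const std::vector<int> *Input = &Inputs[I];
    Tasks.push_back(std::async(std::launch::async,
                               [Input, Ctx] { loadInput(*Input, Ctx); }));
  }
  for (auto &Task : Tasks)
    Task.get(); // Plays the role of Pool.wait().

  // Single-threaded merge once every task has finished.
  long Result = 0;
  for (const auto &Ctx : Contexts)
    Result += Ctx->Total;
  return Result;
}
```

Since every task owns its Context, no locking is needed until the final merge,
which runs after all tasks have completed.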

Alternatively, schedule Tasks in batches and wait for those to finish before
scheduling new Tasks, with memory usage now being O(threads). But this would
probably be measurably slower.
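
The batched variant could be sketched roughly as follows. Again, Context and
loadInput are simplified stand-ins rather than the real llvm-profdata types,
and std::async is used in place of the thread pool; the point is only that a
Context is not handed out again until the batch using it has finished.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-in for WriterContext: it only accumulates a sum.
struct Context {
  long Total = 0;
};

// Hypothetical stand-in for loadInput: folds one input into its context.
void loadInput(const std::vector<int> &Input, Context *Ctx) {
  for (int Value : Input)
    Ctx->Total += Value;
}

long mergeInBatches(const std::vector<std::vector<int>> &Inputs,
                    unsigned NumThreads) {
  assert(NumThreads > 0 && "need at least one worker");
  // Only NumThreads contexts exist, so memory stays O(threads).
  std::vector<Context> Contexts(NumThreads);

  for (std::size_t Begin = 0; Begin < Inputs.size(); Begin += NumThreads) {
    std::size_t End = std::min(Inputs.size(), Begin + NumThreads);
    std::vector<std::future<void>> Batch;
    for (std::size_t I = Begin; I < End; ++I) {
      Context *Ctx = &Contexts[I - Begin]; // Unique within this batch.
      const std::vector<int> *Input = &Inputs[I];
      Batch.push_back(std::async(std::launch::async,
                                 [Input, Ctx] { loadInput(*Input, Ctx); }));
    }
    // Barrier: the next batch starts only after this one is done, so a
    // context is never used by two tasks at once.
    for (auto &Task : Batch)
      Task.get();
  }

  long Result = 0;
  for (const Context &Ctx : Contexts)
    Result += Ctx.Total;
  return Result;
}
```

The barrier after each batch is where the slowdown would come from: every batch
takes as long as its slowest task, and the other workers sit idle meanwhile.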

I haven't provided a patch for this, since I wasn't sure what the desired fix
was because each comes with a cost.
