[all-commits] [llvm/llvm-project] 45b8a7: [LLD][COFF] When using LLD-as-a-library, always pr...

Thu Nov 12 05:15:06 PST 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 45b8a741fbbf271e0fb71294cb7cdce3ad4b9bf3
      https://github.com/llvm/llvm-project/commit/45b8a741fbbf271e0fb71294cb7cdce3ad4b9bf3
  Author: Alexandre Ganea <alexandre.ganea at ubisoft.com>
  Date:   2020-11-12 (Thu, 12 Nov 2020)

  Changed paths:
    M lld/Common/ErrorHandler.cpp
    M lld/include/lld/Common/Driver.h
    M lld/test/COFF/arm-thumb-branch20-error.s
    M lld/test/COFF/comdat-selection.s
    M lld/test/COFF/delayimports-error.test
    M lld/test/COFF/driver-windows.test
    M lld/test/COFF/driver.test
    M lld/test/COFF/export-limit.s
    M lld/test/COFF/failifmismatch.test
    M lld/test/COFF/invalid-obj.test
    M lld/test/COFF/invalid-section-number.test
    M lld/test/COFF/linkenv.test
    M lld/test/COFF/manifestinput-error.test
    M lld/test/COFF/merge.test
    M lld/test/COFF/pdata-arm64-bad.yaml
    M lld/test/COFF/precomp-link.test
    M lld/test/COFF/thin-archive.s
    M lld/test/COFF/thunk-replace.s
    M lld/tools/lld/lld.cpp
    M llvm/include/llvm/Support/CrashRecoveryContext.h
    M llvm/include/llvm/Support/Process.h
    M llvm/lib/Support/CrashRecoveryContext.cpp
    M llvm/lib/Support/Process.cpp

  Log Message:
  -----------
  [LLD][COFF] When using LLD-as-a-library, always prevent re-entrance on failures

This is a follow-up for D70378 (Cover usage of LLD as a library).

While debugging an intermittent failure on a bot, I recalled the following
scenario, which causes the issue:

1. When executing lld/test/ELF/invalid/symtab-sh-info.s L45, we reach
   lld::elf::ObjFile::ObjFile(), which goes straight into its base class
   constructor ELFFileBase(), then ELFFileBase::init().
2. At that point, fatal() is called in lld/ELF/InputFiles.cpp L381, leaving a
   half-initialized ObjFile instance.
3. We then end up in lld::exitLld(), and since we are running with LLD_IN_TEST,
   we happily restore the control flow to CrashRecoveryContext::RunSafely() and
   then back to lld::safeLldMain().
4. Before this patch, we called errorHandler().reset() right after that, which
   attempted to reset the associated SpecificAlloc<ObjFile<ELF64LE>>. That tried
   to free the half-initialized ObjFile instance, and more precisely its
   ObjFile::dwarf member.

Sometimes that worked; sometimes it failed, and the failure was caught by the
CrashRecoveryContext. This scenario was the reason we called
errorHandler().reset() through a CrashRecoveryContext.

But in some rare cases, the above repro somehow corrupted the heap and led to a
stack overflow. When the CrashRecoveryContext's filter (that is,
__except (ExceptionFilter(GetExceptionInformation()))) tried to handle the
exception, it crashed again because the stack was exhausted, and that took the
whole application down. That is the issue seen on the bot. Locally, it happens
about once every 15 runs.

This situation can actually happen anywhere in LLD. Since catching stack
overflows with CrashRecoveryContext is not reliable at the moment, we now
prevent further re-entrance when such failures occur, by setting
lld::SafeReturn::canRunAgain=false. When running with LLD_IN_TEST=2 (or above),
only one iteration will then be executed, instead of two (or more).

Differential Revision: https://reviews.llvm.org/D88348