[PATCH] D88348: [LLD][COFF] When using LLD-as-a-library, always prevent re-entrance on failures

Fri Sep 25 17:57:00 PDT 2020

aganea created this revision.
aganea added reviewers: MaskRay, rnk, amccarth.
aganea added a project: lld.
Herald added subscribers: llvm-commits, hiraditya.
Herald added a project: LLVM.
aganea requested review of this revision.

This is a follow-up for D70378 <https://reviews.llvm.org/D70378> (Cover usage of LLD as a library).

I'm increasingly convinced that it is impossible to support re-entrance once a failure occurs in LLD, even if that currently works on the surface.

While debugging an intermittent failure <http://lab.llvm.org:8011/builders/lld-x86_64-win/builds/2641/steps/test-check-all/logs/FAIL%3A%20lld%3A%3Asymtab-sh-info.s> on bot, I recalled this scenario which causes the issue:

1. When executing `lld/test/ELF/invalid/symtab-sh-info.s` L45, we reach `lld::elf::ObjFile::ObjFile()` which goes straight into its base `ELFFileBase()`, then `ELFFileBase::init()`.
2. At that point `fatal()` is thrown in `lld/ELF/InputFiles.cpp` L381, leaving a //half-initialized// `ObjFile` instance.
3. We end up in `lld::exitLld()` and since we are running with `LLD_IN_TEST`, we hapily restore the control flow to `CrashRecoveryContext::RunSafely()` then back in `lld::safeLldMain()`.
4. Before this patch, we called `errorHandler().reset()` just after, and this attempted to reset the associated `SpecificAlloc<ObjFile<ELF64LE>>`. That tried to free the //half-initialized// `ObjFile` instance, and more precisely its `ObjFile::dwarf` member.

Sometimes that worked, sometimes it failed and was catched by the `CrashRecoveryContext`. This scenario was the reason we called `errorHandler().reset()` through a `CrashRecoveryContext`.

But in some rare cases, the above repro somehow corrupted the heap, creating a stack overflow. When the `CrashRecoveryContext`'s filter (that is, `__except (ExceptionFilter(GetExceptionInformation()))`) tried to handle the exception, it crashed again since the stack was exhausted -- and that took the whole application down. That is the issue seen on the bot. Locally it happens about 1 times out of 15.

Now this situation can happen anywhere in LLD. Since catching **stack overflows** is not a reliable scenario when using `CrashRecoveryContext`, we're now preventing further re-entrance when such failures occur, by signaling `lld::SafeReturn::canRunAgain=false`. When running with `LLD_IN_TEST=2`, only one iteration will be executed in this case.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D88348

Files:
  lld/Common/ErrorHandler.cpp
  lld/include/lld/Common/Driver.h
  lld/include/lld/Common/ErrorHandler.h
  lld/tools/lld/lld.cpp
  llvm/include/llvm/Support/CrashRecoveryContext.h
  llvm/include/llvm/Support/Process.h
  llvm/lib/Support/CrashRecoveryContext.cpp
  llvm/lib/Support/Process.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D88348.294460.patch
Type: text/x-patch
Size: 7332 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200926/243ca040/attachment.bin>