[PATCH] D109506: [RFC] Print current request context along with the stack trance in clangd

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 9 10:35:40 PDT 2021


sammccall added a comment.

Thanks for doing this, I think this is incredibly useful for debugging.
But also subtle, so please forgive a bunch of comments!

I don't think we're strictly in defined-behavior territory in much of what these signal handlers are doing, but neither is clang's existing crash handlers and they seem to work well (main difference is the thread-local access)

Also discussed this a bunch with @kadircet offline.



================
Comment at: clang-tools-extra/clangd/JSONTransport.cpp:114
       if (readRawMessage(JSON)) {
+        ThreadSignalHandler ScopedHandler([JSON]() {
+          auto &OS = llvm::errs();
----------------
this is capturing by value, I don't think it needs to


================
Comment at: clang-tools-extra/clangd/JSONTransport.cpp:117
+          OS << "Signalled while processing message:\n";
+          OS << JSON << "\r\n";
+        });
----------------
why \r?


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:596
+  /// Called by thread local signal handler.
+  void printRequestContextOnSignal() const;
 
----------------
naming: "context" is a bit vague (and overloaded in clangd), and we often use "dump" for things in a debugging context.

Maybe `dumpCurrentRequest()`?


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:1380
+void ASTWorker::printRequestContextOnSignal() const {
+  auto &OS = llvm::errs();
+  OS << "Signalled during AST action:\n";
----------------
i'd probably pass this in as a param, just to move the "we're printing to stderr" and "we're running in a signal handler" decisions closer together.


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:1381
+  auto &OS = llvm::errs();
+  OS << "Signalled during AST action:\n";
+  auto &Command = FileInputs.CompileCommand;
----------------
the name of the action is available as CurrentRequest->Action, which I think is safe to either pass into this function or read directly here.


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:1383
+  auto &Command = FileInputs.CompileCommand;
+  OS << "Filename: " << Command.Filename << "\n";
+  OS << "Directory: " << Command.Directory << "\n";
----------------
nit: please indent the lines after the first to set this apart from other crash output a bit


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:1392
+  OS << "Contents:\n";
+  OS << FileInputs.Contents << "\n";
+}
----------------
hmm, this is potentially really useful (could gather full reproducers) but also is going to dominate the output and make it annoying to see the stack traces/other details. I'm worried this may make this feature net-negative for a majority of users, who won't find the right parts of the output to report...

How critical do you think this is for debugging crashes in practice? (I suspect fairly important!)
Since we're taking liberties with async-signal-safety anyway, it might be possible to get this sent to a tempfile, which might be more useful. In any case, we might want full file dumps to be off by default outside of environments where anyone is likely to actually try to get the reproducers.

I'd probably suggest leaving this out of the initial patch entirely so we can focus on getting the rest of the mechanism right.


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:1392
+  OS << "Contents:\n";
+  OS << FileInputs.Contents << "\n";
+}
----------------
sammccall wrote:
> hmm, this is potentially really useful (could gather full reproducers) but also is going to dominate the output and make it annoying to see the stack traces/other details. I'm worried this may make this feature net-negative for a majority of users, who won't find the right parts of the output to report...
> 
> How critical do you think this is for debugging crashes in practice? (I suspect fairly important!)
> Since we're taking liberties with async-signal-safety anyway, it might be possible to get this sent to a tempfile, which might be more useful. In any case, we might want full file dumps to be off by default outside of environments where anyone is likely to actually try to get the reproducers.
> 
> I'd probably suggest leaving this out of the initial patch entirely so we can focus on getting the rest of the mechanism right.
another thought along the lines of reproducers (but nothing for this patch)...

If we're working on a file and have a preamble, then the headers that make up the preamble are relevant. In practice it's hard to reproduce bug reports we get because we need all their headers. If we're using a preamble then just dumping all the filenames it includes would make it possible for some tool to take the crash report and zip up a reproducer.

(This probably needs the ability to run all of the handlers for the current thread, not just the top of the stack, since we acquire the relevant preamble later)


================
Comment at: clang-tools-extra/clangd/support/ThreadSignalHandler.cpp:13
+
+using namespace clang::clangd;
+
----------------
style nit: we tend do wrap the code in `namespace clang { namespace clangd {` instead, at least within clangd


================
Comment at: clang-tools-extra/clangd/support/ThreadSignalHandler.cpp:15
+
+static llvm::sys::ThreadLocal<ThreadSignalHandlerCallback> CurrentCallback;
+
----------------
we assume `thread_local` in clangd (see support/Context.ccp)

It's possible that llvm::sys::ThreadLocal happens to play nicer with signal handlers in practice (none of these are completely safe!), but unless we've seen some concrete evidence I'd rather just have `static thread_local ThreadSignalHandlerCallback*` here


================
Comment at: clang-tools-extra/clangd/support/ThreadSignalHandler.cpp:15
+
+static llvm::sys::ThreadLocal<ThreadSignalHandlerCallback> CurrentCallback;
+
----------------
sammccall wrote:
> we assume `thread_local` in clangd (see support/Context.ccp)
> 
> It's possible that llvm::sys::ThreadLocal happens to play nicer with signal handlers in practice (none of these are completely safe!), but unless we've seen some concrete evidence I'd rather just have `static thread_local ThreadSignalHandlerCallback*` here
we can quite easily support a stack of handlers, such that we dump *all* living handlers on the crashing thread.

just give each ThreadSignalHandler a `ThreadSignalHandler* Next = nullptr` member, and both the constructor and destructor `swap(Next, CurrentCallback)`. Then CurrentCallback points to the head of a linked list of handlers.

Not sure if you want to have that in this patch, but it's pretty simple so I wouldn't object.


================
Comment at: clang-tools-extra/clangd/support/ThreadSignalHandler.cpp:17
+
+static void registerSignalHandlerIfNeeded() {
+  static llvm::once_flag RegisterOnceFlag;
----------------
A library may create a ThreadSignalHandler to dump context which may in turn install a global signal handler, which can be a surprising side-effect.
In particular, we do have users that embed ClangdServer in different environments (not via ClangdMain) and don't use the LLVM crash-handlers.

I think the cleanest thing would be to have ThreadSignalHandler.cpp expose a function `RunThreadCrashHandlers()` or so, which is basically PrintInputs, and have ClangdMain.cpp call llvm::sys::AddSignalHandler near where it calls `PrintStackTraceOnErrorSignal` today


================
Comment at: clang-tools-extra/clangd/support/ThreadSignalHandler.cpp:22
+      ThreadSignalHandlerCallback *Callback = CurrentCallback.get();
+      // TODO: ignore non thread local signals like SIGTERM
+      if (Callback) {
----------------
I think we should do this already, probably just starting with an allowlist of `SIGBUS, SIGFPE, SIGILL, SIGSEGV` (these are signals that are delivered to the thread that triggered them per the linux docs).

I'm also wondering how portable this assumption/mechanism is, e.g. whether we should enable it for certain platforms only. I guess it's probably true for unixy things, and apparently mostly true for windows too[1]? I'm a little concerned *something* may go wrong on windows, though.

[1] https://stackoverflow.com/questions/15879342/which-thread-are-signal-handlers-e-g-signalsigint-crtl-c-called-on


================
Comment at: clang-tools-extra/clangd/support/ThreadSignalHandler.cpp:36
+  registerSignalHandlerIfNeeded();
+  CurrentCallback.set(&Callback);
+}
----------------
we could consider `atomic_signal_fence` around these mutations, or making CurrentCallback atomic.

It doesn't get us out of all possible write-reordering problems (we could see inconsistent states of the data referred to by the handler) but may avoid some catastrophic failures


================
Comment at: clang-tools-extra/clangd/support/ThreadSignalHandler.h:7
+//
+//===----------------------------------------------------------------------===//
+
----------------
Either here in the file comment or on the ThreadSignalHandler class we should definitely have an overview of what this is for/how it's used.


================
Comment at: clang-tools-extra/clangd/support/ThreadSignalHandler.h:16
+
+class ThreadSignalHandler {
+public:
----------------
I understand framing this as general signal handler machinery, but I think it might be easier to reason about if we give it a name more specific to its purpose. Like maybe `ThreadCrashReporter`?

The main reason is that in the general case signal handling calls to mind a bunch of concerns like async safety that we're able to be a bit cavalier about in the specific case where the signal is (probably) handled synchronously, we're crashing anyway, etc. We also don't really want to bother exposing info like *which* signal is being handled, but a "signal handler" interface looks strange without it.


================
Comment at: clang-tools-extra/clangd/support/ThreadSignalHandler.h:21
+
+  ThreadSignalHandler(ThreadSignalHandler &&);
+  ThreadSignalHandler(const ThreadSignalHandler &) = delete;
----------------
I'm not quite clear on why we need to support move-construction (and the move-constructor's interaction with static state!)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109506/new/

https://reviews.llvm.org/D109506



More information about the cfe-commits mailing list