[PATCH] D137381: [clang][compiler-rt] Exception escape out of an non-unwinding function is an undefined behaviour

Mon Nov 28 17:24:12 PST 2022

lebedev.ri added a reviewer: efriedma.
lebedev.ri added a comment.

@rjmccall thank you for taking a look!

In D137381#3955100 <https://reviews.llvm.org/D137381#3955100>, @rjmccall wrote:

> In D137381#3953178 <https://reviews.llvm.org/D137381#3953178>, @lebedev.ri wrote:
>
>> ping. We need to get this going, reminder, this was caught because it caused a visible miscompilation.
>
> I'm sorry, I caught COVID-19 during the LLVM conference and have not had a lot of energy recently,

I'm sorry.

> plus last week was the Thanksgiving holiday in the US.

> Please don't describe this as a miscompilation.  It may not match what you want as a user, but what the the compiler is doing is not incorrect.

Right.

> What you're doing is adding a mitigation to try to protect against a source of UB that recently affected you, which is commendable but doesn't have the same level of urgency.

I'm mainly worried that there are likely other "miscompilations" in the wild.
Also, after this, we might want to reevaluate which clang attribute does what,

================
Comment at: clang/lib/CodeGen/CGCall.cpp:5402-5411
+    if (SanOpts.has(SanitizerKind::ExceptionEscape) &&
+        ExceptionEscapeUBLastInvokeSrcLoc) {
+      llvm::Constant *CheckSourceLocation = EmitCheckSourceLocation(Loc);
+      Builder.CreateStore(
+          CheckSourceLocation,
+          Address(ExceptionEscapeUBLastInvokeSrcLoc,
+                  CheckSourceLocation->getType(),
----------------
rjmccall wrote:
> lebedev.ri wrote:
> > lebedev.ri wrote:
> > > @rjmccall
> > > So there are two issues:
> > > 1. `getInvokeDest()` isn't necessarily called just before creating an `invoke`, see e.g. `-EHa` handling in `CodeGenFunction::PopCleanupBlock()`
> > > 2. Even if we ignore that, we need to do this for *every* `invoke`, not just those going to the our UB landing pad, consider: https://godbolt.org/z/qTeKor41a <- the invoke leads to a normal landing pad, yet we immediately rethrow the just-caught exception, and now end up in the UB landing pad.
> > > 
> > > So i'm not really seeing an alternative path here?
> > @rjmccall do you agree with my reasoning here?
> > Perhaps there is some other solution that i'm not seeing,
> > other than not having the source location?
> I think there are four interesting questions posed by your example.
> 
> The first is whether we should make an effort to report the original source location for the call to `maybe_throws` instead of the rethrow location.  To me, the answer is no, and the main reason is that the user is explicitly acknowledging the possibility of an exception and is (probably) trying to handle it.  That means that the proximate source of programmer error is now that their attempt to handle it failed, not that an exception was raised in the first place.  That might seem weird for your example specifically, 
> 
> The second is what source location we should report for invokes that don't have an obvious source location.  The answer is that I think we should always try to have a source location, even if it might be an imperfect one.  Synthesized code almost always has a source location that triggered the synthesis.  Cleanups are typically associated with some variable or temporary that itself should have a source location.  Even things like `__cxa_end_catch` are associated with the `catch` clause.  Hopefully UBSan  has a flexible enough schema in how source locations are expressed to it for us to communicate these things down to the runtime, like "synthesized code at Foo.cpp:18" or "cleanup for variable at Bar.cpp:107".
> 
> The third is the more general question of whether we need to store a source location before every `invoke`.  To ensure that this variable is meaningfully initialized, we need to store a source location before every invoke that can lead to a UB scope.  But the variable is stateful, so it's equally important that we not emit stores that might overwrite the source location that we actually want to report.  For example, we want the source location reported in your example to be the source location for the `throw;`, not something associated with the `__cxa_end_catch` cleanup.  This basically boils down to not doing stores within a landing pad before it enters a `catch` handler.  Fortunately, landing pad code like that should always be in a terminate scope, so I think you can just look at the current EH stack and see whether the landing pad can possibly reach a UB scope.  (You might need some save-and-restore logic around `@finally` blocks in ObjC.)
> 
> The final question is whether we need to emit cleanups on paths that lead to UB scopes at all.  In the simplest cases, I think we don't.  The standard says that it's implementation-specified whether the stack is unwound on an exception path that leads to termination.  That rule doesn't technically apply here, but only because we're dealing with a language extension that makes unwinding UB in the first place.  Since we have room to choose the semantics, the standard gives us pretty strong cover to be just as aggressive about skipping cleanups in contexts where throwing is UB as it is in contexts where throwing terminates.  That, in turn, means that the landing pad in these contexts will usually just be a UB scope with no cleanups in it.  So even if we did need to worry about cleanups overwriting the source location, it usually won't be possible because we won't be running cleanups.  In fact, if there's a catch-all or a terminate scope, the UB scope won't be reachable, so the only situation where we have to worry about cleanups being run before we reach a UB scope is if there are cleanups inside a *partial* catch clause, like `catch (const std::exception &e)`.  And in principle we have the information up front from the EH selector to know whether we're going to end up in a catch handler, so we could actually just check immediately at the start of the landing pad and then trigger UBSan.
> The final question is whether we need to emit cleanups on paths that lead to UB scopes at all. In the simplest cases, I think we don't. The standard says that it's implementation-specified whether the stack is unwound on an exception path that leads to termination. That rule doesn't technically apply here, but only because we're dealing with a language extension that makes unwinding UB in the first place. Since we have room to choose the semantics, the standard gives us pretty strong cover to be just as aggressive about skipping cleanups in contexts where throwing is UB as it is in contexts where throwing terminates. That, in turn, means that the landing pad in these contexts will usually just be a UB scope with no cleanups in it. So even if we did need to worry about cleanups overwriting the source location, it usually won't be possible because we won't be running cleanups. In fact, if there's a catch-all or a terminate scope, the UB scope won't be reachable, so the only situation where we have to worry about cleanups being run before we reach a UB scope is if there are cleanups inside a *partial* catch clause, like catch (const std::exception &e). And in principle we have the information up front from the EH selector to know whether we're going to end up in a catch handler, so we could actually just check immediately at the start of the landing pad and then trigger UBSan.

The more i stare at this. The more worried i am.
Is the idea that, when we are in termination/UB EH scope,
we no longer have to play by the RAII rules,
and destructors, for the moment, are no-ops and are side-effect free?

Consider:
```
#include <cstdlib>
#include <cstdio>

void will_throw() { throw int(42); }
void might_throw() { }

struct abort_on_dtor {
  abort_on_dtor() {}
  [[noreturn]] ~abort_on_dtor() { printf("in ~abort_on_dtor()\n"); abort(); }
};

void test0() noexcept {
  abort_on_dtor scope;
  will_throw();
  might_throw();
}

__attribute__ ((pure)) void test1() {
  abort_on_dtor scope;
  will_throw();
  might_throw();
}

int main(int argc, char* argv[]) {
  printf("argc = %i\n", argc);
  if(argc == 1)
    abort_on_dtor scope;
  if(argc == 2)
    test0();
  if(argc == 3)
    test1();
  return 0;
}
```
You'd think, if call to `test0()`/`test1()` unwinds,
we'd first destruct `abort_on_dtor`, and termination/UB no longer happens.
But i guess termination/UB time-traveled/"happened" the moment `test0()`/`test1()` unwinds?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137381/new/

https://reviews.llvm.org/D137381