[llvm] [WebAssembly] Don't return multivalue for invokes (PR #86048)

Heejin Ahn via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 20 17:31:21 PDT 2024


https://github.com/aheejin created https://github.com/llvm/llvm-project/pull/86048

Emscripten EH/SjLj uses invoke wrappers in JS, such as `invoke_vi`, from which user functions are called indirectly, to emulate exceptions and setjmp/longjmp.

But in case the invoked function returns multiple values or a 128-bit value, its the JS invoke wrappers cannot return multivalue because JS doesn't support that. So we should not enable multivalue returns for the JS invoke wrappers and also the functions called by those JS wrappers because their signature has to match with the JS wrapper. For example, if `func` returns `{i32, i32}` and we have
``ll
  invoke {i32, i32} @func() ...
```
while LowerEmscriptenEHSjLj will lower it down to something like
```ll
  %0 = call { i32, i32 } @"__invoke_{i32.i32}_void"(ptr @func)
  ...
```
we should eventually lower both the invoke wrapper (whose name will be
changed later to `invoke_vi`) and `func` down to a signature that
indirectly returns multiple values by memory parameter, because JS
invoke wrappers do support multiple return values.

So we need to disable multivalue returns for JS invoke wrappers and
functions called by them. I think we have three ways to do that:

1. Make a set and add all functions that are invoked by JS invoke
   wrappers in LowerEmscriptenEHSjLj and pass it to the backend using a
   auxiliary data structure. We have a precedent of this kind of
   structure already, which is used for Wasm EH:
   https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/WasmEHFuncInfo.h
   We can even add this set to `WasmEHFuncInfo` maybe, given that this
   is also a way of handling exceptions in Wasm.
   - Pros: Most precise
   - Cons: Auxiliary structure needed

2. Unless we record the invoked functions in LowerEmscriptenEHSjLj like
   1, we don't have a way of precisely knowing the set of invoked
   functions. But they are all indirectly invoked, so in the backend we
   can check whether a given function is ever indirectly used (i.e., its
   pointer taken) by traversing its `users()` in the IR and if it is,
   we don't allow multivalue returns for them.
   - Pros: No auxiliary structure
   - Cons: IR checking overhead. More conservative than 1.

3. Disallow all multivalue returns when Emscripten EH or SjLj is
   enabled.
   - Pros: Simplest
   - Cons: Most conservative.

This PR is doing 3. While it is the most conservative and possibly
disallowing multivalue returns from more functions than needed, I chose
this because it is the simplest, and given that hopefully more people
will adopt Wasm EH going forward, I don't think there will be many
people who would use multivalue and Emscripten EH/SjLj together and want
the whatever performance benefit that multivalue return can bring, given
that Emscripten EH/SjLj has already a huge performance cost.

This is separate from whether we should make the multivalue return
dependent on the multivalue feature or something else like a clang flag,
which is being partly discussed in
https://github.com/WebAssembly/tool-conventions/pull/223.
Whichever way we decide on that front, we still need to disable
multivalue returns in case Emscripten EH/SjLj is used.

>From 21371442bd130c8f1eab0ff985415ced42ffe0d6 Mon Sep 17 00:00:00 2001
From: Heejin Ahn <aheejin at gmail.com>
Date: Wed, 6 Mar 2024 09:41:50 +0000
Subject: [PATCH] [WebAssembly] Don't return multivalue for invokes

Emscripten EH/SjLj uses invoke wrappers in JS, such as `invoke_vi`, from
which user functions are called indirectly, to emulate exceptions and
setjmp/longjmp.

But in case the invoked function returns multiple values or a 128-bit
value, its the JS invoke wrappers cannot return multivalue because JS
doesn't support that. So we should not enable multivalue returns for the
JS invoke wrappers and also the functions called by those JS wrappers
because their signature has to match with the JS wrapper. For example,
if `func` returns `{i32, i32}` and we have
``ll
  invoke {i32, i32} @func() ...
```
while LowerEmscriptenEHSjLj will lower it down to something like
```ll
  %0 = call { i32, i32 } @"__invoke_{i32.i32}_void"(ptr @func)
  ...
```
we should eventually lower both the invoke wrapper (whose name will be
changed later to `invoke_vi`) and `func` down to a signature that
indirectly returns multiple values by memory parameter, because JS
invoke wrappers do support multiple return values.

So we need to disable multivalue returns for JS invoke wrappers and
functions called by them. I think we have three ways to do that:

1. Make a set and add all functions that are invoked by JS invoke
   wrappers in LowerEmscriptenEHSjLj and pass it to the backend using a
   auxiliary data structure. We have a precedent of this kind of
   structure already, which is used for Wasm EH:
   https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/WasmEHFuncInfo.h
   We can even add this set to `WasmEHFuncInfo` maybe, given that this
   is also a way of handling exceptions in Wasm.
   - Pros: Most precise
   - Cons: Auxiliary structure needed

2. Unless we record the invoked functions in LowerEmscriptenEHSjLj like
   1, we don't have a way of precisely knowing the set of invoked
   functions. But they are all indirectly invoked, so in the backend we
   can check whether a given function is ever indirectly used (i.e., its
   pointer taken) by traversing its `users()` in the IR and if it is,
   we don't allow multivalue returns for them.
   - Pros: No auxiliary structure
   - Cons: IR checking overhead. More conservative than 1.

3. Disallow all multivalue returns when Emscripten EH or SjLj is
   enabled.
   - Pros: Simplest
   - Cons: Most conservative.

This PR is doing 3. While it is the most conservative and possibly
disallowing multivalue returns from more functions than needed, I chose
this because it is the simplest, and given that hopefully more people
will adopt Wasm EH going forward, I don't think there will be many
people who would use multivalue and Emscripten EH/SjLj together and want
the whatever performance benefit that multivalue return can bring, given
that Emscripten EH/SjLj has already a huge performance cost.

This is separate from whether we should make the multivalue return
dependent on the multivalue feature or something else like a clang flag,
which is being partly discussed in
https://github.com/WebAssembly/tool-conventions/pull/223.
Whichever way we decide on that front, we still need to disable
multivalue returns in case Emscripten EH/SjLj is used.
---
 .../WebAssembly/WebAssemblyAsmPrinter.cpp     | 13 +++-----
 .../WebAssembly/WebAssemblyAsmPrinter.h       |  2 +-
 .../WebAssembly/WebAssemblyISelLowering.cpp   |  4 +--
 .../WebAssemblyMachineFunctionInfo.cpp        | 11 ++++---
 .../WebAssembly/WebAssemblyUtilities.cpp      | 14 ++++++++
 .../Target/WebAssembly/WebAssemblyUtilities.h |  5 +++
 .../lower-em-ehsjlj-multi-return.ll           | 33 ++++++++++++++++---
 7 files changed, 61 insertions(+), 21 deletions(-)

diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
index 03897b551ecaee..c6c7bf1cacda40 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp
@@ -151,19 +151,14 @@ static std::string getEmscriptenInvokeSymbolName(wasm::WasmSignature *Sig) {
 //===----------------------------------------------------------------------===//
 
 MCSymbolWasm *WebAssemblyAsmPrinter::getMCSymbolForFunction(
-    const Function *F, bool EnableEmEH, wasm::WasmSignature *Sig,
+    const Function *F, bool EnableEmEHSjLj, wasm::WasmSignature *Sig,
     bool &InvokeDetected) {
   MCSymbolWasm *WasmSym = nullptr;
-  if (EnableEmEH && isEmscriptenInvokeName(F->getName())) {
+  if (EnableEmEHSjLj && isEmscriptenInvokeName(F->getName())) {
     assert(Sig);
     InvokeDetected = true;
-    if (Sig->Returns.size() > 1) {
-      std::string Msg =
-          "Emscripten EH/SjLj does not support multivalue returns: " +
-          std::string(F->getName()) + ": " +
-          WebAssembly::signatureToString(Sig);
-      report_fatal_error(Twine(Msg));
-    }
+    // When Emscripten EH/SjLj is enabled, we don't enable multivalue returns
+    assert(Sig->Returns.size() <= 1);
     WasmSym = cast<MCSymbolWasm>(
         GetExternalSymbolSymbol(getEmscriptenInvokeSymbolName(Sig)));
   } else {
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h b/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h
index c30e0155c81e72..4892df1f22485d 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.h
@@ -81,7 +81,7 @@ class LLVM_LIBRARY_VISIBILITY WebAssemblyAsmPrinter final : public AsmPrinter {
   MVT getRegType(unsigned RegNo) const;
   std::string regToString(const MachineOperand &MO);
   WebAssemblyTargetStreamer *getTargetStreamer();
-  MCSymbolWasm *getMCSymbolForFunction(const Function *F, bool EnableEmEH,
+  MCSymbolWasm *getMCSymbolForFunction(const Function *F, bool EnableEmEHSjLj,
                                        wasm::WasmSignature *Sig,
                                        bool &InvokeDetected);
   MCSymbol *getOrCreateWasmSymbol(StringRef Name);
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index 905ff3b9018428..64bcadf3f5677c 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -1288,7 +1288,7 @@ bool WebAssemblyTargetLowering::CanLowerReturn(
     const SmallVectorImpl<ISD::OutputArg> &Outs,
     LLVMContext & /*Context*/) const {
   // WebAssembly can only handle returning tuples with multivalue enabled
-  return Subtarget->hasMultivalue() || Outs.size() <= 1;
+  return WebAssembly::canLowerReturn(Outs.size(), Subtarget);
 }
 
 SDValue WebAssemblyTargetLowering::LowerReturn(
@@ -1296,7 +1296,7 @@ SDValue WebAssemblyTargetLowering::LowerReturn(
     const SmallVectorImpl<ISD::OutputArg> &Outs,
     const SmallVectorImpl<SDValue> &OutVals, const SDLoc &DL,
     SelectionDAG &DAG) const {
-  assert((Subtarget->hasMultivalue() || Outs.size() <= 1) &&
+  assert(WebAssembly::canLowerReturn(Outs.size(), Subtarget) &&
          "MVP WebAssembly can only return up to one value");
   if (!callingConvSupported(CallConv))
     fail(DL, DAG, "WebAssembly doesn't support non-C calling conventions");
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyMachineFunctionInfo.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyMachineFunctionInfo.cpp
index 1e959111a4dbcb..2c104382aa65ca 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyMachineFunctionInfo.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyMachineFunctionInfo.cpp
@@ -17,6 +17,7 @@
 #include "Utils/WebAssemblyTypeUtilities.h"
 #include "WebAssemblyISelLowering.h"
 #include "WebAssemblySubtarget.h"
+#include "WebAssemblyUtilities.h"
 #include "llvm/CodeGen/Analysis.h"
 #include "llvm/CodeGen/WasmEHFuncInfo.h"
 #include "llvm/Target/TargetMachine.h"
@@ -70,12 +71,12 @@ void llvm::computeSignatureVTs(const FunctionType *Ty,
   computeLegalValueVTs(ContextFunc, TM, Ty->getReturnType(), Results);
 
   MVT PtrVT = MVT::getIntegerVT(TM.createDataLayout().getPointerSizeInBits());
-  if (Results.size() > 1 &&
-      !TM.getSubtarget<WebAssemblySubtarget>(ContextFunc).hasMultivalue()) {
+  if (!WebAssembly::canLowerReturn(
+          Results.size(),
+          &TM.getSubtarget<WebAssemblySubtarget>(ContextFunc))) {
     // WebAssembly can't lower returns of multiple values without demoting to
-    // sret unless multivalue is enabled (see
-    // WebAssemblyTargetLowering::CanLowerReturn). So replace multiple return
-    // values with a poitner parameter.
+    // sret unless multivalue is enabled (see WebAssembly::canLowerReturn). So
+    // replace multiple return values with a poitner parameter.
     Results.clear();
     Params.push_back(PtrVT);
   }
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.cpp
index ac7cf5b37fcaa4..f09700ec7c0527 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.cpp
@@ -179,3 +179,17 @@ unsigned WebAssembly::getCopyOpcodeForRegClass(const TargetRegisterClass *RC) {
     llvm_unreachable("Unexpected register class");
   }
 }
+
+bool WebAssembly::canLowerReturn(size_t ResultSize,
+                                 const WebAssemblySubtarget *Subtarget) {
+  // We don't return multivalues when either of Emscripten EH or Emscripten SjLj
+  // is used. JS invoke wrapper functions, which are used in Emscripten EH/SjLj,
+  // don't support multiple return values, and the functions indirectly called
+  // from those JS invoke wrapper functions can't be lowered using multivalue
+  // results because they have to match the signature of the JS invoke wrappers.
+  // We can come up with a precise set of functions that are called from the JS
+  // wrappers if we do some bookeeping, but given that Emscripten EH/SjLj is an
+  // old feature whose use is expected to decline, we don't think it's worth it.
+  return ResultSize <= 1 || (Subtarget->hasMultivalue() &&
+                             !WebAssembly::WasmEnableEmEH && !WasmEnableEmSjLj);
+}
diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.h b/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.h
index 7f28fb1858a690..b430edba4627ae 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.h
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyUtilities.h
@@ -63,6 +63,11 @@ MachineInstr *findCatch(MachineBasicBlock *EHPad);
 /// Returns the appropriate copy opcode for the given register class.
 unsigned getCopyOpcodeForRegClass(const TargetRegisterClass *RC);
 
+/// Returns true if the function's return value(s) can be lowered directly,
+/// i.e., not indirectly via a pointer parameter that points to the value in
+/// memory.
+bool canLowerReturn(size_t ResultSize, const WebAssemblySubtarget *Subtarget);
+
 } // end namespace WebAssembly
 
 } // end namespace llvm
diff --git a/llvm/test/CodeGen/WebAssembly/lower-em-ehsjlj-multi-return.ll b/llvm/test/CodeGen/WebAssembly/lower-em-ehsjlj-multi-return.ll
index 4f33439db770dc..a4d0908299f4dc 100644
--- a/llvm/test/CodeGen/WebAssembly/lower-em-ehsjlj-multi-return.ll
+++ b/llvm/test/CodeGen/WebAssembly/lower-em-ehsjlj-multi-return.ll
@@ -1,8 +1,8 @@
-; RUN: not --crash llc < %s -enable-emscripten-cxx-exceptions -mattr=+multivalue 2>&1 | FileCheck %s --check-prefix=EH
-; RUN: not --crash llc < %s -enable-emscripten-sjlj -mattr=+multivalue 2>&1 | FileCheck %s --check-prefix=SJLJ
+; RUN: llc < %s -enable-emscripten-cxx-exceptions -enable-emscripten-sjlj -mattr=+multivalue | FileCheck %s
 
-; Currently multivalue returning functions are not supported in Emscripten EH /
-; SjLj. Make sure they error out.
+; Even with multivalue feature enabled, we don't enable multivalue returns when
+; Emscripten EH/SjLj is used. Also the invoke wrappers' (e.g. invoke_vi)
+; signature should not change.
 
 target triple = "wasm32-unknown-unknown"
 
@@ -35,7 +35,32 @@ entry:
   unreachable
 }
 
+define void @invoke_i128() personality ptr @__gxx_personality_v0 {
+entry:
+  invoke i128 @get_i128()
+          to label %try.cont unwind label %lpad
+
+lpad:                                             ; preds = %entry
+  %1 = landingpad { ptr, i32 }
+          catch ptr null
+  %2 = extractvalue { ptr, i32 } %1, 0
+  %3 = extractvalue { ptr, i32 } %1, 1
+  %4 = call ptr @__cxa_begin_catch(ptr %2)
+  call void @__cxa_end_catch()
+  br label %try.cont
+
+try.cont:                                         ; preds = %entry, %lpad
+  ret void
+}
+
 declare {i32, i32} @foo(i32)
+; CHECK-DAG: .functype foo (i32, i32) -> ()
+; CHECK-DAG: .functype invoke_vi (i32, i32) -> ()
+
+declare i128 @get_i128()
+; CHECK-DAG: .functype get_i128 (i32) -> ()
+; CHECK-DAG: .functype invoke_vii (i32, i32, i32) -> ()
+
 declare i32 @__gxx_personality_v0(...)
 declare ptr @__cxa_begin_catch(ptr)
 declare void @__cxa_end_catch()



More information about the llvm-commits mailing list