[llvm] 78f01f6 - [WebAssembly] Ensure 'end_function' in functions

Heejin Ahn via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 9 11:10:55 PST 2023


Author: Heejin Ahn
Date: 2023-01-09T11:09:35-08:00
New Revision: 78f01f69b32801fbd0c6e0724893f2dfd20302d8

URL: https://github.com/llvm/llvm-project/commit/78f01f69b32801fbd0c6e0724893f2dfd20302d8
DIFF: https://github.com/llvm/llvm-project/commit/78f01f69b32801fbd0c6e0724893f2dfd20302d8.diff

LOG: [WebAssembly] Ensure 'end_function' in functions

Local info is supposed to be emitted at the start of every function.
When there are locals, a `.local` section should be present, and we emit
local info according to the section.

If there are no locals, empty local info should be emitted. This empty
local info is emitted when the first instruction is emitted within a
function without encountering a `.local` section. If there are no
instructions, an `end_function` pseudo instruction should be present, and
the empty local info will be emitted when parsing the pseudo instruction.
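
As a rough illustration of the rules above (not the parser's actual
code), the following C++ sketch models which kind of local info a
function body would end up with. The names `Item`, `LocalInfo` and
`localInfoFor` are hypothetical and exist only for this example; the
sketch assumes the first item in the body decides the outcome.
```cpp
#include <cassert>
#include <vector>

// Hypothetical item kinds that can appear in a function body. These are
// not LLVM types; they only model the cases described above.
enum class Item { LocalDirective, Instruction, EndFunction };

// What ends up being emitted for the function's locals.
enum class LocalInfo { Missing, Declared, Empty };

// Assumption: the first item seen in the body decides the result.
// A `.local` section yields the declared locals; a first instruction or
// an `end_function` without a preceding `.local` yields empty local info.
// A body with no items at all never triggers any emission.
LocalInfo localInfoFor(const std::vector<Item> &Body) {
  for (Item I : Body) {
    switch (I) {
    case Item::LocalDirective:
      return LocalInfo::Declared;
    case Item::Instruction:
    case Item::EndFunction:
      return LocalInfo::Empty;
    }
  }
  return LocalInfo::Missing;
}
```
In this model, a body with no items at all yields `LocalInfo::Missing`,
which corresponds to the case where no local info is emitted.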

The following assembly is malformed because the function `test` doesn't
have an `end_function` at the end, and the parser doesn't end up
emitting the empty local info needed. But currently we don't error out
and silently produce an invalid binary.
```
.functype test () -> ()
test:
```

This patch adds one extra state to the Wasm assembly parser,
`FunctionLabel`, to detect whether a function label is parsed but not
ended properly when the next function starts or the file ends.

It is somewhat tricky to distinguish `FunctionLabel` and
`FunctionStart`, because it is not always possible to ensure the state
goes from `FunctionLabel` -> `FunctionStart`. A `.functype` directive
does not seem to be mandated before a function label, in which case we
don't know if the label is a function at the time of parsing. But when we
do know the label is a function, we would like to ensure it ends with an
`end_function` properly, and to error out when it does not.

For example,
```
.functype test() -> ()
test:
```
We should error out for this because we know `test` is a function and it
doesn't end with an `end_function`. This PR fixes this.

```
test:
```
We don't error out for this because there is no info that `test` is a
function, so we don't know whether there should be an `end_function` or
not.

```
test:
.functype test() -> ()
```
We already error out for this, because we switch to the `FunctionStart`
state when we first see a `.functype` directive after its label
definition.
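
The three cases above can be sketched with a small standalone C++ model.
This is a simplification under stated assumptions, not the code in
WebAssemblyAsmParser.cpp: `Line`, `Kind`, `State` and
`countUnendedFunctions` are hypothetical names, a single `Open` flag
stands in for a `Function` entry on the nesting stack, and errors are
counted instead of aborting the parse.
```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified input: one entry per assembly line.
enum class Kind { Functype, Label, EndFunction };
struct Line {
  Kind K;
  std::string Name; // unused for EndFunction
};

// A reduced set of parser states, enough for this illustration.
enum class State { FileStart, FunctionLabel, FunctionStart, EndFunction };

// Counts how many functions are found not to end with `end_function`.
int countUnendedFunctions(const std::vector<Line> &Lines) {
  std::set<std::string> Functions; // symbols named by a .functype
  std::set<std::string> Defined;   // symbols whose label has been seen
  State Cur = State::FileStart;
  bool Open = false; // stands in for 'Function' on the nesting stack
  int Errors = 0;

  // Stand-in for ensureEmptyNestingStack(): report and clear.
  auto EnsureClosed = [&] {
    if (Open) {
      ++Errors;
      Open = false;
    }
  };

  for (const Line &L : Lines) {
    switch (L.K) {
    case Kind::Functype:
      Functions.insert(L.Name);
      if (Defined.count(L.Name)) {
        // .functype after the label: start the function here unless the
        // label already did so.
        if (Cur != State::FunctionLabel) {
          EnsureClosed();
          Open = true;
        }
        Cur = State::FunctionStart;
      }
      break;
    case Kind::Label:
      Defined.insert(L.Name);
      if (Functions.count(L.Name)) {
        // Known function label: the previous function must have ended.
        EnsureClosed();
        Cur = State::FunctionLabel;
        Open = true;
      }
      break;
    case Kind::EndFunction:
      // A stray end_function is simply ignored in this sketch.
      Open = false;
      Cur = State::EndFunction;
      break;
    }
  }
  EnsureClosed(); // end of file
  return Errors;
}
```
With this model, `.functype test` followed by `test:` counts one error,
a lone `test:` counts none, and `test:` followed by `.functype test`
counts one error, matching the three cases described above.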

Fixes https://github.com/llvm/llvm-project/issues/57427.

Reviewed By: sbc100

Differential Revision: https://reviews.llvm.org/D141103

Added: 
    llvm/test/MC/WebAssembly/func-end-errors.s

Modified: 
    llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp b/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
index 9b80c32b77db6..1cba0843f8910 100644
--- a/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
+++ b/llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
@@ -210,6 +210,7 @@ class WebAssemblyAsmParser final : public MCTargetAsmParser {
   // guarantee that correct order.
   enum ParserState {
     FileStart,
+    FunctionLabel,
     FunctionStart,
     FunctionLocals,
     Instructions,
@@ -287,8 +288,8 @@ class WebAssemblyAsmParser final : public MCTargetAsmParser {
     return Parser.Error(Tok.getLoc(), Msg + Tok.getString());
   }
 
-  bool error(const Twine &Msg) {
-    return Parser.Error(Lexer.getTok().getLoc(), Msg);
+  bool error(const Twine &Msg, SMLoc Loc = SMLoc()) {
+    return Parser.Error(Loc.isValid() ? Loc : Lexer.getTok().getLoc(), Msg);
   }
 
   void addSignature(std::unique_ptr<wasm::WasmSignature> &&Sig) {
@@ -336,11 +337,12 @@ class WebAssemblyAsmParser final : public MCTargetAsmParser {
     return false;
   }
 
-  bool ensureEmptyNestingStack() {
+  bool ensureEmptyNestingStack(SMLoc Loc = SMLoc()) {
     auto Err = !NestingStack.empty();
     while (!NestingStack.empty()) {
       error(Twine("Unmatched block construct(s) at function end: ") +
-            nestingString(NestingStack.back().NT).first);
+                nestingString(NestingStack.back().NT).first,
+            Loc);
       NestingStack.pop_back();
     }
     return Err;
@@ -865,12 +867,24 @@ class WebAssemblyAsmParser final : public MCTargetAsmParser {
         return true;
       auto WasmSym = cast<MCSymbolWasm>(Ctx.getOrCreateSymbol(SymName));
       if (WasmSym->isDefined()) {
-        // This .functype indicates a start of a function.
-        if (ensureEmptyNestingStack())
-          return true;
+        // We push 'Function' either when a label is parsed or a .functype
+        // directive is parsed. The reason it is not easy to do this uniformly
+        // in a single place is,
+        // 1. We can't do this at label parsing time only because there are
+        //    cases we don't have .functype directive before a function label,
+        //    in which case we don't know if the label is a function at the time
+        //    of parsing.
+        // 2. We can't do this at .functype parsing time only because we want to
+        //    detect a function started with a label and not ended correctly
+        //    without encountering a .functype directive after the label.
+        if (CurrentState != FunctionLabel) {
+          // This .functype indicates a start of a function.
+          if (ensureEmptyNestingStack())
+            return true;
+          push(Function);
+        }
         CurrentState = FunctionStart;
         LastFunctionLabel = WasmSym;
-        push(Function);
       }
       auto Signature = std::make_unique<wasm::WasmSignature>();
       if (parseSignature(Signature.get()))
@@ -1100,6 +1114,22 @@ class WebAssemblyAsmParser final : public MCTargetAsmParser {
     // Also generate DWARF for this section if requested.
     if (getContext().getGenDwarfForAssembly())
       getContext().addGenDwarfSection(WS);
+
+    if (WasmSym->isFunction()) {
+      // We give the location of the label (IDLoc) here, because otherwise the
+      // lexer's next location will be used, which can be confusing. For
+      // example:
+      //
+      // test0: ; This function does not end properly
+      //   ...
+      //
+      // test1: ; We would like to point to this line for error
+      //   ...  . Not this line, which can contain any instruction
+      ensureEmptyNestingStack(IDLoc);
+      CurrentState = FunctionLabel;
+      LastFunctionLabel = Symbol;
+      push(Function);
+    }
   }
 
   void onEndOfFunction(SMLoc ErrorLoc) {

diff  --git a/llvm/test/MC/WebAssembly/func-end-errors.s b/llvm/test/MC/WebAssembly/func-end-errors.s
new file mode 100644
index 0000000000000..dda91654c83c9
--- /dev/null
+++ b/llvm/test/MC/WebAssembly/func-end-errors.s
@@ -0,0 +1,17 @@
+# RUN: not llvm-mc -triple=wasm32-unknown-unknown %s 2>&1 | FileCheck %s
+
+# A Wasm function should always end with a 'end_function' pseudo instruction in
+# assembly. This causes the parser to properly wrap up function info when there
+# is no other instructions present.
+
+.functype test0 () -> ()
+test0:
+
+.functype test1 () -> ()
+# CHECK: [[@LINE+1]]:1: error: Unmatched block construct(s) at function end: function
+test1:
+  end_function
+
+.functype test2 () -> ()
+test2:
+# CHECK: [[@LINE+1]]:1: error: Unmatched block construct(s) at function end: function


        

