[llvm] [WebAssembly] Support multiple `.init_array` fragments when writing Wasm objects (PR #111008)
via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 3 08:42:42 PDT 2024
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-backend-webassembly
@llvm/pr-subscribers-mc
Author: George Stagg (georgestagg)
<details>
<summary>Changes</summary>
For WebAssembly objects, only functions listed in the first symbol in the `.init_array` section are invoked.
Consider the C program:
```c
#import <stdio.h>
void init1() { puts("init1"); }
void init2() { puts("init2"); }
void init3() { puts("init3"); }
typedef void (*f)(void);
__attribute__((section(".init_array"))) void (*p_init1)(void) = &init1;
__attribute__((section(".init_array"))) f p_init2[2] = { &init2, &init3 };
int main() {
puts("main");
return 0;
}
```
Compiling for Linux results in each function running before `main()`. But for WebAssembly (here via Emscripten) only the first function is invoked:
```
$ clang-20 -c test.c -o test.o
$ clang-20 test.o -o test
$ ./test
init1
init2
init3
main
$ EM_LLVM_ROOT=/lib/llvm-20/bin emcc -c test.c -o test.o
$ EM_LLVM_ROOT=/lib/llvm-20/bin emcc test.o -o test.js
$ node test.js
init1
main
```
In previous LLVM versions (<=18, I think?) this also used to throw an error message: `fatal error: error in backend: only one .init_array section fragment supported`.
This makes things difficult for [certain tools](https://github.com/dtolnay/inventory) that rely on writing multiple objects in the `.init_array` section for life-before-main initialisation functions.
----
This PR makes two changes to `WasmObjectWriter`:
1) The existing function that converts entries in the `.init_array` section into start functions, (adding them to an `InitFuncs` vector for writing out later), now loops through all the subsequent fragments of the section until there are none remaining.
This picks up any functions listed in later objects in the `.init_array` section, and they are now successfully invoked before main.
However, a related issue is revealed by changing the functions in the example above to read:
```c
void init1() { puts("1"); }
void init2() { puts("2"); }
void init3() { puts("3"); }
```
Then, with our patched LLVM,
```
-> $ EM_LLVM_ROOT=/[...]/llvm/bin emcc test.c -o test.js
wasm-ld: error: [...]/test_0.o: invalid data symbol offset: `p_init2` (offset: 4 segment size: 2)
emcc: error: '/[...]/llvm/bin/wasm-ld -o test.wasm [...]
```
The symbols in the `.init_array` section are written to the [`WASM_SYMBOL_TABLE` in the custom `linking` section](https://github.com/WebAssembly/tool-conventions/blob/main/Linking.md#linking-metadata-section), with kind `WASM_SYMBOL_TYPE_DATA`. But, these symbols do not reference content in the DATA section, and so the written offset makes no sense.
Is this a mistake? AFAICT I don't think there's an appropriate kind for these symbols to be written to the `WASM_SYMBOL_TABLE` subsection with. In fact, symbols for the functions referenced by the `.init_array` sections are independently written to a different `WASM_INIT_FUNC` subsection in any case.
Interestingly, you can also reveal this problem in an unpatched LLVM by writing a Wasm object file that has no data section but does supply an `.init_array` function:
```c
void init1() {}
__attribute__((section(".init_array"))) void (*p_init1)(void) = &init1;
int main() { return 0; }
```
```
$ clang-20 bug.c -o bug
$ ./bug
$ EM_LLVM_ROOT=/lib/llvm-20/bin emcc bug.c -o bug.js
wasm-ld: error: /[...]/bug_0.o: invalid data segment index: 0
```
---
So, this leads me to the related second change:
2) Do not write symbols that have been defined in the `.init_array` section to the `WASM_SYMBOL_TABLE` subsection of the `linking` custom section, because these are handled elsewhere and written to the `WASM_INIT_FUNCS` subsection instead.
With the two changes above, the modified test program above compiles with Emscripten and runs on Wasm with the expected behaviour:
```
$ EM_LLVM_ROOT=/[...]/llvm/bin emcc test.c -o test.js
$ node test.js
1
2
3
main
```
Finally, I have tried to include a unit test. Hopefully it looks reasonable -- it's based on copying similar test files in that directory.
---
Full diff: https://github.com/llvm/llvm-project/pull/111008.diff
2 Files Affected:
- (modified) llvm/lib/MC/WasmObjectWriter.cpp (+52-42)
- (added) llvm/test/MC/WebAssembly/init-array.ll (+56)
``````````diff
diff --git a/llvm/lib/MC/WasmObjectWriter.cpp b/llvm/lib/MC/WasmObjectWriter.cpp
index f25dc92fa235a2..034e058690744a 100644
--- a/llvm/lib/MC/WasmObjectWriter.cpp
+++ b/llvm/lib/MC/WasmObjectWriter.cpp
@@ -1769,6 +1769,11 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
WS.setIndex(InvalidIndex);
continue;
}
+ // Contents of .init_array sections are handled elsewhere.
+ if (WS.isDefined() &&
+ WS.getSection().getName().starts_with(".init_array")) {
+ continue;
+ }
LLVM_DEBUG(dbgs() << "adding to symtab: " << WS << "\n");
uint32_t Flags = 0;
@@ -1853,49 +1858,54 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
if (EmptyFrag.getKind() != MCFragment::FT_Data)
report_fatal_error(".init_array section should be aligned");
- const MCFragment &AlignFrag = *EmptyFrag.getNext();
- if (AlignFrag.getKind() != MCFragment::FT_Align)
- report_fatal_error(".init_array section should be aligned");
- if (cast<MCAlignFragment>(AlignFrag).getAlignment() !=
- Align(is64Bit() ? 8 : 4))
- report_fatal_error(".init_array section should be aligned for pointers");
-
- const MCFragment &Frag = *AlignFrag.getNext();
- if (Frag.hasInstructions() || Frag.getKind() != MCFragment::FT_Data)
- report_fatal_error("only data supported in .init_array section");
-
- uint16_t Priority = UINT16_MAX;
- unsigned PrefixLength = strlen(".init_array");
- if (WS.getName().size() > PrefixLength) {
- if (WS.getName()[PrefixLength] != '.')
+ const MCFragment *nextFrag = EmptyFrag.getNext();
+ while (nextFrag != nullptr) {
+ const MCFragment &AlignFrag = *nextFrag;
+ if (AlignFrag.getKind() != MCFragment::FT_Align)
+ report_fatal_error(".init_array section should be aligned");
+ if (cast<MCAlignFragment>(AlignFrag).getAlignment() !=
+ Align(is64Bit() ? 8 : 4))
report_fatal_error(
- ".init_array section priority should start with '.'");
- if (WS.getName().substr(PrefixLength + 1).getAsInteger(10, Priority))
- report_fatal_error("invalid .init_array section priority");
- }
- const auto &DataFrag = cast<MCDataFragment>(Frag);
- const SmallVectorImpl<char> &Contents = DataFrag.getContents();
- for (const uint8_t *
- P = (const uint8_t *)Contents.data(),
- *End = (const uint8_t *)Contents.data() + Contents.size();
- P != End; ++P) {
- if (*P != 0)
- report_fatal_error("non-symbolic data in .init_array section");
- }
- for (const MCFixup &Fixup : DataFrag.getFixups()) {
- assert(Fixup.getKind() ==
- MCFixup::getKindForSize(is64Bit() ? 8 : 4, false));
- const MCExpr *Expr = Fixup.getValue();
- auto *SymRef = dyn_cast<MCSymbolRefExpr>(Expr);
- if (!SymRef)
- report_fatal_error("fixups in .init_array should be symbol references");
- const auto &TargetSym = cast<const MCSymbolWasm>(SymRef->getSymbol());
- if (TargetSym.getIndex() == InvalidIndex)
- report_fatal_error("symbols in .init_array should exist in symtab");
- if (!TargetSym.isFunction())
- report_fatal_error("symbols in .init_array should be for functions");
- InitFuncs.push_back(
- std::make_pair(Priority, TargetSym.getIndex()));
+ ".init_array section should be aligned for pointers");
+
+ const MCFragment &Frag = *AlignFrag.getNext();
+ nextFrag = Frag.getNext();
+ if (Frag.hasInstructions() || Frag.getKind() != MCFragment::FT_Data)
+ report_fatal_error("only data supported in .init_array section");
+
+ uint16_t Priority = UINT16_MAX;
+ unsigned PrefixLength = strlen(".init_array");
+ if (WS.getName().size() > PrefixLength) {
+ if (WS.getName()[PrefixLength] != '.')
+ report_fatal_error(
+ ".init_array section priority should start with '.'");
+ if (WS.getName().substr(PrefixLength + 1).getAsInteger(10, Priority))
+ report_fatal_error("invalid .init_array section priority");
+ }
+ const auto &DataFrag = cast<MCDataFragment>(Frag);
+ const SmallVectorImpl<char> &Contents = DataFrag.getContents();
+ for (const uint8_t *
+ P = (const uint8_t *)Contents.data(),
+ *End = (const uint8_t *)Contents.data() + Contents.size();
+ P != End; ++P) {
+ if (*P != 0)
+ report_fatal_error("non-symbolic data in .init_array section");
+ }
+ for (const MCFixup &Fixup : DataFrag.getFixups()) {
+ assert(Fixup.getKind() ==
+ MCFixup::getKindForSize(is64Bit() ? 8 : 4, false));
+ const MCExpr *Expr = Fixup.getValue();
+ auto *SymRef = dyn_cast<MCSymbolRefExpr>(Expr);
+ if (!SymRef)
+ report_fatal_error(
+ "fixups in .init_array should be symbol references");
+ const auto &TargetSym = cast<const MCSymbolWasm>(SymRef->getSymbol());
+ if (TargetSym.getIndex() == InvalidIndex)
+ report_fatal_error("symbols in .init_array should exist in symtab");
+ if (!TargetSym.isFunction())
+ report_fatal_error("symbols in .init_array should be for functions");
+ InitFuncs.push_back(std::make_pair(Priority, TargetSym.getIndex()));
+ }
}
}
diff --git a/llvm/test/MC/WebAssembly/init-array.ll b/llvm/test/MC/WebAssembly/init-array.ll
new file mode 100644
index 00000000000000..d19ddcf10d5d9c
--- /dev/null
+++ b/llvm/test/MC/WebAssembly/init-array.ll
@@ -0,0 +1,56 @@
+; RUN: llc -mcpu=mvp -filetype=obj %s -o - | obj2yaml | FileCheck %s
+
+target triple = "wasm32-unknown-unknown"
+
+ at p_init1 = hidden global ptr @init1, section ".init_array", align 4
+ at p_init2 = hidden global ptr @init2, section ".init_array", align 4
+
+define hidden void @init1() #0 { ret void }
+define hidden void @init2() #0 { ret void }
+
+
+; CHECK: - Type: IMPORT
+; CHECK-NEXT: Imports:
+; CHECK-NEXT: - Module: env
+; CHECK-NEXT: Field: __linear_memory
+; CHECK-NEXT: Kind: MEMORY
+; CHECK-NEXT: Memory:
+; CHECK-NEXT: Minimum: 0x0
+; CHECK-NEXT: - Module: env
+; CHECK-NEXT: Field: __indirect_function_table
+; CHECK-NEXT: Kind: TABLE
+; CHECK-NEXT: Table:
+; CHECK-NEXT: Index: 0
+; CHECK-NEXT: ElemType: FUNCREF
+; CHECK-NEXT: Limits:
+; CHECK-NEXT: Minimum: 0x0
+; CHECK-NEXT: - Type: FUNCTION
+; CHECK-NEXT: FunctionTypes: [ 0, 0 ]
+; CHECK-NEXT: - Type: CODE
+; CHECK-NEXT: Functions:
+; CHECK-NEXT: - Index: 0
+; CHECK-NEXT: Locals: []
+; CHECK-NEXT: Body: 0B
+; CHECK-NEXT: - Index: 1
+; CHECK-NEXT: Locals: []
+; CHECK-NEXT: Body: 0B
+; CHECK-NEXT: - Type: CUSTOM
+; CHECK-NEXT: Name: linking
+; CHECK-NEXT: Version: 2
+; CHECK-NEXT: SymbolTable:
+; CHECK-NEXT: - Index: 0
+; CHECK-NEXT: Kind: FUNCTION
+; CHECK-NEXT: Name: init1
+; CHECK-NEXT: Flags: [ VISIBILITY_HIDDEN ]
+; CHECK-NEXT: Function: 0
+; CHECK-NEXT: - Index: 1
+; CHECK-NEXT: Kind: FUNCTION
+; CHECK-NEXT: Name: init2
+; CHECK-NEXT: Flags: [ VISIBILITY_HIDDEN ]
+; CHECK-NEXT: Function: 1
+; CHECK-NEXT: InitFunctions:
+; CHECK-NEXT: - Priority: 65535
+; CHECK-NEXT: Symbol: 0
+; CHECK-NEXT: - Priority: 65535
+; CHECK-NEXT: Symbol: 1
+; CHECK-NEXT: ...
``````````
</details>
https://github.com/llvm/llvm-project/pull/111008
More information about the llvm-commits
mailing list