[llvm] [WebAssembly] Support parsing .lto_set_conditional (PR #126546)

Heejin Ahn via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 10 09:09:21 PST 2025


https://github.com/aheejin created https://github.com/llvm/llvm-project/pull/126546

In the split-LTO-unit mode in ThinLTO, a compilation module is split into two and global variables that meet a [specific criteria](https://github.com/llvm/llvm-project/blob/d21fc58aeeaa7f0369a24dbe70a0360e0edbf76f/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#L315-L366) is moved to the second module.

And if there is an originally local-linkage global value defined in the first module and referenced in the second module or the vice versa, that value is [_promoted_](https://github.com/llvm/llvm-project/blob/d21fc58aeeaa7f0369a24dbe70a0360e0edbf76f/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#L46-L100) by attaching a module has ID to their names in order to prevent name clashes because now they can be referenced from other modules.

And when that promoted global value is a function, a `.lto_set_conditional` entry is written to the first module to avoid breaking references from inline assembly:
https://github.com/llvm/llvm-project/blob/d21fc58aeeaa7f0369a24dbe70a0360e0edbf76f/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#L84-L91

The syntax of this is, if the original function name is `a` and the module ID is `123`,
```ll
module asm ".lto_set_conditional symbolA,symbolA.123"
```
These symbols are parsed here:
https://github.com/llvm/llvm-project/blob/648981f913431749c4656268ed670677a88511f6/llvm/lib/MC/MCParser/AsmParser.cpp#L6467

The first function symbol in this `.lto_set_conditional` do not exist as a function in the bitcode anymore because it was renamed to the second. So they are not [assigned as function symbols](https://github.com/llvm/llvm-project/blob/648981f913431749c4656268ed670677a88511f6/llvm/lib/Target/WebAssembly/WebAssemblyAsmPrinter.cpp#L355) but they are not really data either, so the object writer crashes here:
https://github.com/llvm/llvm-project/blob/5b9e6c7993359c16b4d645c851bb7fe2fd7b78c7/llvm/lib/MC/WasmObjectWriter.cpp#L1820

This PR makes the object writer just skip those symbols.

---

This problem was discovered when I was testing with `-fwhole-program-vtables`.
The reason we didn't have this problem before with ThinLTO was because [`-fsplit-lto-unit`](https://clang.llvm.org/docs/UsersManual.html#cmdoption-f-no-split-lto-unit), which splits LTO units when possible, defaults to false, but it defaults to true when `-fwhole-program-vtables` is used.

>From ddc1b123b91470383b9523a12932538e66b9f6f5 Mon Sep 17 00:00:00 2001
From: Heejin Ahn <aheejin at gmail.com>
Date: Sun, 9 Feb 2025 04:51:20 +0000
Subject: [PATCH] [WebAssembly] Support parsing .lto_set_conditional

In the split-LTO-unit mode in ThinLTO, a compilation module is split
into two and global variables that meet a specific criteria is moved to
the second module.

And if there is an originally local-linkage global value defined in the
first module and referenced in the second module or the vice versa, that
value is _promoted_ by attaching a module has ID to their names in order
to prevent name clashes because now they can be referenced from other
modules.

And when that promoted global value is a function, a
`.lto_set_conditional` entry is written to the first module to avoid
breaking references from inline assembly:
https://github.com/llvm/llvm-project/blob/d21fc58aeeaa7f0369a24dbe70a0360e0edbf76f/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#L84-L91

The syntax of this is, if the original function name is `a` and the
module ID is `123`,
```ll
module asm ".lto_set_conditional symbolA,symbolA.123"
```
These symbols are parsed here:
https://github.com/llvm/llvm-project/blob/648981f913431749c4656268ed670677a88511f6/llvm/lib/MC/MCParser/AsmParser.cpp#L6467

The first function symbol in this `.lto_set_conditional` do not exist as
a function in the bitcode anymore because it was renamed to the second.
So they are not assigned as function symbols but they are not really
data either, so the object writer crashes here:
https://github.com/llvm/llvm-project/blob/5b9e6c7993359c16b4d645c851bb7fe2fd7b78c7/llvm/lib/MC/WasmObjectWriter.cpp#L1820

This PR makes the object writer just skip those symbols.

---

This problem was discovered when I was testing with
`-fwhole-program-vtables`.
The reason we didn't have this problem before with ThinLTO was because
`-fsplit-lto-unit`, which splits LTO units when possible, defaults to
false, but it defaults to true when `-fwhole-program-vtables` is used.
---
 llvm/lib/MC/WasmObjectWriter.cpp                | 13 ++++++++++++-
 llvm/test/MC/WebAssembly/lto-set-conditional.ll | 13 +++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/MC/WebAssembly/lto-set-conditional.ll

diff --git a/llvm/lib/MC/WasmObjectWriter.cpp b/llvm/lib/MC/WasmObjectWriter.cpp
index c5a95cb3da54331..d51bbd642196acb 100644
--- a/llvm/lib/MC/WasmObjectWriter.cpp
+++ b/llvm/lib/MC/WasmObjectWriter.cpp
@@ -1817,7 +1817,18 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
       assert(WasmIndices.count(&WS) > 0);
       Info.ElementIndex = WasmIndices.find(&WS)->second;
     } else if (WS.isDefined()) {
-      assert(DataLocations.count(&WS) > 0);
+      if (!DataLocations.count(&WS))
+        // In bitcode generated by split-LTO-unit mode in ThinLTO, these lines
+        // can appear:
+        // module asm ".lto_set_conditional symbolA,symbolA.[moduleId]"
+        // ...
+        // (Here [moduleId] will be replaced by a real module hash ID)
+        //
+        // Here the original symbols (symbolA here) have been renamed to symbol
+        // new names created by attaching their module IDs and the original
+        // symbols do not appear in the bitcode anymore, and thus not in
+        // DataLocations. We should ignore them.
+        continue;
       Info.DataRef = DataLocations.find(&WS)->second;
     }
     WS.setIndex(SymbolInfos.size());
diff --git a/llvm/test/MC/WebAssembly/lto-set-conditional.ll b/llvm/test/MC/WebAssembly/lto-set-conditional.ll
new file mode 100644
index 000000000000000..84270aea60bdebe
--- /dev/null
+++ b/llvm/test/MC/WebAssembly/lto-set-conditional.ll
@@ -0,0 +1,13 @@
+; RUN: llc %s -filetype=obj
+
+target triple = "wasm32-unknown-unknown"
+
+; In .lto_set_conditional, the first symbol is renamed to the second symbol, so
+; the first symbol does not exist anymore in the file. Object writer should not
+; crash when .lto_set_conditional is present.
+
+module asm ".lto_set_conditional a,a.new"
+
+define hidden void @a.new() {
+  ret void
+}



More information about the llvm-commits mailing list