[llvm] [WebAssembly] Support parsing .lto_set_conditional (PR #126546)

Heejin Ahn via llvm-commits llvm-commits at lists.llvm.org
Sat Mar 29 04:50:43 PDT 2025


https://github.com/aheejin updated https://github.com/llvm/llvm-project/pull/126546

>From ddc1b123b91470383b9523a12932538e66b9f6f5 Mon Sep 17 00:00:00 2001
From: Heejin Ahn <aheejin at gmail.com>
Date: Sun, 9 Feb 2025 04:51:20 +0000
Subject: [PATCH 1/2] [WebAssembly] Support parsing .lto_set_conditional

In the split-LTO-unit mode in ThinLTO, a compilation module is split
into two and global variables that meet a specific criteria is moved to
the second module.

And if there is an originally local-linkage global value defined in the
first module and referenced in the second module or the vice versa, that
value is _promoted_ by attaching a module has ID to their names in order
to prevent name clashes because now they can be referenced from other
modules.

And when that promoted global value is a function, a
`.lto_set_conditional` entry is written to the first module to avoid
breaking references from inline assembly:
https://github.com/llvm/llvm-project/blob/d21fc58aeeaa7f0369a24dbe70a0360e0edbf76f/llvm/lib/Transforms/IPO/ThinLTOBitcodeWriter.cpp#L84-L91

The syntax of this is, if the original function name is `a` and the
module ID is `123`,
```ll
module asm ".lto_set_conditional symbolA,symbolA.123"
```
These symbols are parsed here:
https://github.com/llvm/llvm-project/blob/648981f913431749c4656268ed670677a88511f6/llvm/lib/MC/MCParser/AsmParser.cpp#L6467

The first function symbol in this `.lto_set_conditional` do not exist as
a function in the bitcode anymore because it was renamed to the second.
So they are not assigned as function symbols but they are not really
data either, so the object writer crashes here:
https://github.com/llvm/llvm-project/blob/5b9e6c7993359c16b4d645c851bb7fe2fd7b78c7/llvm/lib/MC/WasmObjectWriter.cpp#L1820

This PR makes the object writer just skip those symbols.

---

This problem was discovered when I was testing with
`-fwhole-program-vtables`.
The reason we didn't have this problem before with ThinLTO was because
`-fsplit-lto-unit`, which splits LTO units when possible, defaults to
false, but it defaults to true when `-fwhole-program-vtables` is used.
---
 llvm/lib/MC/WasmObjectWriter.cpp                | 13 ++++++++++++-
 llvm/test/MC/WebAssembly/lto-set-conditional.ll | 13 +++++++++++++
 2 files changed, 25 insertions(+), 1 deletion(-)
 create mode 100644 llvm/test/MC/WebAssembly/lto-set-conditional.ll

diff --git a/llvm/lib/MC/WasmObjectWriter.cpp b/llvm/lib/MC/WasmObjectWriter.cpp
index c5a95cb3da543..d51bbd642196a 100644
--- a/llvm/lib/MC/WasmObjectWriter.cpp
+++ b/llvm/lib/MC/WasmObjectWriter.cpp
@@ -1817,7 +1817,18 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
       assert(WasmIndices.count(&WS) > 0);
       Info.ElementIndex = WasmIndices.find(&WS)->second;
     } else if (WS.isDefined()) {
-      assert(DataLocations.count(&WS) > 0);
+      if (!DataLocations.count(&WS))
+        // In bitcode generated by split-LTO-unit mode in ThinLTO, these lines
+        // can appear:
+        // module asm ".lto_set_conditional symbolA,symbolA.[moduleId]"
+        // ...
+        // (Here [moduleId] will be replaced by a real module hash ID)
+        //
+        // Here the original symbols (symbolA here) have been renamed to symbol
+        // new names created by attaching their module IDs and the original
+        // symbols do not appear in the bitcode anymore, and thus not in
+        // DataLocations. We should ignore them.
+        continue;
       Info.DataRef = DataLocations.find(&WS)->second;
     }
     WS.setIndex(SymbolInfos.size());
diff --git a/llvm/test/MC/WebAssembly/lto-set-conditional.ll b/llvm/test/MC/WebAssembly/lto-set-conditional.ll
new file mode 100644
index 0000000000000..84270aea60bde
--- /dev/null
+++ b/llvm/test/MC/WebAssembly/lto-set-conditional.ll
@@ -0,0 +1,13 @@
+; RUN: llc %s -filetype=obj
+
+target triple = "wasm32-unknown-unknown"
+
+; In .lto_set_conditional, the first symbol is renamed to the second symbol, so
+; the first symbol does not exist anymore in the file. Object writer should not
+; crash when .lto_set_conditional is present.
+
+module asm ".lto_set_conditional a,a.new"
+
+define hidden void @a.new() {
+  ret void
+}

>From 793adad4f23955c4d089c9757e7c1e31ca9bbb8c Mon Sep 17 00:00:00 2001
From: Heejin Ahn <aheejin at gmail.com>
Date: Sat, 29 Mar 2025 10:57:12 +0000
Subject: [PATCH 2/2] Address comments

---
 llvm/lib/MC/WasmObjectWriter.cpp              | 25 ++++++++++---------
 .../MC/WebAssembly/lto-set-conditional.ll     | 13 ----------
 .../test/MC/WebAssembly/lto-set-conditional.s |  8 ++++++
 3 files changed, 21 insertions(+), 25 deletions(-)
 delete mode 100644 llvm/test/MC/WebAssembly/lto-set-conditional.ll
 create mode 100644 llvm/test/MC/WebAssembly/lto-set-conditional.s

diff --git a/llvm/lib/MC/WasmObjectWriter.cpp b/llvm/lib/MC/WasmObjectWriter.cpp
index bba97857f5759..9d5a290f70cad 100644
--- a/llvm/lib/MC/WasmObjectWriter.cpp
+++ b/llvm/lib/MC/WasmObjectWriter.cpp
@@ -1785,6 +1785,18 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
       WS.setIndex(InvalidIndex);
       continue;
     }
+    // In bitcode generated by split-LTO-unit mode in ThinLTO, these lines can
+    // appear:
+    // module asm ".lto_set_conditional symbolA,symbolA.[moduleId]"
+    // ...
+    // (Here [moduleId] will be replaced by a real module hash ID)
+    //
+    // Here the original symbol (symbolA here) has been renamed to the new name
+    // created by attaching its module ID, so the original symbol does not
+    // appear in the bitcode anymore, and thus not in DataLocations. We should
+    // ignore them.
+    if (WS.isData() && WS.isDefined() && !DataLocations.count(&WS))
+      continue;
     LLVM_DEBUG(dbgs() << "adding to symtab: " << WS << "\n");
 
     uint32_t Flags = 0;
@@ -1817,18 +1829,7 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
       assert(WasmIndices.count(&WS) > 0);
       Info.ElementIndex = WasmIndices.find(&WS)->second;
     } else if (WS.isDefined()) {
-      if (!DataLocations.count(&WS))
-        // In bitcode generated by split-LTO-unit mode in ThinLTO, these lines
-        // can appear:
-        // module asm ".lto_set_conditional symbolA,symbolA.[moduleId]"
-        // ...
-        // (Here [moduleId] will be replaced by a real module hash ID)
-        //
-        // Here the original symbols (symbolA here) have been renamed to symbol
-        // new names created by attaching their module IDs and the original
-        // symbols do not appear in the bitcode anymore, and thus not in
-        // DataLocations. We should ignore them.
-        continue;
+      assert(DataLocations.count(&WS) > 0);
       Info.DataRef = DataLocations.find(&WS)->second;
     }
     WS.setIndex(SymbolInfos.size());
diff --git a/llvm/test/MC/WebAssembly/lto-set-conditional.ll b/llvm/test/MC/WebAssembly/lto-set-conditional.ll
deleted file mode 100644
index 84270aea60bde..0000000000000
--- a/llvm/test/MC/WebAssembly/lto-set-conditional.ll
+++ /dev/null
@@ -1,13 +0,0 @@
-; RUN: llc %s -filetype=obj
-
-target triple = "wasm32-unknown-unknown"
-
-; In .lto_set_conditional, the first symbol is renamed to the second symbol, so
-; the first symbol does not exist anymore in the file. Object writer should not
-; crash when .lto_set_conditional is present.
-
-module asm ".lto_set_conditional a,a.new"
-
-define hidden void @a.new() {
-  ret void
-}
diff --git a/llvm/test/MC/WebAssembly/lto-set-conditional.s b/llvm/test/MC/WebAssembly/lto-set-conditional.s
new file mode 100644
index 0000000000000..c9519e232c2d8
--- /dev/null
+++ b/llvm/test/MC/WebAssembly/lto-set-conditional.s
@@ -0,0 +1,8 @@
+# RUN: llvm-mc -triple=wasm32-unknown-unknown
+
+# Tests if `.lto_set_conditional` directives are parsed without crashing.
+.lto_set_conditional a, a.new
+.type  a.new, at function
+a.new:
+  .functype  a.new () -> ()
+  end_function



More information about the llvm-commits mailing list