<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Wasm debug info excessively large due to missing SHF_MERGE support for debug_str"
href="https://bugs.llvm.org/show_bug.cgi?id=48828">48828</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Wasm debug info excessively large due to missing SHF_MERGE support for debug_str
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: WebAssembly
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>dblaikie@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>Wasm isn't deduplicating debug info strings:
$ cat wasm1.c
struct t1 { };
struct t1 v1;
$ cat wasm2.c
struct t1 { };
int main() {
  struct t1 v1;
}
$ clang-tot -target wasm32-unknown-unknown wasm{1,2}.c -nostdlib -Wl,-no-entry -g
$ llvm-dwarfdump-tot a.out -debug-str
a.out: file format WASM
.debug_str contents:
0x00000000: "clang version 12.0.0 (git@github.com:llvm/llvm-project.git 439e8f6c05584c36ea3f79d9b83a78098d40e629)"
0x00000065: "wasm1.c"
0x0000006d: "/usr/local/google/home/blaikie/dev/scratch"
0x00000098: "v1"
0x0000009b: "t1"
0x0000009e: "clang version 12.0.0 (git@github.com:llvm/llvm-project.git 439e8f6c05584c36ea3f79d9b83a78098d40e629)"
0x00000103: "wasm2.c"
0x0000010b: "/usr/local/google/home/blaikie/dev/scratch"
0x00000136: "main"
0x0000013b: "int"
0x0000013f: "v1"
0x00000142: "t1"
Note the duplicate "v1" and "t1" (and the repeated producer and directory
strings) in the .debug_str contents above.
I wonder what wasm does to deduplicate string literals in code; on ELF those
usually rely on SHF_MERGE too.
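As a rough illustration of the merging being asked for, here is a minimal
Python sketch. It is not how lld or any real linker implements SHF_MERGE; the
function name merge_string_tables and the shortened input tables are
hypothetical, and it assumes every input table ends with a NUL terminator. It
keeps one copy of each string and records where each old offset moved to,
which is the information a linker would need in order to rewrite references
into .debug_str.

```python
def merge_string_tables(tables):
    """Concatenate NUL-terminated string tables, keeping one copy per string.

    Returns the merged table plus, for each input table, a dict that maps
    every string's old offset to its offset in the merged table.
    Assumption: each input table ends with a NUL byte.
    """
    merged = bytearray()
    seen = {}       # string bytes, keyed to their offset in the merged table
    remaps = []
    for table in tables:
        remap = {}
        old_offset = 0
        # split() yields an empty piece after the final NUL; drop it.
        for piece in table.split(b"\0")[:-1]:
            if piece not in seen:
                seen[piece] = len(merged)
                merged += piece + b"\0"
            remap[old_offset] = seen[piece]
            old_offset += len(piece) + 1
        remaps.append(remap)
    return bytes(merged), remaps


# Shortened stand-ins for the two per-file string tables shown above.
unit1 = b"wasm1.c\0v1\0t1\0"
unit2 = b"wasm2.c\0main\0int\0v1\0t1\0"
merged, remaps = merge_string_tables([unit1, unit2])
print(len(unit1) + len(unit2), len(merged))
print(remaps[1])
```

With these inputs, 37 bytes of input shrink to 31 bytes, and the second
table's "v1" and "t1" map back to the offsets of the first table's copies.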
Contents of section DATA:
0000 02004180 080b1073 7472696e 67310073 ..A....string1.s
0010 7472696e 67310000 4190080b 08000400 tring1..A.......
0020 00080400 00
This dump (from the test case below, which links two files that each contain
"string1") suggests wasm could use support for deduplicating string literals
too. I don't think the C or C++ standard mandates this, but most
implementations do it.
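As a quick check of the DATA dump above, the following Python sketch counts
the copies of the literal and reads the two stored pointers. The hex bytes are
copied from the dump; treating the last 8 bytes as two 32-bit little-endian
pointers (presumably the values of x and y) is my reading of the bytes, not
something the tool output states.

```python
# Bytes copied from the "Contents of section DATA" dump in this report.
data = bytes.fromhex(
    "02004180 080b1073 7472696e 67310073"
    "7472696e 67310000 4190080b 08000400"
    "00080400 00"
)

# Count the NUL-terminated copies of the literal in the section.
copies = data.count(b"string1\0")

# Assumption: the final 8 bytes are the payload of the second data segment,
# holding two 32-bit little-endian pointers (presumably x and y).
first_ptr = int.from_bytes(data[-8:-4], "little")
second_ptr = int.from_bytes(data[-4:], "little")

print(copies, hex(first_ptr), hex(second_ptr), first_ptr == second_ptr)
```

This prints 2 0x400 0x408 False: two copies of "string1", and two distinct
addresses 8 bytes apart, so under that reading x == y is false in this link.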
$ cat wasm1.c
extern const char* x;
const char* x = "string1";
$ cat wasm2.c
extern const char* y;
extern const char* x;
const char* y = "string1";
int main() {
  // This doesn't have to be true, but as far as I understand it is on most
  // implementations.
  return x == y;
}
$ clang-tot -target wasm32-unknown-unknown wasm{1,2}.c -nostdlib -Wl,-no-entry -g
$ llvm-objdump -s a.out --section=DATA
a.out: file format wasm
Contents of section DATA:
0000 02004180 080b1073 7472696e 67310073 ..A....string1.s
0010 7472696e 67310000 4190080b 08000400 tring1..A.......
0020 00080400 00 .....</pre>
</div>
</p>
</body>
</html>