<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Wasm debug info excessively large due to missing SHF_MERGE support for debug_str"
   href="https://bugs.llvm.org/show_bug.cgi?id=48828">48828</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Wasm debug info excessively large due to missing SHF_MERGE support for debug_str
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: WebAssembly
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>dblaikie@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Wasm isn't deduplicating debug info strings:

$ cat wasm1.c
struct t1 { };
struct t1 v1;
$ cat wasm2.c
struct t1 { };
int main() {
  struct t1 v1;
}
$ clang-tot -target wasm32-unknown-unknown wasm{1,2}.c -nostdlib -Wl,-no-entry -g
$ llvm-dwarfdump-tot a.out -debug-str
a.out:  file format WASM

.debug_str contents:
0x00000000: "clang version 12.0.0 (git@github.com:llvm/llvm-project.git 439e8f6c05584c36ea3f79d9b83a78098d40e629)"
0x00000065: "wasm1.c"
0x0000006d: "/usr/local/google/home/blaikie/dev/scratch"
0x00000098: "v1"
0x0000009b: "t1"
0x0000009e: "clang version 12.0.0 (git@github.com:llvm/llvm-project.git 439e8f6c05584c36ea3f79d9b83a78098d40e629)"
0x00000103: "wasm2.c"
0x0000010b: "/usr/local/google/home/blaikie/dev/scratch"
0x00000136: "main"
0x0000013b: "int"
0x0000013f: "v1"
0x00000142: "t1"


Note the duplicate "t1" and "v1" in the debug_str contents above.
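
A minimal Python sketch for spotting this kind of duplication, assuming each
entry of the `llvm-dwarfdump -debug-str` output fits on one line. The helper
name duplicate_strings and the sample input (a subset of the dump above) are
illustrative and not part of any LLVM tool:

```python
import re
from collections import Counter

# Matches one single-line entry of `llvm-dwarfdump -debug-str` output,
# for example: 0x00000098: "v1"
ENTRY = re.compile(r'^0x([0-9a-fA-F]+): "(.*)"$')


def duplicate_strings(dump_text):
    """Return {string: wasted_bytes} for strings that occur more than once."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            counts[match.group(2)] += 1
    # Every extra copy costs the string's bytes plus one NUL terminator.
    return {
        s: (n - 1) * (len(s.encode()) + 1)
        for s, n in counts.items()
        if n > 1
    }


# Subset of the .debug_str dump from this report (short entries only).
sample = """\
0x00000065: "wasm1.c"
0x00000098: "v1"
0x0000009b: "t1"
0x00000103: "wasm2.c"
0x00000136: "main"
0x0000013b: "int"
0x0000013f: "v1"
0x00000142: "t1"
"""

print(duplicate_strings(sample))
```

On this subset it reports 3 wasted bytes each for "v1" and "t1" (two
characters plus the NUL terminator).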

String literals in code usually rely on SHF_MERGE (on ELF) too, so I wonder
what wasm does to deduplicate those.

Contents of section DATA:
 0000 02004180 080b1073 7472696e 67310073  ..A....string1.s
 0010 7472696e 67310000 4190080b 08000400  tring1..A.......
 0020 00080400 00                          .....

This dump (from a test case with "string1" in two files linked together, shown
below) suggests wasm could use support for deduplicating code strings too. I
don't think this is mandated by the C or C++ standards, but most
implementations do it.
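
For reference, the kind of merging SHF_MERGE enables can be sketched roughly as
follows. This is a hypothetical illustration in Python, not wasm-ld's or any
ELF linker's actual implementation: it assumes each input section is a sequence
of NUL-terminated strings, keeps one copy of each exact string (no tail
merging), and records where every original offset lands in the merged output.
The names merge_string_sections and remap are made up for this sketch.

```python
def merge_string_sections(sections):
    """Merge NUL-terminated string sections, keeping one copy of each string.

    `sections` is a list of bytes objects, one per input object file, each
    assumed to end with a NUL terminator.
    Returns (merged_bytes, remap), where remap[(section_index, old_offset)]
    is the offset of the same string in the merged section.
    """
    merged = bytearray()
    offset_of = {}  # string bytes -> offset in the merged output
    remap = {}
    for index, data in enumerate(sections):
        old_offset = 0
        # The trailing NUL yields one empty piece after the last string;
        # drop it so it is not treated as an extra empty string.
        for piece in data.split(b"\0")[:-1]:
            if piece not in offset_of:
                offset_of[piece] = len(merged)
                merged += piece + b"\0"
            remap[(index, old_offset)] = offset_of[piece]
            old_offset += len(piece) + 1
    return bytes(merged), remap


# Two objects that each contribute the literal "string1".
merged, remap = merge_string_sections([b"string1\0", b"string1\0"])
print(merged)
print(remap)
```

With the two inputs above, the merged section holds a single "string1" and
both original offsets map to offset 0, which is what would make x == y hold in
the test case below.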

$ cat wasm1.c
extern const char* x;
const char* x = "string1";
$ cat wasm2.c
extern const char* y;
extern const char* x;
const char* y = "string1";
int main() {
  // This doesn't have to be true, but is on most implementations as far as I
  // understand.
  return x == y;
}
$ clang-tot -target wasm32-unknown-unknown wasm{1,2}.c -nostdlib -Wl,-no-entry -g
$ llvm-objdump -s a.out --section=DATA  

a.out:  file format wasm

Contents of section DATA:
 0000 02004180 080b1073 7472696e 67310073  ..A....string1.s
 0010 7472696e 67310000 4190080b 08000400  tring1..A.......
 0020 00080400 00                          .....</pre>
        </div>
      </p>


    </body>
</html>