[llvm-bugs] [Bug 48828] New: Wasm debug info excessively large due to missing SHF_MERGE support for debug_str

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Jan 20 21:25:38 PST 2021


https://bugs.llvm.org/show_bug.cgi?id=48828

            Bug ID: 48828
           Summary: Wasm debug info excessively large due to missing
                    SHF_MERGE support for debug_str
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: WebAssembly
          Assignee: unassignedbugs at nondot.org
          Reporter: dblaikie at gmail.com
                CC: llvm-bugs at lists.llvm.org

Wasm isn't deduplicating debug info strings:

$ cat wasm1.c
struct t1 { };
struct t1 v1;
$ cat wasm2.c
struct t1 { };
int main() {
  struct t1 v1;
}
$ clang-tot -target wasm32-unknown-unknown wasm{1,2}.c -nostdlib -Wl,-no-entry -g
$ llvm-dwarfdump-tot a.out -debug-str
a.out:  file format WASM

.debug_str contents:
0x00000000: "clang version 12.0.0 (git@github.com:llvm/llvm-project.git 439e8f6c05584c36ea3f79d9b83a78098d40e629)"
0x00000065: "wasm1.c"
0x0000006d: "/usr/local/google/home/blaikie/dev/scratch"
0x00000098: "v1"
0x0000009b: "t1"
0x0000009e: "clang version 12.0.0 (git@github.com:llvm/llvm-project.git 439e8f6c05584c36ea3f79d9b83a78098d40e629)"
0x00000103: "wasm2.c"
0x0000010b: "/usr/local/google/home/blaikie/dev/scratch"
0x00000136: "main"
0x0000013b: "int"
0x0000013f: "v1"
0x00000142: "t1"


Note the duplicate "t1" and "v1" entries in the .debug_str contents above (the producer string and the compilation directory are duplicated as well).

I wonder what wasm does to deduplicate string literals in code; those usually
use SHF_MERGE too.

From the test case below (with "string1" in two files linked together), it
looks like wasm could use support for deduplicating those strings too. I don't
think this is mandated by the C++ standard, but most implementations do it.

$ cat wasm1.c
extern const char* x;
const char* x = "string1";
$ cat wasm2.c
extern const char* y;
extern const char* x;
const char* y = "string1";
int main() {
  return x == y; // this doesn't have to be true, but is on most implementations as far as I understand
}
$ clang-tot -target wasm32-unknown-unknown wasm{1,2}.c -nostdlib -Wl,-no-entry -g
$ llvm-objdump -s a.out --section=DATA  

a.out:  file format wasm

Contents of section DATA:
 0000 02004180 080b1073 7472696e 67310073  ..A....string1.s
 0010 7472696e 67310000 4190080b 08000400  tring1..A.......
 0020 00080400 00                          .....
