[lld] [LLD][COFF] Demangle ARM64EC export names. (PR #87068)
Jacek Caban via llvm-commits
llvm-commits at lists.llvm.org
Mon May 20 15:17:32 PDT 2024
cjacek wrote:
> Are you sure MSVC actually demangles the name, as opposed to getting the demangled name from somewhere else?
From my experiments with MSVC, I concluded that it does demangle the export name. Since this is all undocumented, I ran numerous experiments with MSVC, testing various inputs in both typical situations and edge cases, to infer explanations for the observed behavior.
Below are some tests relevant to this PR. I used inputs that are simple enough to rule out dependency on other factors:
```
$ cat unmangled-func.s
.text
.globl func
.p2align 2
func:
mov w0, #2
ret
$ llvm-mc -filetype=obj -triple=arm64ec-windows unmangled-func.s -o unmangled-func.o
$ cat mangled-func.s
.text
.globl "#func"
.p2align 2
"#func":
mov x0, #1
ret
$ llvm-mc -filetype=obj -triple=arm64ec-windows mangled-func.s -o mangled-func.o
$ cat x64-func.s
.text
.globl func
.p2align 2
func:
movq $3, %rax
retq
$ llvm-mc -filetype=obj -triple=x86_64-windows x64-func.s -o x64-func.o
$ cat unmangled-rva.s
.section ".test","dr"
.rva func
$ llvm-mc -filetype=obj -triple=arm64ec-windows unmangled-rva.s -o unmangled-rva.o
$ llvm-mc -filetype=obj -triple=x86_64-windows unmangled-rva.s -o unmangled-rva-x64.o
$ cat mangled-rva.s
.section ".test","dr"
.rva "#func"
$ llvm-mc -filetype=obj -triple=arm64ec-windows mangled-rva.s -o mangled-rva.o
```
The basic test:
```
$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o "-export:#func"
```
creates a DLL with an unmangled export name pointing to the mangled symbol. This suggests that the demangling of the export name is not related to weak anti-dependency aliases or similar mechanisms since it works without them. Another question is whether we should demangle the symbol as well, or just the export name. A similar test defining only an unmangled symbol fails (using x64-func.o gives the same result):
```
$ link -nologo -dll -noentry -machine:arm64ec unmangled-func.o "-export:#func"
```
With an error:
```
LINK : error LNK2001: unresolved external symbol #func (EC Symbol)
```
This indicates that the linker looks for the exact symbol name, not its demangled form. Testing further, using the unmangled export and unmangled symbol works fine:
```
$ link -nologo -dll -noentry -machine:arm64ec unmangled-func.o -export:func
```
The trickier case is using an unmangled export name and a mangled symbol, which works (unlike the other way around):
```
$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o -export:func
```
This raises the question of how the linker knows about the mangled symbol in this case. The next experiment provides both mangled and unmangled definitions of the symbol to see what the linker does:
```
$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-func.o -export:func
```
This command results in an error (using x64-func.o instead of unmangled-func.o produces the same result):
```
LINK : fatal error LNK1413: ARM64EC symbol '昣湵c' is defined but has no ientry thunk and x64 symbol '畦据' is also defined but doesn't have an exit thunk. There must be either an ientry thunk or an exit thunk for one of these symbols.
```
(The symbol names in the error message are broken, likely due to a UTF-8/UTF-16 mismatch in link.exe.)
This shows that the linker has some more complicated EC mangling awareness. I tried adding entry and exit thunks, but I couldn't find a way to make the linker accept this. This is crucial for other aspects of my work, but in the context of this PR, it shows that the linker understands the relationship between mangled and demangled symbols for exported symbols. This mangling handling seems specific to export handling; if I skip the export directive:
```
$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-func.o
```
it builds fine. These symbols can also be resolved from object files:
```
$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-func.o mangled-rva.o unmangled-rva.o
```
This matches my other experiments for different features. From my observations, the linker has mangling awareness in specific situations, but it's not something that unconditionally applies to all symbols. Other examples of special handling include:
- Special-casing unmangled->mangled weak anti-dependency symbols
- Allowing references to unmangled names from static libraries that have only mangled variants in their ECSYMBOLS section
- Entry point symbols
- Allowing x64 code to reference symbols defined only in the mangled form
In my WIP tree (https://github.com/cjacek/llvm-project/commits/arm64ec), I implemented these features using a mechanism called "EC aliases," where I create paired symbols with different semantics (any other definition is allowed to override the alias symbol; overriding one symbol also clears the EC alias status of its paired symbol). This code is not yet fully compatible or clean, and I plan to refine it further, conduct more testing, and likely rewrite it before submitting those parts for review. Currently, it's sufficient to get things working, including linking against MSVC default libs. I also updated it to cover all the experiments described here.
Returning to the context of exports, there are a few more interesting tests. If I try to reference an unmangled symbol when only the mangled version is available, it fails (while this worked using the `-export` directive in the example above):
```
$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-rva.o
```
results in:
```
unmangled-rva.o : error LNK2001: unresolved external symbol func (EC Symbol)
```
However, if I add the -export directive, not only does the export work, but the unresolved symbol is resolved too:
```
$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-rva.o -export:func
$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-rva.o "-export:#func"
$ link -nologo -dll -noentry -machine:arm64ec unmangled-func.o mangled-rva.o -export:func
```
This behavior can be explained by the creation of "EC aliases" for exported symbols. One variant that still doesn't work is:
```
$ link -nologo -dll -noentry -machine:arm64ec unmangled-func.o mangled-rva.o "-export:#func"
```
Since the unmangled symbol is defined, the "EC alias" is not created and referencing its mangled form still fails.
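The export-related observations up to this point can be summarized by a toy model. The Python sketch below is only an illustration under stated assumptions: it is neither link.exe's actual algorithm nor the code from the WIP tree, the helper names (`mangle`, `demangle`, `link`) are made up, only the simple `#` prefix is handled, and the machine type of the referencing object is ignored. It also does not reproduce the LNK1413 error seen when both the mangled and the unmangled symbol are defined.

```python
def demangle(name):
    # Assumption: only the simple '#' prefix is handled.
    return name[1:] if name.startswith("#") else name


def mangle(name):
    return name if name.startswith("#") else "#" + name


def link(defined, referenced=(), exports=()):
    """Toy link step: return (export_table, unresolved).

    export_table maps the demangled export name to the symbol it points to.
    """
    symbols = {name: name for name in defined}  # visible name -> definition
    export_table = {}
    unresolved = []
    for arg in exports:
        plain, ec = demangle(arg), mangle(arg)
        if arg in symbols:
            target = symbols[arg]  # the exact name is looked up first
        elif arg == plain and ec in symbols:
            target = symbols[ec]  # unmangled export, only the mangled symbol exists
        else:
            unresolved.append(arg)
            continue
        export_table[plain] = target
        # Hypothetical "EC alias": the spelling that has no definition of
        # its own now resolves to the exported definition.
        symbols.setdefault(plain, target)
        symbols.setdefault(ec, target)
    unresolved += [n for n in referenced
                   if n not in symbols and n not in unresolved]
    return export_table, unresolved


# mangled-func.o + unmangled-rva.o, without and with an export
print(link({"#func"}, referenced=["func"]))                     # ({}, ['func'])
print(link({"#func"}, referenced=["func"], exports=["#func"]))  # ({'func': '#func'}, [])
# unmangled-func.o + mangled-rva.o with "-export:#func"
print(link({"func"}, referenced=["#func"], exports=["#func"]))  # ({}, ['#func'])
```

In this model an alias is created only for an export that resolved, which is why the last combination leaves `#func` unresolved.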
Another similar corner case: if an unmangled symbol is referenced from x64 code, it may reference the mangled symbol even without an explicit alias (e.g., no `-export` directive, no explicit weak alias). For example, the following command links fine with MSVC:
```
$ link -nologo -dll -noentry -machine:arm64ec mangled-func.o unmangled-rva-x64.o
```
This PR touches only on export names, not "EC aliases" or similar mechanisms; I mentioned them for better context. The changed part of the code doesn't require additional modifications in my prototype, which otherwise matches the behavior in all the experiments mentioned here (except it doesn't issue an error when both mangled and unmangled symbols are defined).
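For the export-name part alone, the transformation seen in these experiments amounts to dropping the `#` prefix from the name written to the export table. A minimal Python sketch, assuming only the C-style `#` prefix used in the tests above (C++ decorated names are not handled, and `demangle_ec_export_name` is a made-up name, not an LLVM API):

```python
def demangle_ec_export_name(name: str) -> str:
    """Return the name to place in the export table for an export argument.

    Assumption: only the simple '#' prefix is recognized; any other name,
    including C++ decorated names, is returned unchanged.
    """
    if name.startswith("#"):
        return name[1:]
    return name


print(demangle_ec_export_name("#func"))  # func
print(demangle_ec_export_name("func"))   # func
```

Only the export name changes here; the symbol the export points to is still looked up by its exact name, as the experiments above indicate.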
https://github.com/llvm/llvm-project/pull/87068