[PATCH] D91203: [WebAssembly] Fixed wasm64 DWARF using 64-bit code pointer sizes

Wed Nov 11 13:44:19 PST 2020

dblaikie added a comment.

In D91203#2387690 <https://reviews.llvm.org/D91203#2387690>, @sbc100 wrote:

> In D91203#2387683 <https://reviews.llvm.org/D91203#2387683>, @dblaikie wrote:
>
>> In D91203#2386990 <https://reviews.llvm.org/D91203#2386990>, @aardappel wrote:
>>
>>> @dblaikie Confusingly, these are all different: function pointers at runtime (in a Wasm VM) are 32-bit indices. LLVM function pointers are 64-bit in wasm64 for consistency, but get truncated when lowered in Isel. Then here we have a 3rd kind of code pointer, just for DWARF, since Wasm doesn't have the concept of a pointer to an instruction inside a function (which DWARF needs for DW_AT_low_pc, and we need to relocate).
>>>
>>> And yes, if this is also used for globals, and there is an architecture for which code pointers would be smaller than global pointers, then this patch would break them.
>>>
>>> An alternative solution for Wasm would be to just go along with the expected 64-bit size for DW_FORM_addr, but that would mean wasted space, and a new reloc type for us. Also means DWARF data between wasm32 and wasm64 is different.
>>
>> I'm not sure we're talking about the same thing; there may be some overlap/confusion.
>>
>> What's the actual sizeof(int*) and sizeof(int(*)()) if I printed those out in some C++ code compiled to wasm? The size has to be known by the frontend/at the language level, so it can't change due to lowering.
>> If I have an array of int*, that array needs elements of a size known by the frontend/language-level, similarly if I have an array of int(*)() - and in both cases I'd need to use some relocation to fill in my array initializer that initializes that array?
>
> At the C/C++/LLVM layers all pointers are the same size (i.e. 64-bit on wasm64), including function pointers. The difference with wasm is that those function pointers are not in the same address space as the bytecode offsets used in the DWARF information.

Ah, so the language level doesn't support pointing at sub-function granularity. Yeah... it might be pretty esoteric/weird for DWARF to have to understand that its code pointers are vastly different from language-level code pointers. I doubt DWARF consumers are written with support for that concept. I expect they read a code pointer from program memory (e.g. a void (*)()) and assume it means the same thing as a code pointer read from the DWARF data itself (e.g. the low_pc of a function).
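
For reference, the sizeof question quoted above can be checked with a minimal C++ sketch like the one below. The names PointerSizes, pointer_sizes, fn_table, table_element_size and report_pointer_sizes are hypothetical and exist only for this illustration. Going by the reply quoted above, both sizes should come out as 8 bytes on wasm64; that is an expectation taken from the thread, not a measured result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper type for this sketch: pointer sizes as fixed by the
// frontend at the language level.
struct PointerSizes {
    std::size_t data_ptr;  // sizeof(int*)
    std::size_t func_ptr;  // sizeof(int (*)())
};

constexpr PointerSizes pointer_sizes() {
    return PointerSizes{sizeof(int*), sizeof(int (*)())};
}

int answer() { return 42; }

// An array of language-level function pointers. The element size is known to
// the frontend; on a relocatable target the initializer is presumably filled
// in through relocations, as the thread suggests.
int (*const fn_table[])() = {answer, answer};

constexpr std::size_t table_element_size() { return sizeof(fn_table[0]); }

// Prints the sizes for whatever target this is compiled for.
void report_pointer_sizes() {
    constexpr PointerSizes s = pointer_sizes();
    // An array element has exactly the size of the function pointer type.
    assert(table_element_size() == s.func_ptr);
    std::printf("sizeof(int*)       = %zu\n", s.data_ptr);
    std::printf("sizeof(int (*)())  = %zu\n", s.func_ptr);
    std::printf("sizeof(fn_table[0]) = %zu\n", table_element_size());
}
```

Calling report_pointer_sizes() from main prints the three sizes for the target in use. None of these values say anything about the width of the DWARF code addresses (DW_AT_low_pc and friends), which is the separate quantity this patch changes.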

Before this patch/direction is pursued, I think it'd be reasonable to have some data on what existing DWARF consumers do here: whether any are being adapted to handle wasm in some way and, if so, what the plan is for this sort of divergence between code pointers in the program/process and those in the DWARF.

https://reviews.llvm.org/D91203