[llvm-bugs] [Bug 52599] New: lld-link /delayload - first call of a function with bad floating point parameter on x64

via llvm-bugs llvm-bugs at lists.llvm.org
Wed Nov 24 08:49:48 PST 2021


https://bugs.llvm.org/show_bug.cgi?id=52599

            Bug ID: 52599
           Summary: lld-link /delayload - first call of a function with
                    bad floating point parameter on x64
           Product: lld
           Version: 13.0
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: normal
          Priority: P
         Component: COFF
          Assignee: unassignedbugs at nondot.org
          Reporter: thomas.ferrand at hexagon.com
                CC: llvm-bugs at lists.llvm.org

Created attachment 25475
  --> https://bugs.llvm.org/attachment.cgi?id=25475&action=edit
Source files to reproduce the bug

When linking a program against a DLL with the /delayload switch, the first call
to a function defined in the DLL receives a bad value for (at least one of) its
floating point parameters.

Attached are 2 source files, my_lib.cpp and my_exe.cpp, to reproduce the bug.
They should be built as follows:
 - "C:\Program Files\LLVM\bin\clang-cl.exe" my_lib.cpp /link /DLL
/OUT:my_dll.dll
 - "C:\Program Files\LLVM\bin\clang-cl.exe" /c my_exe.cpp /OUT:my_exe.obj
 - "C:\Program Files\LLVM\bin\lld-link.exe" my_dll.lib Delayimp.lib
/delayload:my_dll.dll my_exe.obj /OUT:my_exe.exe

When running my_exe.exe, the output will be "1 0 3" instead of the expected "1
2 3".

The last step can be replaced with
"C:\Program Files (x86)\Microsoft Visual
Studio\2019\Professional\VC\Tools\MSVC\14.28.29910\bin\Hostx64\x64\link.exe"
my_dll.lib Delayimp.lib /delayload:my_dll.dll my_exe.obj /OUT:my_exe.exe
to use link.exe with the same options or with
"C:\Program Files\LLVM\bin\lld-link.exe" my_dll.lib my_exe.obj /OUT:my_exe.exe
to use lld without /delayload. In both of those cases the resulting executable
will give the expected "1 2 3".

I believe the bug occurs because __delayLoadHelper2 (the function defined in
delayimp.lib that actually loads the DLL and locates the function we want to
call during the first usage) writes into the top of the stack space of its
caller (I don't know why, is it a weird Windows calling convention?) but the
thunk generated by lld doesn't reserve that space.

Specifically, the thunk generated by lld (for x64) looks like this:
push        rcx  
push        rdx  
push        r8  
push        r9  
sub         rsp,48h  
movdqa      xmmword ptr [rsp],xmm0  
movdqa      xmmword ptr [rsp+10h],xmm1  
movdqa      xmmword ptr [rsp+20h],xmm2  
movdqa      xmmword ptr [rsp+30h],xmm3  
mov         rdx,rax  
lea         rcx,[__xt_z+28h (01401C9E88h)]  
call        __delayLoadHelper2 (01401A3464h)  
movdqa      xmm0,xmmword ptr [rsp]  
movdqa      xmm1,xmmword ptr [rsp+10h]  
movdqa      xmm2,xmmword ptr [rsp+20h]  
movdqa      xmm3,xmmword ptr [rsp+30h]  
add         rsp,48h  
pop         r9  
pop         r8  
pop         rdx  
pop         rcx  
jmp         rax

(it allocates space on the stack, uses it to save the registers prior to
calling __delayLoadHelper2, and restores them afterwards)

Whereas the thunk generated by link.exe looks like this:
mov         qword ptr [rsp+8],rcx  
mov         qword ptr [rsp+10h],rdx  
mov         qword ptr [rsp+18h],r8  
mov         qword ptr [rsp+20h],r9  
sub         rsp,68h  
movdqa      xmmword ptr [rsp+20h],xmm0  
movdqa      xmmword ptr [rsp+30h],xmm1  
movdqa      xmmword ptr [rsp+40h],xmm2  
movdqa      xmmword ptr [rsp+50h],xmm3  
mov         rdx,rax  
lea         rcx,[__DELAY_IMPORT_DESCRIPTOR_my_dll (0140435020h)]  
call        __delayLoadHelper2 (01400089C2h)  
movdqa      xmm0,xmmword ptr [rsp+20h]  
movdqa      xmm1,xmmword ptr [rsp+30h]  
movdqa      xmm2,xmmword ptr [rsp+40h]  
movdqa      xmm3,xmmword ptr [rsp+50h]  
mov         rcx,qword ptr [rsp+70h]  
mov         rdx,qword ptr [rsp+78h]  
mov         r8,qword ptr [rsp+80h]  
mov         r9,qword ptr [rsp+88h]  
add         rsp,68h  
jmp         __tailMerge_my_dll+77h (01402237B8h)  
jmp         rax

It looks very similar but, for some reason, it doesn't save the xmmX registers
at the top of the stack like lld does: it leaves 32 bytes that
__delayLoadHelper2 is free to mess with.
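
To make the difference between the two thunks concrete, here is a small sketch
that checks which XMM save slots fall inside the first 32 bytes above rsp at
the point of the call. The slot offsets are copied from the two listings above.
The 32-byte figure is the "home space" (shadow space) that the Windows x64
calling convention lets a callee use as scratch. The names HOME_SPACE,
LLD_SLOTS, LINK_SLOTS and slots_in_home_space are only illustrative.

```python
# Offsets are relative to rsp at the moment each thunk executes `call`.
HOME_SPACE = range(0x00, 0x20)  # 32 bytes the x64 callee may use as scratch

# XMM save slots taken from the two disassembly listings (16 bytes each).
LLD_SLOTS = {"xmm0": 0x00, "xmm1": 0x10, "xmm2": 0x20, "xmm3": 0x30}
LINK_SLOTS = {"xmm0": 0x20, "xmm1": 0x30, "xmm2": 0x40, "xmm3": 0x50}


def slots_in_home_space(slots, size=16):
    """Return the names of save slots that overlap the callee's home space."""
    return [
        name
        for name, offset in slots.items()
        if offset < HOME_SPACE.stop and offset + size > HOME_SPACE.start
    ]


print("lld-link:", slots_in_home_space(LLD_SLOTS))   # ['xmm0', 'xmm1']
print("link.exe:", slots_in_home_space(LINK_SLOTS))  # []
```

Under this model only the lld thunk keeps saved registers in the area the
callee is allowed to overwrite.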

Indeed (at least on my machine), the first 2 instructions of __delayLoadHelper2
are:
mov         qword ptr [rsp+10h],rbx  
mov         qword ptr [rsp+18h],rsi  

which, if I'm not mistaken, are writing into the stack space where xmm0 and
xmm1 were saved.
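
The offset arithmetic behind that claim can be sketched as follows. The `call`
instruction pushes an 8-byte return address, so an offset from the callee's
entry rsp is 8 bytes lower when expressed from the caller's rsp. The sketch
then replays the two stores on a model of the lld thunk's save area. Several
things here are assumptions and not facts from the report: that the test
function takes three doubles (1, 2, 3) in xmm0..xmm2, and the register values
used for rbx and rsi, which are made up (rsi is taken to be zero).

```python
import struct

RETURN_ADDRESS_SIZE = 8  # `call` pushes the return address before the callee runs


def caller_offset(callee_offset):
    """Translate an offset from the callee's entry rsp to the caller's rsp."""
    return callee_offset - RETURN_ADDRESS_SIZE


# Model of the lld thunk's save area: four 16-byte slots starting at [rsp].
stack = bytearray(0x40)
for i, value in enumerate((1.0, 2.0, 3.0)):  # hypothetical arguments in xmm0..xmm2
    struct.pack_into("<d", stack, i * 0x10, value)  # a double sits in the low 8 bytes

# The two stores at the start of __delayLoadHelper2 (register values are made up).
for callee_off, reg_value in ((0x10, 0xDEADBEEF), (0x18, 0)):  # rbx, then rsi
    struct.pack_into("<Q", stack, caller_offset(callee_off), reg_value)

# What the thunk reloads into the low halves of xmm0..xmm2 after the call.
restored = [struct.unpack_from("<d", stack, i * 0x10)[0] for i in range(3)]
print(restored)  # [1.0, 0.0, 3.0]
```

In this model the store of rbx lands in the unused upper half of the xmm0 slot,
while the store of rsi overwrites the lower half of the xmm1 slot, which would
be consistent with only the second value being wrong.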
