<div dir="ltr"><div dir="ltr">I just compiled the two attached files in 32-bit mode and ran the resulting binary.<div><br></div><div>It printed <span style="font-variant-ligatures:no-common-ligatures;color:rgb(0,0,0);font-family:Menlo;font-size:11px">efcdab8967452301</span>, which matches the expected result given in your original post.</div><div><br></div><div>I verified via objdump that the my_bswap function contains the following assembly, which I believe matches the assembly you linked to on godbolt.</div><div><br></div><pre style="font-family:Menlo;font-size:11px">
_my_bswap:
    1f70:    55          pushl   %ebp
    1f71:    89 e5       movl    %esp, %ebp
    1f73:    8b 55 08    movl    8(%ebp), %edx
    1f76:    8b 45 0c    movl    12(%ebp), %eax
    1f79:    0f c8       bswapl  %eax
    1f7b:    0f ca       bswapl  %edx
    1f7d:    5d          popl    %ebp
    1f7e:    c3          retl
</pre><div><br></div><div><br clear="all"><div><div dir="ltr" class="gmail_signature">~Craig</div></div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Sun, Nov 25, 2018 at 11:39 AM Stefan Kanthak <<a href="mailto:stefan.kanthak@nexgo.de">stefan.kanthak@nexgo.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">"Craig Topper" <<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>> wrote:<br>

<br>

> bswapdi2 for i386 is correct<br>

<br>

OUCH!<br>

<br>

> Bits 31:0 of the source are loaded into edx. Bits 63:32 are loaded into<br>

> eax. Those are each bswapped.<br>

<br>

This exchanges the high byte of each 32-bit PART with its low byte, but<br>

NOT the high byte of the whole 64-bit operand with its low byte!<br>

<br>

Please get a clue!<br>

<br>

> The ABI for the return is edx contains bits [63:32] and eax contains<br>

> [31:0]. This is the opposite of how the registers were loaded.<br>

<br>

My post is NOT about swapping EDX with EAX, but the bytes WITHIN both.<br>

<br>

With the 64-bit argument loaded into EDX:EAX, the instruction sequence<br>

<br>

    bswap  edx<br>

    bswap  eax<br>

    xchg   eax, edx<br>

<br>

is NOT equivalent to<br>

<br>

    bswap    rdi<br>

<br>

with the 64-bit argument loaded into RDI.<br>

<br>

Just run the following code on x86-64:<br>

<br>

    mov    rdi, 0123456789abcdefh    ; pass (fake) argument in RDI<br>

; split argument into high and low part<br>

    mov    rdx, rdi<br>

    shr    rdx, 32                   ; high part in EDX<br>

    mov    eax, edi                  ; low part in EAX<br>

; perform __bswapdi2() as in 32-bit mode<br>

    xchg   eax, edx                  ; swap parts, argument now loaded<br>

                                     ;  like in 32-bit mode<br>

    bswap  edx<br>

    bswap  eax                       ; result like that in 32-bit mode<br>

; load result into 64-bit register<br>

    shl    rdx, 32<br>

    or     rax, rdx<br>

; perform __bswapdi2() in native 64-bit mode<br>

    bswap  rdi<br>

; compare results<br>

    xor    rax, rdi<br>

<br>

not amused<br>

Stefan Kanthak<br>

<br>

> On Sun, Nov 25, 2018 at 10:36 AM Craig Topper <<a href="mailto:craig.topper@gmail.com" target="_blank">craig.topper@gmail.com</a>><br>

> wrote:<br>

> <br>

>> bswapsi2 on x86-64 isn't using the bswap instruction because "unsigned<br>

>> long" is 64 bits on x86-64 Linux. But it's 32 bits on x86-64 MSVC.<br>

>><br>

>> Not sure about the bswapdi2 i386 case.<br>

>><br>

>><br>

>> ~Craig<br>

>><br>

>><br>

>> On Sun, Nov 25, 2018 at 8:03 AM Stefan Kanthak via llvm-dev <<br>

>> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>

>><br>

>>> Hi @ll,<br>

>>><br>

>>> targeting i386, LLVM/clang generates wrong code for the following<br>

>>> functions:<br>

>>><br>

>>> unsigned long __bswapsi2 (unsigned long ul)<br>

>>> {<br>

>>>     return (((ul) & 0xff000000ul) >> 3 * 8)<br>

>>>          | (((ul) & 0x00ff0000ul) >>     8)<br>

>>>          | (((ul) & 0x0000ff00ul) <<     8)<br>

>>>          | (((ul) & 0x000000fful) << 3 * 8);<br>

>>> }<br>

>>><br>

>>> unsigned long long __bswapdi2(unsigned long long ull)<br>

>>> {<br>

>>>     return ((ull & 0xff00000000000000ull) >> 7 * 8)<br>

>>>          | ((ull & 0x00ff000000000000ull) >> 5 * 8)<br>

>>>          | ((ull & 0x0000ff0000000000ull) >> 3 * 8)<br>

>>>          | ((ull & 0x000000ff00000000ull) >>     8)<br>

>>>          | ((ull & 0x00000000ff000000ull) <<     8)<br>

>>>          | ((ull & 0x0000000000ff0000ull) << 3 * 8)<br>

>>>          | ((ull & 0x000000000000ff00ull) << 5 * 8)<br>

>>>          | ((ull & 0x00000000000000ffull) << 7 * 8);<br>

>>> }<br>

>>><br>

>>> You can find these sources in "compiler-rt/lib/builtins/bswapsi2.c"<br>

>>> and "compiler-rt/lib/builtins/bswapdi2.c", for example!<br>

>>><br>

>>><br>

>>> Compiled with "-O3 -target i386" this yields the following code<br>

>>> (see <<a href="https://godbolt.org/z/F4UIl4" rel="noreferrer" target="_blank">https://godbolt.org/z/F4UIl4</a>>):<br>

>>><br>

>>> __bswapsi2: # @__bswapsi2<br>

>>>     push  ebp<br>

>>>     mov   ebp, esp<br>

>>>     mov   eax, dword ptr [ebp + 8]<br>

>>>     bswap eax<br>

>>>     pop   ebp<br>

>>>     ret<br>

>>><br>

>>> __bswapdi2: # @__bswapdi2<br>

>>>     push  ebp<br>

>>>     mov   ebp, esp<br>

>>>     mov   edx, dword ptr [ebp + 8]<br>

>>>     mov   eax, dword ptr [ebp + 12]<br>

>>>     bswap eax<br>

>>>     bswap edx<br>

>>>     pop   ebp<br>

>>>     ret<br>

>>><br>

>>> __bswapsi2() is correct, but __bswapdi2() NOT: swapping just the<br>

>>> halves of a "long long" is OBVIOUSLY WRONG!<br>

>>><br>

>>> From the C source, the expected result for the input value<br>

>>> 0x0123456789ABCDEF is 0xEFCDAB8967452301, but the compiled code<br>

>>> produces 0x67452301EFCDAB89.<br>

>>><br>

>>><br>

>>> And compiled for x86-64 this yields the following code (see<br>

>>> <<a href="https://godbolt.org/z/uM9nvN" rel="noreferrer" target="_blank">https://godbolt.org/z/uM9nvN</a>>):<br>

>>><br>

>>> __bswapsi2: # @__bswapsi2<br>

>>>     mov   eax, edi<br>

>>>     shr   eax, 24<br>

>>>     mov   rcx, rdi<br>

>>>     shr   rcx, 8<br>

>>>     and   ecx, 65280<br>

>>>     or    rax, rcx<br>

>>>     mov   rcx, rdi<br>

>>>     shl   rcx, 8<br>

>>>     and   ecx, 16711680<br>

>>>     or    rax, rcx<br>

>>>     and   rdi, 255<br>

>>>     shl   rdi, 24<br>

>>>     or    rax, rdi<br>

>>>     ret<br>

>>><br>

>>> __bswapdi2: # @__bswapdi2<br>

>>>     bswap rdi<br>

>>>     mov   rax, rdi<br>

>>>     ret<br>

>>><br>

>>> Both are correct, but __bswapsi2() should of course use BSWAP too!<br>

>>><br>

>>><br>

>>> Stefan Kanthak<br>

>>><br>

>>> PS: for comparison with another compiler, take a look at<br>

>>>     <<a href="https://skanthak.homepage.t-online.de/msvc.html#example5" rel="noreferrer" target="_blank">https://skanthak.homepage.t-online.de/msvc.html#example5</a>><br>


>>><br>

>><br>

</blockquote></div>