[cfe-dev] "Optimized implementations"?

Sun Sep 6 06:08:58 PDT 2020

<https://compiler-rt.llvm.org/index.html> boasts:

| The builtins library provides optimized implementations of this
| and other low-level routines, either in target-independent C form,
| or as a heavily-optimized assembly.

Really?

Left: inefficient code shipped in     # Right: proper code, just one or
      clang_rt.builtins-*             #        two bits faster and shorter

___paritysi2:
        mov     eax, [esp+4]          #       mov     ax, [esp+4]
        mov     ecx, eax              #
        shr     ecx, 16               #
        xor     ecx, eax              #       xor     ax, [esp+6]
        mov     eax, ecx              #
        shr     eax, 8                #
        xor     eax, ecx              #       xor     al, ah
        mov     ecx, eax              #
        shr     ecx, 4                #
        xor     ecx, eax              #
        mov     eax, 0x6996           #
        and     cl, 15                #
        shr     eax, cl               #       setnp   al
        and     eax, 1                #       movzx   eax, al
        ret                           #       ret

___paritydi2:
        mov     eax, [esp+8]          #       mov     ax, [esp+4]
        xor     eax, [esp+4]          #       xor     ax, [esp+6]
        push    eax                   #       xor     ax, [esp+8]
        call    ___paritysi2          #       xor     ax, [esp+10]
        add     esp, 4                #       xor     al, ah
                                      #       setnp   al
                                      #       movzx   eax, al
        ret                           #       ret

For both functions together, the proper code needs 14 instead of 21
instructions and 48 instead of 57 bytes, and it more than halves the
instructions executed per call: 6 instead of 15 for ___paritysi2,
8 instead of 21 for ___paritydi2 (which calls ___paritysi2)!
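
For readers who prefer C to assembly, here is a rough sketch of what the two
columns compute. It is not the compiler-rt source; the function names
(parity32_lookup, parity32_bytefold, parity64_lookup, parity_selfcheck) are
made up for illustration, and the bit-clearing loop merely stands in for the
x86 parity flag that SETNP reads in the right column.

```c
#include <assert.h>
#include <stdint.h>

/* Left column: fold 32 bits down to 4, then look the result up in the
   16-entry bit table 0x6996. Returns 1 for an odd number of set bits. */
static int parity32_lookup(uint32_t x)
{
    x ^= x >> 16;                      /* fold upper half into low 16 bits */
    x ^= x >> 8;                       /* fold into low 8 bits */
    x ^= x >> 4;                       /* fold into low 4 bits */
    return (0x6996 >> (x & 15)) & 1;   /* bit i of 0x6996 = parity of i */
}

/* Right column: fold 32 bits down to one byte, then take the parity of
   that byte. On x86 the last step is the parity flag plus SETNP; the
   portable loop below is only a stand-in for it. */
static int parity32_bytefold(uint32_t x)
{
    uint16_t w = (uint16_t)(x ^ (x >> 16));   /* xor ax, [esp+6] */
    uint8_t  b = (uint8_t)(w ^ (w >> 8));     /* xor al, ah */
    int odd = 0;
    while (b) {                               /* toggle once per set bit */
        odd ^= 1;
        b &= (uint8_t)(b - 1);                /* clear lowest set bit */
    }
    return odd;
}

/* 64-bit variant: xor the two halves, then reuse the 32-bit routine,
   as the left-hand ___paritydi2 does. */
static int parity64_lookup(uint64_t x)
{
    return parity32_lookup((uint32_t)(x >> 32) ^ (uint32_t)x);
}

/* Hypothetical self-check: both 32-bit variants must agree on a stream
   of pseudo-random inputs (simple linear congruential generator). */
static void parity_selfcheck(void)
{
    uint32_t x = 1;
    for (int i = 0; i < 1000; i++) {
        assert(parity32_lookup(x) == parity32_bytefold(x));
        x = x * 1664525u + 1013904223u;
    }
}
```

Both variants return the same results; the difference is only in how many
instructions it takes to get there on a target that has a parity flag.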

AGAIN:
Remove every occurrence of the word "optimized" on the above web page.

'nuff said
Stefan