[llvm-dev] The builtins library of compiler-rt is a performance HOG^WKILLER

Mon Dec 3 09:53:12 PST 2018

None of the "si" division routines will be used on x86. They exist for
targets that don't support 32-bit division natively. x86 does, so the
compiler never calls the library functions there.
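
As a rough illustration (the helper names below are made up for this
example and are not part of compiler-rt), 32-bit division and remainder
written in plain C like this would be expected to compile to inline
IDIV/DIV instructions on x86 rather than to calls to __divsi3 or
__udivsi3:

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. On x86 a compiler is
 * expected to emit a single IDIV or DIV for each of these instead of
 * calling the "si" library routines. */
int32_t sdiv32(int32_t a, int32_t b) { return a / b; }   /* signed quotient    */
int32_t smod32(int32_t a, int32_t b) { return a % b; }   /* signed remainder   */
uint32_t udiv32(uint32_t a, uint32_t b) { return a / b; } /* unsigned quotient */
uint32_t umod32(uint32_t a, uint32_t b) { return a % b; } /* unsigned remainder */
```

Compiling a file like this with "clang -m32 -O2 -S" and reading the
generated assembly is one way to check which instructions or calls are
actually emitted.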

X86 has its own assembly implementation of __muldi3 that uses 32-bit pieces.
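
To make "uses 32-bit pieces" concrete, here is a minimal C sketch of a
64-bit multiply built from 32-bit halves. It is not the compiler-rt
source; the function name is an assumption made up for this example.

```c
#include <stdint.h>

/* Hypothetical sketch: 64x64 -> 64-bit multiply from 32-bit pieces.
 * Only one widening 32x32 -> 64 multiply is needed; the two cross
 * products contribute only to the upper 32 bits of the result. */
uint64_t mul64_from_32bit_pieces(uint64_t a, uint64_t b) {
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Full 64-bit product of the low halves. */
    uint64_t low = (uint64_t)a_lo * b_lo;

    /* Cross terms: only their low 32 bits survive modulo 2^64. */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;

    /* a_hi * b_hi is shifted out entirely and can be ignored. */
    return low + ((uint64_t)cross << 32);
}
```

The result should match the native "a * b" on uint64_t for all inputs,
since both are computed modulo 2^64.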

We should be using the assembly versions of the "di" division routines on
i386, except when compiler-rt is built with MSVC, because MSVC can't parse
the AT&T assembly syntax.
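
For context on why assembly can help with the "di" routines, here is a
hedged C sketch of the easy case, a 64-bit dividend and a divisor that
fits in 32 bits, done as two chained steps. The function name is
hypothetical and this is not the code in compiler-rt/lib/builtins/i386/.
The second step divides a 64-bit value by a 32-bit value with a quotient
that fits in 32 bits, which is the operation the i386 DIV instruction
provides; plain C has no direct way to ask for exactly that.

```c
#include <stdint.h>

/* Hypothetical sketch: unsigned 64-bit / 32-bit division in two steps.
 * The divisor d must be nonzero. */
uint64_t udiv64_by_32(uint64_t n, uint32_t d, uint32_t *rem) {
    uint32_t hi = (uint32_t)(n >> 32);
    uint32_t lo = (uint32_t)n;

    /* Step 1: divide the high half; plain 32-bit division. */
    uint32_t q_hi = hi / d;
    uint32_t r    = hi % d;

    /* Step 2: divide (r:lo) by d. Because r < d, the quotient fits in
     * 32 bits, which is the precondition of the i386 DIV instruction. */
    uint64_t t    = ((uint64_t)r << 32) | lo;
    uint32_t q_lo = (uint32_t)(t / d);
    *rem          = (uint32_t)(t % d);

    return ((uint64_t)q_hi << 32) | q_lo;
}
```

Divisors wider than 32 bits need extra handling that this sketch does
not cover.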

~Craig

On Mon, Dec 3, 2018 at 5:51 AM Stefan Kanthak via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi @ll,
>
> LLVM-7.0.0-win32.exe contains and installs
> lib\clang\7.0.0\lib\windows\clang_rt.builtins-i386.lib
>
> The implementations of (at least) the multiplication and division
> routines __[u]{div,mod,divmod,mul}[sdt]i[34] shipped with this
> library SUCK: they are factors SLOWER than even Microsoft's
> NOTORIOUSLY POOR implementation of 64-bit division shipped with
> MSVC and Windows!
>
> The reasons: 1. subroutine matryoshka, 2. "C" implementation!
>
> JFTR: the target processor "i386" (introduced October 1985) is
>       a 32-bit processor, it has instructions to divide 64-bit
>       integers by 32-bit integers, and to multiply two 32-bit
>       integers giving a 64-bit product!
>       I expect that a library written 20+ years later takes
>       advantage of these instructions!
>
> __divsi3 (18 instructions) performs a DIV after 2 calls of abs(),
>                            plus a final negation, instead of just
>                            a single IDIV
> __modsi3 (14 instructions) calls __divsi3 (18 instructions)
> __divmodsi4 (17 instructions) calls __divsi3 (18 instructions)
>
> __udivsi3 (52 instructions) does NOT use DIV, but performs BITWISE
>                             division using shifts and additions!
> __umodsi3 (14 instructions) calls __udivsi3 (52 instructions)
> __udivmodsi4 (17 instructions) calls __udivsi3 (52 instructions)
>
> __muldi3 (41 instructions) performs a "long" multiplication on
>                            16-bit "digits"
>
> JFTR: I haven't checked whether clang actually calls the
>       SUPERFLUOUS routines listed above.
>       IT HAD BETTER NOT, EVER!
>
> __divdi3 (37 instructions) calls __udivmoddi4 (254 instructions)
> __moddi3 (51 instructions) calls __udivmoddi4 (254 instructions)
> __divmoddi4 (36 instructions) calls __divdi3 (37 instructions) which
>                               calls __udivmoddi4 (254 instructions)
> __udivdi3 (8 instructions) calls __udivmoddi4 (254 instructions)
> __umoddi3 (33 instructions) calls __udivmoddi4 (254 instructions)
>
> JFTR: the subdirectory compiler-rt/lib/builtins/i386/ contains FAR
>       better (although suboptimal) __divdi3, __moddi3, __udivdi3 and
>       __umoddi3 routines written in assembler, which SHOULD be
>       shipped with clang_rt.builtins-i386.lib instead of the above
>       listed POOR and NOT optimised implementations!
>
> NOT AMUSED
> Stefan Kanthak
>
> PS: <https://lists.llvm.org/pipermail/llvm-dev/2018-November/128094.html>
>     has patches for the assembler routines!
>
> PPS: please remove the blatant lie
>      | The builtins library provides optimized implementations of
>      | this and other low-level routines, either in target-independent
>      | C form, or as a heavily-optimized assembly.
>      seen on <https://compiler-rt.llvm.org/>
>      These routines are NOT optimized, and for sure NOT heavily-
>      optimized!