[llvm-dev] The builtins library of compiler-rt is a performance HOG^WKILLER
Stefan Kanthak via llvm-dev
llvm-dev at lists.llvm.org
Mon Dec 3 05:40:37 PST 2018
Hi @ll,
LLVM-7.0.0-win32.exe contains and installs
lib\clang\7.0.0\lib\windows\clang_rt.builtins-i386.lib
The implementation of (at least) the multiplication and division
routines __[u]{div,mod,divmod,mul}[sdt]i[34] shipped with this
library SUCKS: they are SLOWER by factors than even Microsoft's
NOTORIOUSLY POOR implementation of 64-bit division shipped with
MSVC and Windows!
The reasons: 1. subroutine matryoshka (routines calling routines
calling routines), 2. "C" implementation!
JFTR: the target processor "i386" (introduced October 1985) is
a 32-bit processor, it has instructions to divide 64-bit
integers by 32-bit integers, and to multiply two 32-bit
integers giving a 64-bit product!
I expect that a library written 20+ years later takes
advantage of these instructions!
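
To illustrate the two operations named above, here is a minimal C sketch
(function names are hypothetical, not part of compiler-rt). The widening
multiply is the form compilers for i386 can typically lower to a single MUL.
The division is split into two steps whose quotients each fit in 32 bits,
which is exactly the precondition of the 64-by-32-bit DIV instruction; a C
compiler may still emit a library call for the second step, since it cannot
always prove that precondition, so hand-written assembler is assumed for the
real thing.

```c
#include <stdint.h>

/* 32 x 32 -> 64 bit product: what a single MUL delivers in EDX:EAX. */
static uint64_t mul_32x32_64(uint32_t a, uint32_t b) {
    return (uint64_t)a * b;
}

/* 64-bit dividend by 32-bit divisor (d must be non-zero), done as two
 * steps that each match the shape of DIV: the high part of every
 * partial dividend is smaller than d, so every quotient fits 32 bits. */
static uint64_t div_64_by_32(uint64_t n, uint32_t d, uint32_t *rem) {
    uint32_t hi = (uint32_t)(n >> 32);
    uint32_t lo = (uint32_t)n;

    uint32_t q_hi = hi / d;              /* step 1: 0:hi divided by d */
    uint32_t r_hi = hi % d;              /* r_hi < d                  */

    uint64_t partial = ((uint64_t)r_hi << 32) | lo;
    uint32_t q_lo = (uint32_t)(partial / d);   /* step 2: r_hi:lo / d */
    *rem = (uint32_t)(partial % d);

    return ((uint64_t)q_hi << 32) | q_lo;
}
```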
__divsi3 (18 instructions) performs a DIV after 2 calls of abs(),
    plus a final negation, instead of just a single IDIV
__modsi3 (14 instructions) calls __divsi3 (18 instructions)
__divmodsi4 (17 instructions) calls __divsi3 (18 instructions)
__udivsi3 (52 instructions) does NOT use DIV, but performs BITWISE
    division using shifts and additions!
__umodsi3 (14 instructions) calls __udivsi3 (52 instructions)
__udivmodsi4 (17 instructions) calls __udivsi3 (52 instructions)
__muldi3 (41 instructions) performs a "long" multiplication on
    16-bit "digits"
JFTR: I haven't checked whether clang actually calls the
SUPERFLUOUS 32-bit routines listed above.
IT HAD BETTER NOT, EVER!
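
For readers who have not looked at the disassembly, the three techniques
criticised in the list above can be sketched in C roughly as follows. These
are illustrations written for this message with hypothetical names, NOT the
compiler-rt sources; each one computes what a plain `/` or `*` on the native
types would give on a processor with MUL, DIV and IDIV.

```c
#include <stdint.h>

/* Signed division via two absolute values, an unsigned divide and a
 * final negation, where a single IDIV would do. The conversion of the
 * negated quotient back to int32_t assumes two's complement wrapping. */
static int32_t sdiv32_via_abs(int32_t a, int32_t b) {
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t q = ua / ub;
    return (a < 0) != (b < 0) ? (int32_t)(0u - q) : (int32_t)q;
}

/* Bitwise (shift-and-subtract) unsigned division, one quotient bit
 * per loop iteration, where a single DIV would do. d must be non-zero. */
static uint32_t udiv32_bitwise(uint32_t n, uint32_t d) {
    uint32_t q = 0;
    uint64_t r = 0;                       /* wide enough for r << 1 */
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down next bit    */
        if (r >= d) {
            r -= d;
            q |= 1u << i;
        }
    }
    return q;
}

/* "Long" multiplication on 16-bit digits: four partial products
 * combined with shifts, where a single MUL would do. */
static uint64_t mul32_16bit_digits(uint32_t a, uint32_t b) {
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;
    uint64_t lo  = (uint64_t)al * bl;
    uint64_t mid = (uint64_t)ah * bl + (uint64_t)al * bh;
    uint64_t hi  = (uint64_t)ah * bh;
    return lo + (mid << 16) + (hi << 32);
}
```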
__divdi3 (37 instructions) calls __udivmoddi4 (254 instructions)
__moddi3 (51 instructions) calls __udivmoddi4 (254 instructions)
__divmoddi4 (36 instructions) calls __divdi3 (37 instructions) which
    calls __udivmoddi4 (254 instructions)
__udivdi3 (8 instructions) calls __udivmoddi4 (254 instructions)
__umoddi3 (33 instructions) calls __udivmoddi4 (254 instructions)
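
The nesting in this second list can be pictured with a small C sketch. All
names are hypothetical and the worker is a stand-in that simply uses the
native operators; the point is only the call chain: every signed entry point
strips the signs, calls one shared unsigned worker, and patches the sign
back, and the combined routine goes through the signed divide on top of that.

```c
#include <stdint.h>

/* Stand-in for the shared unsigned worker (d must be non-zero). */
static uint64_t udivmod64(uint64_t n, uint64_t d, uint64_t *rem) {
    if (rem)
        *rem = n % d;
    return n / d;
}

/* Magnitude of a signed value, computed without signed overflow. */
static uint64_t magnitude64(int64_t v) {
    return v < 0 ? 0u - (uint64_t)v : (uint64_t)v;
}

/* Signed divide: one level above the worker. Converting the negated
 * quotient back to int64_t assumes two's complement wrapping. */
static int64_t div64(int64_t a, int64_t b) {
    uint64_t q = udivmod64(magnitude64(a), magnitude64(b), 0);
    return (a < 0) != (b < 0) ? (int64_t)(0u - q) : (int64_t)q;
}

/* Signed remainder: also one level above the worker; the remainder
 * takes the sign of the dividend. */
static int64_t mod64(int64_t a, int64_t b) {
    uint64_t r;
    udivmod64(magnitude64(a), magnitude64(b), &r);
    return a < 0 ? (int64_t)(0u - r) : (int64_t)r;
}

/* Combined divide and remainder: two levels above the worker, with
 * the remainder recovered by a multiplication and a subtraction. */
static int64_t divmod64(int64_t a, int64_t b, int64_t *rem) {
    int64_t q = div64(a, b);
    *rem = (int64_t)((uint64_t)a - (uint64_t)q * (uint64_t)b);
    return q;
}
```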
JFTR: the subdirectory compiler-rt/lib/builtins/i386/ contains FAR
better (although suboptimal) __divdi3, __moddi3, __udivdi3 and
__umoddi3 routines written in assembler, which SHOULD be
shipped with clang_rt.builtins-i386.lib instead of the above
listed POOR and NOT optimised implementations!
NOT AMUSED
Stefan Kanthak
PS: <https://lists.llvm.org/pipermail/llvm-dev/2018-November/128094.html>
has patches for the assembler routines!
PPS: please remove the blatant lie
| The builtins library provides optimized implementations of
| this and other low-level routines, either in target-independent
| C form, or as a heavily-optimized assembly.
seen on <https://compiler-rt.llvm.org/>
These routines are NOT optimized, and for sure NOT heavily-
optimized!