[llvm-dev] The builtins library of compiler-rt is a performance HOG^WKILLER
Stefan Kanthak via llvm-dev
llvm-dev at lists.llvm.org
Mon Dec 3 10:50:14 PST 2018
"Craig Topper" <craig.topper at gmail.com> wrote:
> None of the "si" division routines will be used by x86.
That was my expectation too.
> They exist for targets that don't support the operations natively.
> X86 supports them natively so will never use the library functions.
So they SHOULD not be built (or at least not shipped) with the
builtins library for x86.
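
A minimal sketch of the difference in question, for illustration only. The function names are hypothetical and this is not the compiler-rt source; it merely models the "abs(), unsigned divide, negate" pattern next to the plain C division that an x86 compiler lowers to a native divide instruction.

```c
#include <stdint.h>

/* Hypothetical name: signed division built from an unsigned divide,
   i.e. take |a| and |b|, divide, then restore the sign. */
int32_t div_via_abs(int32_t a, int32_t b)
{
    uint32_t sa = a < 0 ? UINT32_MAX : 0;    /* all ones if a is negative */
    uint32_t sb = b < 0 ? UINT32_MAX : 0;    /* all ones if b is negative */
    uint32_t ua = ((uint32_t)a ^ sa) - sa;   /* |a| */
    uint32_t ub = ((uint32_t)b ^ sb) - sb;   /* |b| */
    uint32_t s  = sa ^ sb;                   /* sign of the quotient */
    uint32_t q  = ua / ub;                   /* unsigned divide */
    return (int32_t)((q ^ s) - s);           /* negate if signs differ */
}

/* Hypothetical name: the same operation written directly; on x86 the
   compiler emits a signed divide for this, with no library call. */
int32_t div_native(int32_t a, int32_t b)
{
    return a / b;
}
```

Both functions agree for all inputs with a nonzero divisor except INT32_MIN / -1, which is undefined behaviour in the direct form.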
> X86 has its own assembly implementation of __muldi3 that uses 32-bit
> pieces.
I know; that's why I placed this ABOVE my "JFTR:"
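
For readers unfamiliar with the "32-bit pieces" approach, here is a rough C model of it. It is a sketch under the assumption that only the low 64 bits of the product are wanted (as for __muldi3); the function name is made up and this is not the assembly from the source tree.

```c
#include <stdint.h>

/* Hypothetical name: 64x64 -> 64-bit multiply from 32-bit halves.
   One widening 32x32 -> 64 multiply plus two 32-bit multiplies whose
   results only affect the upper half of the product. */
uint64_t mul64_from_32bit_pieces(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint64_t low   = (uint64_t)a_lo * b_lo;      /* widening multiply */
    uint32_t cross = a_lo * b_hi + a_hi * b_lo;  /* wraps modulo 2^32 */

    /* a_hi * b_hi would be shifted out of the 64-bit result entirely. */
    return low + ((uint64_t)cross << 32);
}
```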
> We should be using the assembly versions of the "di" division routines on
> i386. Except when compiler-rt is built with MSVC because MSVC can't parse
> the AT&T assembly syntax.
Again: my offer to provide these routines still stands!
I have OPTIMISED __divdi3, __moddi3, __udivdi3 and __umoddi3 in
Intel syntax, wrapped as inline files into an NMakefile, for use
with ML.EXE.
For the optimisations see the patch I sent last week.
Since Howard Hinnant is NO LONGER with LLVM: who is the CURRENT
code owner and reviewer for the builtins library, especially for
x86?
I'm asking this SIMPLE question now for the 3rd time!
I also have __udivmoddi3: adding the pointer to the remainder as an
argument and 4 more instructions will turn it into __udivmoddi4.
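
To show what the remainder pointer buys, here is a simplified C sketch. It is an assumption-laden model, not the routine offered above: the names are hypothetical, and the divisor is restricted to 32 bits so that the division reduces to two narrow steps of the kind a 64-by-32-bit divide instruction performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: divide a 64-bit value by a 32-bit value, returning
   the quotient and optionally storing the remainder through 'rem'. */
uint64_t udivmod_64by32(uint64_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0);
    uint32_t hi = (uint32_t)(n >> 32);
    uint32_t lo = (uint32_t)n;

    uint32_t q_hi = hi / d;                 /* upper quotient digit */
    uint32_t r    = hi % d;                 /* r < d */

    /* Because r < d, this quotient fits in 32 bits; it is the kind of
       64-by-32-bit step a single hardware divide can perform. */
    uint64_t t    = ((uint64_t)r << 32) | lo;
    uint32_t q_lo = (uint32_t)(t / d);

    if (rem != NULL)
        *rem = (uint32_t)(t % d);           /* the only extra work */
    return ((uint64_t)q_hi << 32) | q_lo;
}

/* The quotient-only and remainder-only entry points become thin wrappers. */
uint64_t udiv_64by32(uint64_t n, uint32_t d)
{
    return udivmod_64by32(n, d, NULL);
}

uint32_t umod_64by32(uint64_t n, uint32_t d)
{
    uint32_t r;
    (void)udivmod_64by32(n, d, &r);
    return r;
}
```

Note that a C compiler targeting i386 may still emit a library call for the 64-bit `/` and `%` here, which is exactly why hand-written assembly is attractive for these routines.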
Compiling them with MSVC is of course easy to achieve: remove the
MASM/ML statements, put the assembler source inside an __asm block,
and add a function definition with __declspec(naked).
But then someone will have to find new filenames; I'd prefer to
leave them as *.ASM, so they can be added to YOUR source tree
without clobbering existing files.
The same holds for __alldiv, __alldvrm, __allrem, __aulldiv,
__aulldvrm and __aullrem, plus __allmul, __allshl, __allshr and
__aullshr.
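
As an illustration of the MSVC conversion described above (an __asm block inside a __declspec(naked) function), here is a small sketch. The function name and the choice of a 32x32 -> 64-bit multiply are assumptions made for the example; it is not one of the routines on offer. The MSVC branch applies only to 32-bit x86 builds, and a plain C fallback is included so the file also compiles elsewhere.

```c
#include <stdint.h>

#if defined(_MSC_VER) && defined(_M_IX86) && !defined(__clang__)

/* Hypothetical example: naked cdecl function, so the compiler emits no
   prologue or epilogue and the arguments are read straight off the stack.
   The 64-bit result is returned in EDX:EAX. */
__declspec(naked) uint64_t __cdecl mul32x32_wide(uint32_t a, uint32_t b)
{
    __asm {
        mov     eax, dword ptr [esp+4]      ; a
        mul     dword ptr [esp+8]           ; edx:eax = a * b
        ret                                 ; cdecl: caller cleans the stack
    }
}

#else

/* Portable fallback with the same result. */
uint64_t mul32x32_wide(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

#endif
```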
If you name a reviewer I'll send them to llvm-commits!
regards
Stefan
> On Mon, Dec 3, 2018 at 5:51 AM Stefan Kanthak via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi @ll,
>>
>> LLVM-7.0.0-win32.exe contains and installs
>> lib\clang\7.0.0\lib\windows\clang_rt.builtins-i386.lib
>>
>> The implementation of (at least) the multiplication and division
>> routines __[u]{div,mod,divmod,mul}[sdt]i[34] shipped with this
>> library SUCKS: they are factors SLOWER than even Microsoft's
>> NOTORIOUSLY POOR implementation of 64-bit division shipped with
>> MSVC and Windows!
>>
>> The reasons: 1. subroutine matryoshka (nested calls), 2. "C" implementation!
>>
>> JFTR: the target processor "i386" (introduced October 1985) is
>> a 32-bit processor, it has instructions to divide 64-bit
>> integers by 32-bit integers, and to multiply two 32-bit
>> integers giving a 64-bit product!
>> I expect that a library written 20+ years later takes
>> advantage of these instructions!
>>
>> __divsi3 (18 instructions) performs a DIV after 2 calls of abs(),
>> plus a final negation, instead of just
>> a single IDIV
>> __modsi3 (14 instructions) calls __divsi3 (18 instructions)
>> __divmodsi4 (17 instructions) calls __divsi3 (18 instructions)
>>
>> __udivsi3 (52 instructions) does NOT use DIV, but performs BITWISE
>> division using shifts and additions!
>> __umodsi3 (14 instructions) calls __udivsi3 (52 instructions)
>> __udivmodsi4 (17 instructions) calls __udivsi3 (52 instructions)
>>
>> __muldi3 (41 instructions) performs a "long" multiplication on
>> 16-bit "digits"
>>
>> JFTR: I haven't checked whether clang actually calls these
>> SUPERFLUOUS routines listed above.
>> IT HAD BETTER NOT, EVER!
>>
>> __divdi3 (37 instructions) calls __udivmoddi4 (254 instructions)
>> __moddi3 (51 instructions) calls __udivmoddi4 (254 instructions)
>> __divmoddi4 (36 instructions) calls __divdi3 (37 instructions) which
>> calls __udivmoddi4 (254 instructions)
>> __udivdi3 (8 instructions) calls __udivmoddi4 (254 instructions)
>> __umoddi3 (33 instructions) calls __udivmoddi4 (254 instructions)
>>
>> JFTR: the subdirectory compiler-rt/lib/builtins/i386/ contains FAR
>> better (although suboptimal) __divdi3, __moddi3, __udivdi3 and
>> __umoddi3 routines written in assembler, which SHOULD be
>> shipped with clang_rt.builtins-i386.lib instead of the above
>> listed POOR and NOT optimised implementations!
>>
>> NOT AMUSED
>> Stefan Kanthak
>>
>> PS: <https://lists.llvm.org/pipermail/llvm-dev/2018-November/128094.html>
>> has patches for the assembler routines!
>>
>> PPS: please remove the blatant lie
>> | The builtins library provides optimized implementations of
>> | this and other low-level routines, either in target-independent
>> | C form, or as a heavily-optimized assembly.
>> seen on <https://compiler-rt.llvm.org/>
>> These routines are NOT optimized, and for sure NOT heavily-
>> optimized!