[llvm-dev] Where's the optimiser gone? (part 4): 64-bit division routines for IA32

Thu Nov 29 06:20:08 PST 2018

Hi @ll,

compiler-rt implements the 64-bit division routines __divdi3(),
__moddi3(), __udivdi3() and __umoddi3() for IA32 alias x86 in
assembler (see the directory compiler-rt/lib/builtins/i386/).

While Stephen Canon did a decent job back in December 2008, he
left QUITE some room for improvement^Woptimisation: see the
attached patch.

All 4 routines have two almost identical code branches of 20+
and 22+ instructions, with just TWO additional instructions in
the second branch:
- divdi3.S  lines 72-102 vs. 103-104
- moddi3.S  lines 71-104 vs. 104-144
- udivdi3.S lines 43-67  vs.  68-100
- umoddi3.S lines 44-72  vs.  73-108

These two branches can of course be folded into just one branch,
saving 20+ instructions.

The third branch, where both dividend and divisor are below 2**32,
always performs a "long division", even if a single DIV would be
sufficient. Detecting that case costs just an additional CMP and
Jcc: adding these 2 instructions saves the execution of a DIV and
QUITE some processor cycles (on average about 10-16 cycles per call,
from a total of about 42-56 cycles).
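
For illustration only, here is a minimal C sketch of that fast path;
it is NOT the code from the patch. The helper div64_32() is a
hypothetical stand-in for the IA32 DIV instruction (64-bit dividend
in EDX:EAX, 32-bit divisor), and udiv64_by_32() is an invented name.
The sketch assumes the test is "high dword of the dividend below the
divisor", the condition under which a single DIV cannot overflow, and
it handles any dividend as long as the divisor is below 2**32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of IA32 "DIV r/m32": divides hi:lo by d.
   The real instruction faults unless hi < d, i.e. unless the
   quotient fits in 32 bits; the assert mirrors that restriction. */
static uint32_t div64_32(uint32_t hi, uint32_t lo, uint32_t d,
                         uint32_t *rem)
{
    assert(hi < d);                    /* also rules out d == 0 */
    uint64_t n = ((uint64_t)hi << 32) | lo;
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
}

/* Unsigned 64-bit dividend divided by a divisor below 2**32. */
uint64_t udiv64_by_32(uint64_t n, uint32_t d)
{
    uint32_t hi = (uint32_t)(n >> 32);
    uint32_t lo = (uint32_t)n;
    uint32_t r;

    if (hi < d)                        /* the extra CMP and Jcc */
        return div64_32(hi, lo, d, &r);        /* one DIV only */

    /* "Long division": divide the high dword first, then feed its
       remainder together with the low dword into a second DIV. */
    uint32_t qhi = div64_32(0, hi, d, &r);
    uint32_t qlo = div64_32(r, lo, d, &r);
    return ((uint64_t)qhi << 32) | qlo;
}
```

Whenever the high dword of the dividend is zero and the divisor is
not, the first comparison succeeds and the second division is
skipped.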

See <https://skanthak.homepage.t-online.de/msvc.html#sidenote> for
a comparison of these improved routines with other implementations.

regards
Stefan Kanthak

PS: is there any special reason why __divmoddi4() and __udivmoddi4()
    are not implemented in assembler?
    What about __udivmodti4() etc. for AMD64 alias x86-64?
    See the directory compiler-rt/lib/builtins/x86_64/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: i386_di3.patch
Type: application/octet-stream
Size: 26444 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181129/2cc0afd2/attachment-0001.obj>