[CORRECTION] [compiler-rt] _udivdi3(), _umoddi3(), _moddi3() and _divdi3() routines not properly "tuned"

Thu Nov 9 01:33:43 PST 2017

Hi,

replace the proposed fix #3 from my initial post with the following
faster version:

> Bug #3: such "highly tuned" routines should, however, come without
> ~~~~~~~ large duplicate code sequences.
> 
> In the 4 routines named in the subject, the code from label 1: to the
> respective return is almost identical to the code preceding label 1:;
> the only difference is the initial subtraction and the insertion of
> a leading 1 into the quotient.
>
> Fix #3: remove all lines between "jae 1f" (including the wrong
> ~~~~~~~ comment which follows "jae 1f") and the label 1:, then
>         apply the following diff (yes, this adds one or two
>         instructions to the overall execution path, but they should
>         typically cost no extra cycles, since they can execute in parallel).

+   pushl %edi
+   xorl  %edi, %edi // MSB of quotient
    cmpl  %ebx, %edx // to avoid overflowing the upcoming divide.
+   jb    0f
-   jae   1f

1: /* High word of a is greater than or equal to (b >> (1 + i)) on this branch */

+   movl  $0x80000000, %edi       // MSB of quotient
    subl  %ebx, %edx // subtract bhi from ahi so that divide will not
+
+0: /* On both paths the high word in %edx is now smaller than (b >> (1 + i)) */
+
    divl %ebx // overflow, and find q and r such that
    //
    // ahi:alo = (1:q)*bhi + r
    //
    // Note that q is a number in (31-i).(1+i)
    // fix point.
-   pushl %edi
    notl  %ecx
    shrl  %eax
+   orl   %edi, %eax // insert proper MSB into quotient
-   orl   $0x80000000, %eax
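For reviewers who want to check that the merged path computes the same quotient, here is a C model of the i386 `__udivdi3` path for divisors whose high word is nonzero, with comments mapping each step to the patched assembly. It is a sketch, not compiler-rt code: the names `udivdi3_model`, `bsr32` and `udivdi3_model_selftest` are hypothetical, the case where the high word of b is zero simply falls back to `a / b`, and the final one-step correction uses a plain comparison chosen for readability rather than speed.

```c
#include <stdint.h>

/* Index of the most significant set bit of x (x != 0); stands in for bsrl. */
static unsigned bsr32(uint32_t x)
{
    unsigned i = 31;
    while ((x >> i) == 0)
        i--;
    return i;
}

/* Hypothetical C model of the i386 __udivdi3 path for b >= 2^32 after fix #3.
   b == 0 is undefined, as for the real routine. */
uint64_t udivdi3_model(uint64_t a, uint64_t b)
{
    uint32_t bhw = (uint32_t)(b >> 32);       /* high word of b */
    if (bhw == 0)
        return a / b;                         /* other special case, not modelled */

    unsigned i = bsr32(bhw);                  /* bsrl: index of b's leading bit - 32 */
    uint32_t bhi = (uint32_t)(b >> (1 + i));  /* bits [1+i:32+i] of b, bit 31 set */
    uint32_t ahi = (uint32_t)(a >> 32);
    uint32_t alo = (uint32_t)a;

    uint32_t msb = 0;                         /* xorl %edi, %edi */
    if (ahi >= bhi) {                         /* label 1: (jb 0f not taken) */
        msb = 0x80000000u;                    /* movl $0x80000000, %edi */
        ahi -= bhi;                           /* subl %ebx, %edx: now ahi < bhi */
    }

    /* label 0: the single shared divl; ahi < bhi, so qs fits in 32 bits. */
    uint32_t qs = (uint32_t)((((uint64_t)ahi << 32) | alo) / bhi);

    /* shrl %eax; orl %edi, %eax; shrl %cl, %eax  ==>  q = (msb:qs) >> (1 + i) */
    uint32_t q = ((qs >> 1) | msb) >> i;

    /* q is either the true quotient or one too large; undo the overshoot.
       The first test avoids overflow in q * b. */
    if (q > UINT64_MAX / b || (uint64_t)q * b > a)
        q--;
    return q;
}

/* Compare the model with the compiler's own 64-bit division on
   pseudo-random inputs (xorshift64); returns 1 if all results agree. */
static int udivdi3_model_selftest(void)
{
    uint64_t x = UINT64_C(0x9E3779B97F4A7C15);  /* nonzero seed */
    for (int n = 0; n < 100000; n++) {
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        uint64_t a = x;
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        /* vary where b's leading bit lands, but force b >= 2^32 */
        uint64_t b = (x >> (x & 31)) | (UINT64_C(1) << 32);
        if (udivdi3_model(a, b) != a / b)
            return 0;
    }
    return 1;
}
```

The model relies on the two properties the diff depends on: after the conditional subtraction the high word is smaller than bhi (which has bit 31 set), so the 64/32 divide cannot overflow; and the shifted quotient is at most one too large. On the path where no subtraction happens, `msb` stays 0 and the `orl` is a no-op, which is why both branches can share one `divl`.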

regards
Stefan Kanthak