[cfe-dev] Missed opportunity: quotient and remainder down to 60 times slower than necessary

Stefan Kanthak via cfe-dev cfe-dev at lists.llvm.org
Sun Sep 6 05:57:46 PDT 2020


Spoiler alert: this does not demonstrate the factor of 30 achievable
with a properly coded SINGLE 128/128-bit division ... although it can
show even a factor of 60!

--- bugs-bunny.c ---
// Copyleft © 2014-2020, Stefan Kanthak <stefan.kanthak at nexgo.de>

__uint128_t udivmodti4(__uint128_t dividend, __uint128_t divisor, __uint128_t *remainder) {
    if (remainder != 0)
        *remainder = dividend % divisor;
    return dividend / divisor;
}
--- EOF ---

clang -c -o- -O3 -S -target amd64-pc-windows bugs-bunny.c

udivmodti4:     # @udivmodti4
# %bb.0:
[...]
        testq   %r8, %r8
        je      .LBB0_2
# %bb.1:
        movq    %r8, %rbx
        movq    %r13, %rdi
        movq    %r12, %rsi
        movq    %r15, %rdx
        movq    %r14, %rcx
        callq   __umodti3
        movq    %rdx, 8(%rbx)
        movq    %rax, (%rbx)
.LBB0_2:
        movq    %r13, %rdi
        movq    %r12, %rsi
        movq    %r15, %rdx
        movq    %r14, %rcx
        callq   __udivti3
[...]


OOPS!
Instead of generating a single call to the library function
__udivmodti4(), which returns both quotient and remainder, clang
generates two separate calls to the library functions __umodti3() and
__udivti3(), even with the option -O3 specified -- which is especially
"funny", since both call the library function __udivmodti4() in turn!
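
Until the compiler merges the two calls itself, a source-level
workaround is to divide once and derive the remainder by multiplying
and subtracting. The following is only a minimal sketch: the name
udivmod_once is hypothetical, and the generated code has not been
inspected here, so check the assembly to confirm that only one
division call remains. Like the original, it assumes a nonzero divisor.

```c
/* Hypothetical helper (not part of any library): performs ONE 128-bit
   division and reconstructs the remainder from the quotient. */
__uint128_t udivmod_once(__uint128_t dividend, __uint128_t divisor,
                         __uint128_t *remainder) {
    /* The only division in this function. */
    __uint128_t quotient = dividend / divisor;

    /* remainder = dividend - quotient * divisor; the product never
       exceeds dividend, so the subtraction cannot wrap. */
    if (remainder != 0)
        *remainder = dividend - quotient * divisor;

    return quotient;
}
```

This trades the second division for one 128-bit multiplication and one
subtraction, both far cheaper than a library division.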


