[cfe-dev] Missed opportunity: quotient and remainder down to 60 times slower than necessary
Stefan Kanthak via cfe-dev
cfe-dev at lists.llvm.org
Sun Sep 6 05:57:46 PDT 2020
Spoiler alert: this is not a demonstration of the factor of 30 achievable
with a properly coded SINGLE 128/128-bit division, although it can
show even a factor of 60!
--- bugs-bunny.c ---
// Copyleft © 2014-2020, Stefan Kanthak <stefan.kanthak at nexgo.de>
__uint128_t udivmodti4(__uint128_t dividend, __uint128_t divisor, __uint128_t *remainder) {
    if (remainder != 0)
        *remainder = dividend % divisor;
    return dividend / divisor;
}
--- EOF ---
clang -c -o- -O3 -S -target amd64-pc-windows bugs-bunny.c
udivmodti4: # @udivmodti4
# %bb.0:
[...]
testq %r8, %r8
je .LBB0_2
# %bb.1:
movq %r8, %rbx
movq %r13, %rdi
movq %r12, %rsi
movq %r15, %rdx
movq %r14, %rcx
callq __umodti3
movq %rdx, 8(%rbx)
movq %rax, (%rbx)
.LBB0_2:
movq %r13, %rdi
movq %r12, %rsi
movq %r15, %rdx
movq %r14, %rcx
callq __udivti3
[...]
OOPS!
Instead of generating a single call to the library function __udivmodti4(),
which returns both quotient and remainder, clang generates two separate
calls to the library functions __umodti3() and __udivti3(), even with the
option -O3 specified. This is especially "funny", since both call the
library function __udivmodti4() in turn!
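For illustration only, here is a minimal sketch of a source-level workaround: perform the 128-bit division once and derive the remainder by multiply-subtract, so that at most one division routine should be needed. The name udivmod128_once is hypothetical and not part of any runtime library. Whether a particular compiler version really emits a single library call for it is an assumption that has to be checked against the generated assembly.

```c
#include <stddef.h>

/* Hypothetical helper for illustration; NOT part of compiler-rt or libgcc. */
static __uint128_t udivmod128_once(__uint128_t dividend, __uint128_t divisor,
                                   __uint128_t *remainder)
{
    /* The only 128-bit division in this function. On targets without a
       native 128/128-bit divide, this is where the compiler is expected
       to call its division routine. */
    __uint128_t quotient = dividend / divisor;

    if (remainder != NULL) {
        /* remainder = dividend - quotient * divisor; the product never
           exceeds the dividend, so the subtraction cannot underflow. */
        *remainder = dividend - quotient * divisor;
    }
    return quotient;
}
```

The multiply-subtract costs a few inline instructions, which is typically far cheaper than a second trip through the division routine. As in the original function, a divisor of zero is undefined behaviour.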