[llvm-dev] Where's the optimiser gone (part 12): superfluous 64-bit integer divisions and multiplications
Stefan Kanthak via llvm-dev
llvm-dev at lists.llvm.org
Fri Mar 15 07:52:40 PDT 2019
Pick <https://www.hackersdelight.org/hdcodetxt/divmnu64.c.txt>,
perform some MINOR modifications -- remove the code for the case
n==1, replace nlz() with __builtin_clz() -- then compile it for
a 32-bit machine: see <https://godbolt.org/z/BynK7S>
For a slight variation see <https://godbolt.org/z/DZdn9l>.
Deficiency #1 (lines 35 to 39)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
qhat = (un[j+n]*b + un[j+n-1])/vn[n-1];
rhat = (un[j+n]*b + un[j+n-1]) - qhat*vn[n-1];
clang does NOT recognize the explicit computation of the remainder
rhat directly following the quotient qhat and generates a call to
__udivdi3() instead of a call to __udivmoddi4() (which __udivdi3()
calls anyway!) which provides both quotient and remainder.
Even after defining the macro OPTIMIZE which produces
qhat = (un[j+n]*b + un[j+n-1])/vn[n-1];
rhat = (un[j+n]*b + un[j+n-1])%vn[n-1];
clang gets no clue!
JFTR: GCC and even MSVC recognize this sequence and generate calls
to __udivmoddi4() and _aulldvrm() respectively!
Deficiency #2 (line 42)
~~~~~~~~~~~~~~~~~~~~~~~
if (qhat >= b || qhat*vn[n-2] > b*rhat + un[j+n-2])
The second term is evaluated only if qhat is LESS than 2**32, so
a single 32x32-bit multiplication producing a 64-bit product is
sufficient here; clang but performs a 64x64-bit multiplication.
JFTR: with OPTIMIZE defined, clang (like MSVC; GCC but fails)
generates a single MUL here.
Deficiency #3 (line 51)
~~~~~~~~~~~~~~~~~~~~~~~
The "loop" in lines 42 to 46 leaves qhat LESS than 2**32, so for
p = qhat*vn[i];
again a single 32x32-bit multiplication producing a 64-bit product
is sufficient; clang but performs a 64x64-bit multiplication.
JFTR: with OPTIMIZE defined, clang (like MSVC and GCC) generates
a single MUL here.
stay tuned
Stefan Kanthak
More information about the llvm-dev
mailing list