[cfe-dev] Loop optimised into div (?)

Bruce Hoult via cfe-dev cfe-dev at lists.llvm.org
Mon Oct 28 15:15:37 PDT 2019


On Mon, Oct 28, 2019 at 2:28 PM Joerg Sonnenberger via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
>
> On Mon, Oct 28, 2019 at 02:17:50PM -0700, Bruce Hoult via cfe-dev wrote:
> > > I found this when compiling a function that will convert a binary number to a sequence of decimal digits expressed in ascii. Every digit is determined by counting the number of subtractions required for power-or-ten factors . For that algorithm, only up to 9 subtractions are required at every step (or factor). Replacing that by divisions is a lot more expensive for the affected targets.
> >
> > Aha. I see, yes.
> >
> > I think the divisions are not as expensive as you might be assuming.
> > The key is to do the same as for the subtact version and divide by
> > 10000, then when the remainder is less than 10000 divide by 1000, and
> > so on. Don't repeatedly divide by 10 as in the recursive algorithm --
> > that *will* be slow.
>
> It is also important to note that on any somewhat sane architecture, no
> division will actually be created here. Now if your architecture has
> neither a division nor a multiplication in hardware... Even then,
> division by 10 needs ~10 adds and shift in total.

How do you do that?

The only ways I know of, division by 10 needs up to wordSize-3
iterations of a loop, with each loop containing two
compare-and-branches, a subtract and an add (or OR) controlled by one
of the compares, and two shifts. That's eight instructions on most
ISAs.

If you want to optimize for most of the inputs being small then you
need a second loop that initially shifts 10 left bit by bit until it
becomes bigger than the input you're dividing (or the MSB gets set).
That's six instructions per loop on most ISAs, so you win if most
inputs have fewer than about 57% of the bits significant.
Alternatively, if you've got a Count_Leading_Zeros instruction and
constant time multiple-bit dynamic count shifts then you can do this
second (initial) loop in constant time instead.



More information about the cfe-dev mailing list