[cfe-dev] Loop optimised into div (?)

Mon Oct 28 16:26:34 PDT 2019

> -----Original Message-----
> From: cfe-dev <cfe-dev-bounces at lists.llvm.org> On Behalf Of Joerg
> Sonnenberger via cfe-dev
> Sent: Monday, October 28, 2019 6:54 PM
> To: cfe-dev at lists.llvm.org
> Subject: Re: [cfe-dev] Loop optimised into div (?)
> 
> On Mon, Oct 28, 2019 at 03:15:37PM -0700, Bruce Hoult via cfe-dev wrote:
> > On Mon, Oct 28, 2019 at 2:28 PM Joerg Sonnenberger via cfe-dev
> > <cfe-dev at lists.llvm.org> wrote:
> > >
> > > On Mon, Oct 28, 2019 at 02:17:50PM -0700, Bruce Hoult via cfe-dev
> wrote:
> > > > > I found this when compiling a function that will convert a binary
> number to a sequence of decimal digits expressed in ascii. Every digit is
> determined by counting the number of subtractions required for power-or-
> ten factors . For that algorithm, only up to 9 subtractions are required
> at every step (or factor). Replacing that by divisions is a lot more
> expensive for the affected targets.
> > > >
> > > > Aha. I see, yes.
> > > >
> > > > I think the divisions are not as expensive as you might be assuming.
> > > > The key is to do the same as for the subtact version and divide by
> > > > 10000, then when the remainder is less than 10000 divide by 1000,
> and
> > > > so on. Don't repeatedly divide by 10 as in the recursive algorithm -
> -
> > > > that *will* be slow.
> > >
> > > It is also important to note that on any somewhat sane architecture,
> no
> > > division will actually be created here. Now if your architecture has
> > > neither a division nor a multiplication in hardware... Even then,
> > > division by 10 needs ~10 adds and shift in total.
> >
> > How do you do that?
> >
> > The only ways I know of, division by 10 needs up to wordSize-3
> > iterations of a loop, with each loop containing two
> > compare-and-branches, a subtract and an add (or OR) controlled by one
> > of the compares, and two shifts. That's eight instructions on most
> > ISAs.
> 
> Division by ten is essentially multiplication by 0xcccd for 16bit ops
> and shifting the result by 15 or so. That's multiplying by 0xc (shift
> and add), 0x1111 = 0x101 * 0x11 (two adds and two shifts).
> and adding the original value. Roughly at least. Double the number if
> there is no good way to express 32bit numbers.
> 
> Joerg

The book _Hacker's Delight_ has a whole chapter on integer division by
a constant, with proofs, turning into mostly multiplies and shifts.

I remember seeing a particularly clever sequence for converting binary
to decimal, years ago, that used these tricks in parallel to do more
than one digit at a time, allowing it to use fewer total multiplies.
(Which aren't exactly cheap, either.)
--paulr

> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev