[cfe-dev] Loop optimised into div (?)

Bruce Hoult via cfe-dev cfe-dev at lists.llvm.org
Sun Oct 27 20:30:15 PDT 2019


On Sun, Oct 27, 2019 at 7:38 PM Bruce Hoult <brucehoult at sifive.com> wrote:
>
> On Wed, Oct 23, 2019 at 10:29 AM Joan Lluch via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
> >
> > Hi all,
> >
> > For the following code
> >
> > int countHundred( int num )
> > {
> >   int count = 0;
> >   while ( num >= 100) { count++ ; num = num - 100; }
> >   return count;
> > }
> >
> > the loop shown above is optimised away and replaced by ‘udiv’ instruction.
> >
> > This is undesirable for targets requiring libcalls for integer divisions, such as the MSP430.
>
> That's not necessarily the case, unless you know in advance that "num"
> is a relatively small positive number.
>
> Even on a 16 bit machine, large positive numbers could have loop trip
> counts up to 327 which is going to be much slower than a library call
> using even a simple shift-and-subtract division algorithm.
>
> I hope the udiv isn't used until after an initial test to make sure
> "num" is >100 or at least positive, otherwise using udiv is an actual
> bug.
>
> I'd suggest doing some testing to find out at which point a library
> call is actually faster than this loop (probably anything above 2000
> or 4000 or so), modifying the compiler to not do the transform if the
> trip count is *known* to be less than this (i.e. continue to use the
> library routine if the trip count is unknown), and also rewrite the
> code to:
>
> #define ITER_LIMIT 24 // for example -- determine by experiment and
> modify compiler appropriately
> int countHundred( int num )
> {
>   if (num > (ITER_LIMIT * 100)) return num / 100;
>   int count = 0;
>   while ( num >= 100) { count++ ; num = num - 100; }
>   return count;
> }

I'm seeing the same behaviour with RISC-V rv32i (no multiply or divide
instructions), even with -O1.

So this is something that should be done in general for ISAs without
hardware divide.



More information about the cfe-dev mailing list