[LLVMbugs] [Bug 2621] New: Improve SCEV AddRec binomial expansion

bugzilla-daemon at cs.uiuc.edu bugzilla-daemon at cs.uiuc.edu
Fri Aug 1 00:17:38 PDT 2008

```http://llvm.org/bugs/show_bug.cgi?id=2621

Summary: Improve SCEV AddRec binomial expansion
Product: new-bugs
Version: unspecified
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: new bugs
AssignedTo: unassignedbugs at nondot.org
ReportedBy: sharparrow1 at yahoo.com
CC: llvmbugs at cs.uiuc.edu

Created an attachment (id=1877)
--> (http://llvm.org/bugs/attachment.cgi?id=1877)
Patch

Per description; basically, the current binomial expansion for AddRec sucks,
especially for K > 2.

A rewrite of the expansion is attached.  The primary improvement is that it
replaces the division operation with a shift and inverse multiply, which makes
the generated code smaller, faster, and more accurate.

The way that this method works is slightly unintuitive... the code might need
more comments.  The fundamental trick is that we can do exact division by an
odd number by multiplying by the inverse.  Here, we first shift out the factors
of 2, then do the remaining division by multiplication.  By itself, this is a
nice improvement, because the division can be expensive.  However, the reason
this is really useful here is that it radically reduces the number of bits we
need to keep track of while multiplying.  With this patch, instead of requiring
K * Width bits to be accurate, the algorithm requires only K + Width bits; the
number of 2's that can be factored out of K! is always less than K. The bits of
overflow beyond the part that's going to be shifted into the final result can
be safely ignored because multiplying by the inverse doesn't use those bits.

Other smaller changes: this makes the binomial expansion always accurate; it
will bail out instead of silently returning a wrong answer (the algorithm is
sound for any reasonable size K and arbitrary-width integers, but the patch
avoids generating >64-bit integers because CodeGen doesn't like them).  It also
computes the terms before zero-extending them; this helps a lot when the size
used for calculation is larger than the native register width because CodeGen
isn't smart enough to eliminate the carry when the extension is done first.

An extreme example:

int a(unsigned u) {unsigned v=0,w=0,x=0,y=0,z=0; for (unsigned i = 0; i < u;
i++) {v += w; w += x; x += y; y += z; z += 1;} return v;}

The attached reduces the x86 code for the expansion of the return value from 73
instructions, including two calls to __udivdi3, to 32 instructions without any
libcalls.  Not to mention that it isn't a fair comparison because the code
without this patch generates the wrong answer for large values of u.

Does this approach seem reasonable?

--
Configure bugmail: http://llvm.org/bugs/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

```