[llvm-commits] [llvm] r127828 - in /llvm/trunk: include/llvm/ADT/APInt.h lib/Support/APInt.cpp unittests/ADT/APIntTest.cpp

Fri Mar 18 13:41:38 PDT 2011

On Fri, Mar 18, 2011 at 7:23 AM, Cameron Zwarich <zwarich at apple.com> wrote:
> On Mar 18, 2011, at 4:13 AM, Benjamin Kramer wrote:
>
>> On 18.03.2011, at 09:03, Eli Friedman wrote:
>>
>>> On Thu, Mar 17, 2011 at 1:39 PM, Benjamin Kramer
>>> <benny.kra at googlemail.com> wrote:
>>>> Author: d0k
>>>> Date: Thu Mar 17 15:39:06 2011
>>>> New Revision: 127828
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=127828&view=rev
>>>> Log:
>>>> Add an argument to APInt's magic udiv calculation to specify the number of bits that are known zero in the divided number.
>>>>
>>>> This will come in handy soon.
>>>
>>> Hmm... what exactly is the effect of using LeadingZeros as opposed to
>>> truncating the input APInt?
>>
>> The algorithm takes the sign bit into account to determine the needed fixups after multiplying with
>> the magic constant. Truncating the input APInt (and thus reducing its BitWidth) will give different
>> results.
>
> There is actually a newer algorithm that is better than the Hacker's Delight one in many cases (for unsigned division it requires only a multiply and an AND for ~80% of all divisors):
>
> http://comjnl.oxfordjournals.org/content/51/4/470.abstract
>
> I've been meaning to implement it some time, at least for the cases where it is better.

Hmm... the savings aren't all that great; take the following
function:

unsigned a(unsigned x) { return x / 7; }

Current code:
	movl	%edi, %eax
	movl	$613566757, %edx
	mull	%edx
	subl	%edx, %edi
	shrl	%edi
	leal	(%rdx,%rdi), %eax
	shrl	$2, %eax
	ret
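
For reference, here is roughly what that sequence computes, written as a
C sketch. The function names are made up for illustration; the constant
and the shift amounts are read straight off the listing above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; mirrors the "Current code" listing for x / 7. */
uint32_t udiv7_current(uint32_t x) {
    /* mull: high 32 bits of x * 613566757 (0x24924925). */
    uint32_t hi = (uint32_t)(((uint64_t)x * 613566757u) >> 32);
    /* subl + shrl + leal: add-back fixup. The full multiplier is
       2^32 + 613566757, which does not fit in 32 bits, so the implicit
       top bit is applied here. hi <= x, so x - hi cannot wrap. */
    uint32_t t = ((x - hi) >> 1) + hi;
    /* Final shrl $2. */
    return t >> 2;
}

/* Spot checks against the native division. */
void check_udiv7_current(void) {
    assert(udiv7_current(0u) == 0u);
    assert(udiv7_current(6u) == 0u);
    assert(udiv7_current(7u) == 1u);
    assert(udiv7_current(0xffffffffu) == 0xffffffffu / 7u);
}
```

The sub/shift/add in the middle is the fixup the other possibilities are
trying to get rid of.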

Possibility 1 (if we fixed the README entry about commuting mull):
	movl	$613566757, %eax
	mull	%edi
	subl	%edx, %edi
	shrl	%edi
	leal	(%rdx,%rdi), %eax
	shrl	$2, %eax
	ret

Possibility 2 (from that paper):
	cmpl	$0xccccccd1, %edi
	adcl	$-1, %edi
	movl	$0x92492493, %eax
	mull	%edi
	shrl	$2, %edx
	movl	%edx, %eax
	ret
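
A C sketch of what I read that sequence as doing. The function names are
made up, and this is my reading of the listing rather than code from the
paper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; mirrors the "Possibility 2" listing for x / 7. */
uint32_t udiv7_cmp_adc(uint32_t x) {
    /* cmpl sets the carry flag when x < 0xccccccd1 (unsigned).
       adcl $-1 then adds (carry - 1): x is left alone below the
       threshold and decremented by one at or above it. */
    uint32_t adj = x - (uint32_t)(x >= 0xccccccd1u);
    /* mull: high 32 bits of adj * 0x92492493, the rounded-up 2^34 / 7. */
    uint32_t hi = (uint32_t)(((uint64_t)adj * 0x92492493u) >> 32);
    /* shrl $2 completes the shift by 34. */
    return hi >> 2;
}

/* Spot checks around the threshold and at the top of the range. */
void check_udiv7_cmp_adc(void) {
    assert(udiv7_cmp_adc(0u) == 0u);
    assert(udiv7_cmp_adc(7u) == 1u);
    assert(udiv7_cmp_adc(0xccccccd0u) == 0xccccccd0u / 7u);
    assert(udiv7_cmp_adc(0xccccccd1u) == 0xccccccd1u / 7u);
    assert(udiv7_cmp_adc(0xffffffffu) == 0xffffffffu / 7u);
}
```

The rounded-up multiplier overestimates slightly, and 0xccccccd1 is the
first input where that overestimate pushes the quotient one too high;
decrementing the input from there on compensates.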

Possibility 3 (from agner.org):
	movl	%edi, %eax
	movl	$0x92492492, %edx
	addl	$1, %eax
	jc	OVERFLOW
	mull	%edx
OVERFLOW:
	shrl	$2, %edx
	movl	%edx, %eax
	ret
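
The same idea as a C sketch, with the carry branch written as an explicit
test. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; mirrors the "Possibility 3" listing for x / 7. */
uint32_t udiv7_inc(uint32_t x) {
    /* %edx holds the rounded-down 2^34 / 7 before the mull. */
    uint32_t hi = 0x92492492u;
    /* addl $1: wraps to 0 (carry set) only for x == UINT32_MAX. */
    uint32_t inc = x + 1u;
    if (inc != 0u) {
        /* No carry: mull leaves the high 32 bits of the product in %edx. */
        hi = (uint32_t)(((uint64_t)inc * 0x92492492u) >> 32);
    }
    /* On carry the mull is skipped and the constant itself is shifted:
       0x92492492 >> 2 == 0x24924924 == UINT32_MAX / 7. */
    return hi >> 2;
}

/* Spot checks, including the overflow path. */
void check_udiv7_inc(void) {
    assert(udiv7_inc(0u) == 0u);
    assert(udiv7_inc(6u) == 0u);
    assert(udiv7_inc(7u) == 1u);
    assert(udiv7_inc(0xfffffffeu) == 0xfffffffeu / 7u);
    assert(udiv7_inc(0xffffffffu) == 0xffffffffu / 7u);
}
```

If the input is known not to be UINT_MAX, the test (the jc in the
listing) goes away and only the add, multiply and shift remain.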

"Possibility 2" looks like it's a little better than "Possibility 1",
but not by much... if we can prove the input isn't UINT_MAX,
"Possibility 3" is better than either of the other methods.

For everything besides the nasty 10% cases like 7, I think our current
code is optimal.

-Eli