[llvm-commits] [llvm] r127828 - in /llvm/trunk: include/llvm/ADT/APInt.h lib/Support/APInt.cpp unittests/ADT/APIntTest.cpp
Eli Friedman
eli.friedman at gmail.com
Fri Mar 18 13:41:38 PDT 2011
On Fri, Mar 18, 2011 at 7:23 AM, Cameron Zwarich <zwarich at apple.com> wrote:
> On Mar 18, 2011, at 4:13 AM, Benjamin Kramer wrote:
>
>> On 18.03.2011, at 09:03, Eli Friedman wrote:
>>
>>> On Thu, Mar 17, 2011 at 1:39 PM, Benjamin Kramer
>>> <benny.kra at googlemail.com> wrote:
>>>> Author: d0k
>>>> Date: Thu Mar 17 15:39:06 2011
>>>> New Revision: 127828
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=127828&view=rev
>>>> Log:
>>>> Add an argument to APInt's magic udiv calculation to specify the number of bits that are known zero in the divided number.
>>>>
>>>> This will come in handy soon.
>>>
>>> Hmm... what exactly is the effect of using LeadingZeros as opposed to
>>> truncating the input APInt?
>>
>> The algorithm takes the sign bit into account to determine the needed fixups after multiplying with
>> the magic constant. Truncating the input APInt (and thus reducing its BitWidth) will give different
>> results.
>
> There is actually a newer algorithm that is better than the Hacker's Delight one in many cases (for unsigned division it requires only a multiply and an AND for ~80% of all divisors):
>
> http://comjnl.oxfordjournals.org/content/51/4/470.abstract
>
> I've been meaning to implement it some time, at least for the cases where it is better.
Hmm... the savings aren't all that great; take the following
function:
unsigned a(unsigned x) { return x / 7; }
Current code:
movl %edi, %eax
movl $613566757, %edx
mull %edx
subl %edx, %edi
shrl %edi
leal (%rdx,%rdi), %eax
shrl $2, %eax
ret
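For reference, the listing above can be modelled in C roughly as follows. This is only a sketch for checking the arithmetic: the helper name udiv7_current is made up here, and the 64-bit multiply stands in for the high half that mull leaves in %edx.

```c
#include <stdint.h>

/* Hypothetical helper modelling the "current code" listing for x / 7. */
static uint32_t udiv7_current(uint32_t x) {
    /* mull: high 32 bits of x * 613566757 (0x24924925) end up in %edx. */
    uint32_t hi = (uint32_t)(((uint64_t)x * 613566757u) >> 32);
    /* subl + shrl: hi never exceeds x, so the subtraction cannot wrap. */
    uint32_t t = (x - hi) >> 1;
    /* leal + shrl $2: add the high half back, then shift. */
    return (hi + t) >> 2;
}
```

The subtract, shift and add are the fixup that the other possibilities try to avoid.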
Possibility 1 (if we fixed the README entry about commuting mull):
movl $613566757, %eax
mull %edi
subl %edx, %edi
shrl %edi
leal (%rdx,%rdi), %eax
shrl $2, %eax
ret
Possibility 2 (from that paper):
cmpl $0xccccccd1, %edi
adcl $-1, %edi
movl $0x92492493, %eax
mull %edi
shrl $2, %edx
movl %edx, %eax
ret
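A rough C model of this listing, again only as a sketch with a made-up helper name (udiv7_paper). The cmpl/adcl pair leaves the input alone below 0xccccccd1 and subtracts one otherwise, which is written here as a comparison.

```c
#include <stdint.h>

/* Hypothetical helper modelling the "Possibility 2" listing for x / 7. */
static uint32_t udiv7_paper(uint32_t x) {
    /* cmpl sets carry when x < 0xccccccd1; adcl $-1 then adds (carry - 1),
       so x is decremented only when it is at or above the threshold. */
    uint32_t adj = x - (uint32_t)(x >= 0xccccccd1u);
    /* mull by 0x92492493, keep the high half, then shrl $2. */
    uint32_t hi = (uint32_t)(((uint64_t)adj * 0x92492493u) >> 32);
    return hi >> 2;
}
```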
Possibility 3 (from agner.org):
movl %edi, %eax
movl $0x92492492, %edx
addl $1, %eax
jc OVERFLOW
mull %edx
OVERFLOW:
shrl $2, %edx
movl %edx, %eax
ret
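The same kind of C sketch for this listing, with the made-up helper name udiv7_agner. The branch models jc OVERFLOW: when x + 1 wraps, mull is skipped and %edx still holds the constant, which happens to be exactly the high half that a 33-bit multiply by 2^32 would have produced.

```c
#include <stdint.h>

/* Hypothetical helper modelling the "Possibility 3" listing for x / 7. */
static uint32_t udiv7_agner(uint32_t x) {
    uint32_t hi;
    if (x == UINT32_MAX) {
        /* addl $1 carried out: skip mull, %edx keeps 0x92492492. */
        hi = 0x92492492u;
    } else {
        /* mull: high 32 bits of (x + 1) * 0x92492492. */
        hi = (uint32_t)(((uint64_t)(x + 1u) * 0x92492492u) >> 32);
    }
    /* shrl $2 */
    return hi >> 2;
}
```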
"Possibility 2" looks like it's a little better than "Possibility 1",
but not by much. If we can prove the input isn't UINT_MAX (so the
overflow check can be dropped), "Possibility 3" is better than either
of the other methods.
For everything besides the nasty 10% cases like 7, I think our current
code is optimal.
-Eli