[LLVMdev] Bignum development
goodwillhart at googlemail.com
Sun Jun 13 08:28:22 PDT 2010
I should quickly add that the whole thing should be made to work for
subtraction and borrow too.
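The subtraction case would look like this (a sketch; the helper name
and signature are illustrative, not from the post or from GMP):

```c
#include <stdint.h>

/* Subtract limb b and a borrow-in from limb a, using the same
   widen-to-__uint128_t idiom the thread discusses for addition.
   The borrow-out is bit 64 of the 128-bit difference. */
static uint64_t sub_limb(uint64_t a, uint64_t b, uint64_t borrow_in,
                         uint64_t *borrow_out)
{
    __uint128_t diff = (__uint128_t) a - b - borrow_in;
    *borrow_out = (uint64_t) (diff >> 64) & 1;
    return (uint64_t) diff;
}
```

The point of the optimisation would be for a chain of these to compile
to SUB followed by SBBs, mirroring the ADD/ADC case below.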
I think multiplication is OK. There you do a full i64xi64
multiplication by casting to i128's. It currently does the right thing.
You then add the low i64 to the result with any previous carry,
yielding a carry. You then add the high limb to the result, yielding a
further carry. I think that all works, just by making use of the
optimisation we've been discussing.
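The multiplication step described above can be sketched like this (the
helper name and shape are illustrative, not from the post):

```c
#include <stdint.h>

/* Full 64x64 -> 128 bit multiply via a cast to __uint128_t, folding
   in the carry from the previous limb. The low 64 bits are the result
   limb; the high 64 bits become the carry into the next limb. */
static uint64_t mul_limb(uint64_t a, uint64_t b, uint64_t carry_in,
                         uint64_t *carry_out)
{
    __uint128_t prod = (__uint128_t) a * b + carry_in;
    *carry_out = (uint64_t) (prod >> 64);
    return (uint64_t) prod;
}
```

On x86-64 the multiply itself maps to a single MUL, which already
produces the full 128-bit product in a register pair.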
Division currently works as expected, I think, i.e. div and mod get
coalesced into a single DIV. So it really is just carry handling that
needs to be fixed.
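The div/mod coalescing refers to code of this shape (a sketch; the
helper name is illustrative, and it assumes hi < d so the quotient
fits in 64 bits, as the hardware DIV requires):

```c
#include <stdint.h>

/* Divide the two-limb value (hi:lo) by a single limb d. Writing both
   / and % on the same __uint128_t lets the backend coalesce them into
   one DIV instruction, which yields quotient and remainder together.
   Precondition: hi < d, otherwise the quotient overflows 64 bits. */
static uint64_t div_limb(uint64_t hi, uint64_t lo, uint64_t d,
                         uint64_t *rem)
{
    __uint128_t n = ((__uint128_t) hi << 64) | lo;
    *rem = (uint64_t) (n % d);
    return (uint64_t) (n / d);
}
```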
Everything else, like shifting i64's by first casting them to i128's,
no doubt works as expected. Perhaps there are some code sequences here
or there that could be improved slightly, but otherwise I think,
technically, everything is there to produce efficient code for bignum
arithmetic.
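The shift-via-i128 idiom mentioned above looks like this (a sketch;
names are illustrative):

```c
#include <stdint.h>

/* Shift a two-limb number (hi:lo) left by s bits, 0 < s < 64, by
   first widening to __uint128_t as described above. */
static void shl2(uint64_t hi, uint64_t lo, unsigned s,
                 uint64_t *rhi, uint64_t *rlo)
{
    __uint128_t n = ((__uint128_t) hi << 64) | lo;
    n <<= s;
    *rhi = (uint64_t) (n >> 64);
    *rlo = (uint64_t) n;
}
```

A good backend can lower this to a SHLD/SHL pair rather than doing a
genuine 128-bit shift.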
On 13 June 2010 16:16, Bill Hart <goodwillhart at googlemail.com> wrote:
> I think from the C compiler's point of view, it is going to want it to
> work for any size above an i64, i.e. all the way up to an i128 so that
> if the user of the C compiler does this computation with __uint128_t's
> then it will Do The Right Thing TM.
> Basically, you want
> unsigned long a, b, c, d;
> const __uint128_t u = (__uint128_t) a + b;
> const unsigned long v = u >> 64;
> const __uint128_t w = (__uint128_t) c + d + v;
> to do the right thing, namely do the first addition with an ADD, the
> carry flag holding the shift value v, and do the third addition with a
> single ADC if available (assuming there are no instructions between
> which kill the carry flag, in which case a byte would have to be
> used to save the carry).
> In IR, this is going to look like:
> %0 = load i64* %a, align 8 ; <i64> [#uses=1]
> %1 = zext i64 %0 to i128 ; <i128> [#uses=1]
> %2 = load i64* %b, align 8 ; <i64> [#uses=1]
> %3 = zext i64 %2 to i128 ; <i128> [#uses=1]
> %4 = add i128 %1, %3 ; <i128> [#uses=1]
> %5 = lshr i128 %4, 64 ; <i128> [#uses=1]
> %6 = trunc i128 %5 to i64 ; <i64> [#uses=1]
> %7 = load i64* %c, align 8 ; <i64> [#uses=1]
> %8 = zext i64 %7 to i128 ; <i128> [#uses=1]
> %9 = load i64* %d, align 8 ; <i64> [#uses=1]
> %10 = zext i64 %9 to i128 ; <i128> [#uses=1]
> %11 = add i128 %8, %10 ; <i128> [#uses=1]
> %12 = zext i64 %6 to i128 ; <i128> [#uses=1]
> %13 = add i128 %11, %12 ; <i128> [#uses=1]
> But as I mentioned, some care needs to be taken with loop counters so
> that comparing them to their limit does not destroy the carry flag.
> And of course the whole optimisation needs to work for the similar
> thing involving __uint64_t's on 32 bit machines.
> It's one hell of an optimisation. But also a radical speedup (33% in a
> loop without unrolling, better than 100% with unrolling).
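The speedup figures quoted above refer to carry-chained loops of
roughly this shape (a sketch in C; the function name and signature are
illustrative, not from GMP or the post):

```c
#include <stddef.h>
#include <stdint.h>

/* Add two n-limb numbers a and b into r, chaining the carry through
   a __uint128_t sum. With the optimisation discussed in this thread,
   the loop body should become one ADC per limb. Returns the final
   carry-out. */
static uint64_t add_n(uint64_t *r, const uint64_t *a,
                      const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        __uint128_t s = (__uint128_t) a[i] + b[i] + carry;
        r[i] = (uint64_t) s;
        carry = (uint64_t) (s >> 64);
    }
    return carry;
}
```

The loop counter comparison is exactly the kind of instruction that
can clobber the carry flag, which is the caveat raised above.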
> On 13 June 2010 13:22, Duncan Sands <baldrick at free.fr> wrote:
>>> Yeah I had a think about it, and I think intrinsics are the wrong way
>>> to do it. So I'd say you are likely right.
>> For this to work well, the way the code generators handle flags will need
>> to be improved: currently it is suboptimal, in fact kind of a hack.
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu