[llvm-dev] Generating Big Num addition code which uses ADC (add with carry) instructions

Fri Nov 13 06:49:47 PST 2015

You can make up the i128 by shifting and masking.

Here's a snippet of IR from my compiler showing how i64 quantities are
loaded from memory, combined to make an i128, added, and decomposed back to
i64. LLVM is smart enough to figure out that the shifting isn't really
required.

  %17 = getelementptr inbounds i64* %13, i32 1
  %18 = load i64* %17
  %19 = zext i64 %18 to i128
  %20 = shl i128 %19, 64
  %21 = getelementptr inbounds i64* %13, i32 0
  %22 = load i64* %21
  %23 = zext i64 %22 to i128
  %24 = or i128 %20, %23
  %25 = getelementptr inbounds i64* %15, i32 1
  %26 = load i64* %25
  %27 = zext i64 %26 to i128
  %28 = shl i128 %27, 64
  %29 = getelementptr inbounds i64* %15, i32 0
  %30 = load i64* %29
  %31 = zext i64 %30 to i128
  %32 = or i128 %28, %31
  %33 = add i128 %24, %32
  %34 = trunc i128 %33 to i64
  %35 = getelementptr inbounds i64* %16, i32 0
  store i64 %34, i64* %35
  %36 = lshr i128 %33, 64
  %37 = trunc i128 %36 to i64
  %38 = getelementptr inbounds i64* %16, i32 1
  store i64 %37, i64* %38

And here's the x64_64 assembly that LLVM produced from the above:

0000000000000000 <LMtop.Fadd>:
   0:   48 8b 47 18             mov    0x18(%rdi),%rax
   4:   48 8b 48 40             mov    0x40(%rax),%rcx
   8:   48 8b 50 48             mov    0x48(%rax),%rdx
   c:   48 03 48 30             add    0x30(%rax),%rcx
  10:   48 13 50 38             adc    0x38(%rax),%rdx
  14:   48 89 48 50             mov    %rcx,0x50(%rax)
  18:   48 89 50 58             mov    %rdx,0x58(%rax)
  1c:   c3                      retq

The approach looks horrid, but the results are what you want, and work with
LLVM as-is today.

On Fri, Nov 13, 2015 at 7:52 AM, Clinton Mead <clintonmead at gmail.com> wrote:

> I'm looking for code that produces the results that the i128 code produces
> but by instead using multiple i64s. Like I said, this is Haskell primops
> generating the llvm code, so it would really good if I could just point
> LLVM in the right direction instead of having to write a new primop for
> GHC. If I can see working LLVM code that does an efficient addition with
> carry on multiple machine size words I can then perhaps work out the
> Haskell primops that would produce that code.
>
> On Wed, Nov 11, 2015 at 3:44 AM, David Jones <djones at xtreme-eda.com>
> wrote:
>
>> This might "just work" on many targets by using the appropriate integral
>> type.
>>
>> e.g. on x86_64, something like
>>
>> %a = add i128 %b, %c
>>
>> will lower to
>>
>> add rax,rbx
>> adc rcx,rdx
>>
>> Some caveats, based on my experience with LLVM 3.3, so things may have
>> changed by then:
>>
>> - Not all targets support operations on integers wider than the "native
>> width". e.g. on x86_64, trying to multiply i256 x i256 will net you an
>> assert. I believe i128 x i128 is handled, perhaps by a function call, but
>> nothing larger.
>> - At least on x86_64, everything is fully unrolled. So adding i2048 +
>> i2048 will generate an ADD and 31 ADC instructions, with no looping. This
>> makes sense for "small" multiprecision integers, but can get silly as
>> things get wider.
>>
>> In my project, I am using i128 in a few places (add, subtract, shift) to
>> get efficient code, and relying on my own library functions for anything
>> wider.
>>
>>
>> On Mon, Nov 9, 2015 at 11:46 PM, Clinton Mead via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> I'm trying to work out LLVM code which generates something similar to
>>> the following when adding large multiword numbers stored as separate words:
>>>
>>> ADD x1 x1
>>> ADC x2 y2
>>> ADC x3 y3
>>>
>>> etc, where such a three argument add like ADC on x86 (which includes a
>>> carry in the addition) is available as a machine op.
>>>
>>> The background to this is that I'm trying to implement fast multiword
>>> addition in Haskell, which can compile via LLVM, so I thought if I knew
>>> what LLVM code generates the ideal assembly I can work backwards and use
>>> the appropriate Haskell prim-ops.
>>>
>>> Of course one can use GMP, but I'd suspect with two-four word additions
>>> the loss of inlining opportunities and the cost of the external call would
>>> be greater than the addition itself. So it would be great to be able to
>>> work out how to do it in LLVM and then either implement it with current GHC
>>> or add an extra GHC prim-op which allows GHC to access the appropriate LLVM
>>> code.
>>>
>>> Sorry I don't know much about LLVM besides my brief readings of the
>>> docs, and I've posted here because I couldn't find an llvm-users list.
>>> There's a clang-users list, but I didn't think that would be appropriate
>>> because GHC compiles via LLVM directly, not via clang.
>>>
>>> Any help or guidance appreciated.
>>>
>>> Clinton
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20151113/297f2d3d/attachment-0001.html>