[LLVMdev] Modular arithmetic processors

Mon Nov 18 02:29:50 PST 2013

On Mon, Nov 18, 2013 at 4:02 AM, Shang-Yi Yang <ilway25 at gmail.com> wrote:

> Thanks for your insightful suggestions.
>
> Yes, I am programming for a real device that does modular arithmetic (and
> only modular arithmetic). The modulus N is fixed during a single launch of
> a program. One way I could also come up with is to simply use add i256 %a,
> %b to represent a + b mod n, and let LLVM passes to reason about possible
> optimizations. However these are not semantically identical and it may
> produce a wrong result, though in a rather low possibility.
>
> Here is another issue I have been thinking these days. As far as I know,
> legalizing a DAG node may expand the node to fit into the ISA. What happens
> if a node is not representable by any combination of the instructions from
> the ISA? An extremely opposite situation is that, two nodes must have some
> particular semantic in the DAG so it can be translate to a single
> instruction. For example, if MUL a, b of the ISA actually does a * b *
> 12345. So only
>
> %1 = mul i256 %a, %b
> %2 = mul i256 %1, i256 12345
>
> can be translated into a legal instruction (and they may be scattered
> around the DAG after some passes), but I think
>
> %1 = mul i256 %a, %b
>
> might fail to match. Is there any workaround?
>
> I'm afraid you've rather lost me here: is your instruction (which we'll
call I for convenience) something where the "a * b * 12345" is all visible
at the user level, or is it something that should be conceptualised as
"multiplication of a and b within a finite field" and the 12345 is just an
implementation detail that the compiler just has to orchestrate?

In any case, having machine instructions which come from more than one IR
instruction is possible in LLVM, see eg how the use of fma (fused
multiply-add) instructions (when allowed by floating point operations) can
occur:

http://llvm.org/docs/CodeGenerator.html#id44

This is in the context of an optimization, I don't know how things would
work out in practice if there's no simpler instructions around as a
fallback if matching doesn't occur.

Cheers,
Dave

>
> 2013/11/16 David Tweed <david.tweed at gmail.com>
>
>> Hi,
>>
>> My personal opinion: Just to be sure I understand what you're
>> considering: you want to write a backend that will produce optimized
>> machine code for a device with modular arithmetic instructions (not
>> simulate such a device on a standard CPU)? In which case, won't the same
>> assumptions that are embodied in the transformations for the case of
>> unsigned 2's complement arithmetic (in your case with 256 bit unsigned
>> words) with wraparound result in correct code? I suspect two issues will
>> come up:
>>
>
>> 1. The optimizations in LLVM which can reason about "useful for
>> optimization" features of wrapping unsigned values are probably not
>> particularly strong at the moment, since most interest tends to be in the
>> case of optimizations that are valid in the case it's known there won't be
>> a wraparound (eg, monotonically increasing of the result of adding
>> elements). This is going to be just a lack of implementation rather than a
>> fundamental issue.
>>
>> 2. It's not clear from your description whether changing the value in the
>> "N register" is a very infrequent event or not. If it happens so
>> infrequently you can get away with just putting compiler barriers before
>> and after the change that would fit reasonably. On the other hand if it's
>> changed frequently getting performance might require allowing the optimizer
>> to move instructions around significantly enough that it's necessary to
>> keep track of what the N register should be for each instruction. LLVM
>> doesn't really have any built-in notion of this, so you're going to have to
>> make significant changes to the compiler to do that. Whether this means
>> writing your own compiler would be less work I don't know.
>>
>> Hope this helps,
>>
>>
>> On Fri, Nov 15, 2013 at 9:15 AM, Shang-Yi Yang <ilway25 at gmail.com> wrote:
>>
>>> I've been playing around with LLVM to write a backend for a rather
>>> "simple" (co-)processor. Assume that only three arithmetic instructions
>>> exist: ADD mod N, SUB mod N and MUL mod N. The modulus N is programmable
>>> and stored in a register. No ordinary arithmetic instructions are
>>> available. The word size is 256-bit.
>>>
>>> In other words, the following function, b + c mod N, corresponds to only
>>> one instruction on my target machine, given the modulus N set by another
>>> instruction to an auxiliary register.
>>>
>>> # this should translate to one instruction like: r3 = add r1, r2
>>> # n is extended to 257-bit for simplicity
>>> define i256 @add(i256 %b, i256 %c, i257 %n) {
>>>   %1 = zext i256 %b to i257
>>>   %2 = zext i256 %c to i257
>>>   %3 = add i257 %2, %1
>>>   %4 = icmp uge i257 %3, %n
>>>   %5 = select i1 %4, i257 %n, i257 0
>>>   %6 = sub i257 %3, %5
>>>   %7 = trunc i257 %6 to i256
>>>   ret i256 %7
>>> }
>>>
>>> The ultimate goal is to use LLVM to optimize and emit machine code for
>>> programs full of modular addition, subtraction and multiplication. For
>>> example, I might want off-the-shelf LLVM passes to optimize (a+b+c)-(a+c)
>>> mod p = b mod p for me. For a large syntax tree, say more than 10,000
>>> nodes, I might even need common subexpression elimination. Of course, we
>>> can target the same program to ARM or X86 for performance comparison.
>>>
>>> The questions is: We have to recognize the patterns for modular
>>> arithmetic from optimized IR and translate into corresponding instructions.
>>> After investigation, I think LLVM can not do a good job on such a simple
>>> but unordinary instruction set. Is it possible that the patterns still
>>> recognizable after optimization? On the whole, is this viable using LLVM or
>>> is it better to write a compiler for this particular DSL from scratch?
>>>
>>> Thanks,
>>> Shang-Yi Yang
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>>
>>
>>
>> --
>> cheers, dave tweed__________________________
>> high-performance computing and machine vision expert:
>> david.tweed at gmail.com
>> "while having code so boring anyone can maintain it, use Python." --
>> attempted insult seen on slashdot
>>
>>
>
>

-- 
cheers, dave tweed__________________________
high-performance computing and machine vision expert: david.tweed at gmail.com
"while having code so boring anyone can maintain it, use Python." --
attempted insult seen on slashdot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131118/275400c2/attachment.html>