[llvm-dev] multiply-accumulate instruction

Mon Sep 21 06:16:10 PDT 2015

I think the canonical example is likely to be matrix multiplication, but any kind of "sum of products"-type method would, I expect, be improved by using this instruction.

e.g. (and I haven't syntax-checked this, so apologies for any errors):

int SumOfProducts(int xs[], int ys[], int count)
{
   int sum = 0; // Use ASR18 & Y for the SMAC / UMAC instructions.

   for (int index=0 ; index<count ; index++)
   {
      sum += xs[index] * ys[index]; // Could all be done with SMAC in one instruction.
   }

   return sum; // Needs retrieving from ASR18 & Y
}
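
To make the intended semantics concrete, here is a rough software model of the same loop built on an emulated SMAC, following the "Operation" description quoted further down. It is only a sketch. `MacState`, `sext16`, `smac_model` and `sum_of_products_model` are hypothetical names used for illustration and are not part of LLVM or the LEON toolchain. It assumes the 32-bit product is sign-extended before being added to the 40-bit accumulator, which the quoted documentation does not state explicitly, and it uses 16-bit inputs because SMAC only reads the low 16 bits of each operand.

```c
#include <stdint.h>

/* Hypothetical model of the 40-bit accumulator: Y[7:0] is the high byte,
   %asr18[31:0] is the low word. */
typedef struct {
    uint8_t  y_low;  /* models Y[7:0] */
    uint32_t asr18;  /* models %asr18[31:0] */
} MacState;

/* Sign-extend the low 16 bits of a register value. */
static int32_t sext16(uint32_t v)
{
    int32_t x = (int32_t)(v & 0xFFFFu);
    return (x >= 0x8000) ? x - 0x10000 : x;
}

/* Models one SMAC: signed 16x16 multiply, 40-bit accumulate, and the low
   32 bits of the result returned as rd.
   ASSUMPTION: the product is sign-extended to 40 bits before the add. */
static uint32_t smac_model(MacState *st, uint32_t rs1, uint32_t rs2)
{
    int32_t  prod = sext16(rs1) * sext16(rs2);
    uint64_t acc  = ((uint64_t)st->y_low << 32) | st->asr18;

    acc = (acc + (uint64_t)(int64_t)prod) & 0xFFFFFFFFFFULL; /* keep 40 bits */

    st->y_low = (uint8_t)(acc >> 32);
    st->asr18 = (uint32_t)acc;
    return st->asr18;
}

/* The loop above, expressed as clear / SMAC per element / read back. */
uint32_t sum_of_products_model(const int16_t xs[], const int16_t ys[], int count)
{
    MacState st = {0, 0};  /* clear Y and ASR18 before the loop */
    uint32_t rd = 0;

    for (int index = 0; index < count; index++)
    {
        rd = smac_model(&st, (uint16_t)xs[index], (uint16_t)ys[index]);
    }

    return rd;             /* low 32 bits of the accumulator */
}
```

With xs = {1, 2, 3} and ys = {4, 5, 6}, the model returns 32. The three phases (clear, accumulate, read back) are the pieces that would have to be tied together if the compiler were to emit SMAC automatically.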
________________________________________
From: Hal Finkel [hfinkel at anl.gov]
Sent: 21 September 2015 13:48
To: Chris.Dewhurst
Cc: llvm-dev at lists.llvm.org; James Y Knight
Subject: Re: [llvm-dev] multiply-accumulate instruction

----- Original Message -----
> From: "Chris.Dewhurst via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "James Y Knight" <jyknight at google.com>
> Cc: llvm-dev at lists.llvm.org
> Sent: Monday, September 21, 2015 2:43:30 AM
> Subject: Re: [llvm-dev] multiply-accumulate instruction
>
> I've been looking to see if there's a way to get the instruction
> below (SMAC) emitted from a higher-level construct, but I'm starting
> to think this is unrealistic.
>
> To do so, I'd have to tie-in two other instructions: Firstly,
> clearing the ASR18 and Y register somewhere near the start of the
> method, then copying out the value of these registers somewhere near
> the end of the method, or wherever the value needs to be used.
>
> In addition, it would only make sense to use the construct inside a
> loop of some form, otherwise, some variation on MUL would be better.
> That would either require detecting the loop, or optimising further
> down the line to convert the above construct *into* a simple MUL.
>
> This now feels to me to be unrealistic and likely to be prone to
> bugs.
>
> On that basis, I'm going to go with the simple "assembler-only
> support" recommended below, unless anyone can recommend a simple way
> of achieving the above (and direct me to a suitable reference). I
> can't find anything sufficiently similar in any of the other
> processors supported by LLVM.

Can you provide an example or two (written in C is fine) showing the kinds of loops or sequences of operations you're trying to pattern-match to use this instruction? I don't know of anything that works exactly like this, but some targets do have IR-level preprocessing passes to use certain kinds of intrinsics (see lib/Target/PowerPC/PPCCTRLoops.cpp for an example involving loops). There may be other ways to do this as well. I'd not give up so easily, but I need to see some concrete examples in order to provide advice.

 -Hal

>
>
> Thanks for the feedback
> Chris Dewhurst
> University of Limerick.
>
>
>
> From: James Y Knight [jyknight at google.com]
> Sent: 18 September 2015 16:39
> To: Chris.Dewhurst
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] multiply-accumulate instruction
>
>
>
>
> Do you only want to define assembler syntax for this, or do you need
> to be able to automatically emit it from some higher-level
> construct? I'd expect the former would be entirely sufficient, in
> which case this should do:
>
>
> let Predicates = [HasLeon3, HasLeon4], Defs = [Y, ASR18], Uses = [Y, ASR18] in
> def SMACrr : F3_1<3, 0b111110,
>                   (outs IntRegs:$rd), (ins IntRegs:$rs1, IntRegs:$rs2),
>                   "smac $rs1, $rs2, $rd",
>                   []>;
>
>
>
>
> If you want the latter, I'm not sure how you'd go about being able to
> pattern-match it, because of the unusual 40-bit accumulate input and
> output, and the 16-bit inputs, which are unusual for SPARC. Hopefully
> you don't really need that. :)
>
>
> On Fri, Sep 18, 2015 at 10:19 AM, Chris.Dewhurst via llvm-dev <
> llvm-dev at lists.llvm.org > wrote:
>
>
>
>
>
>
> I’m trying to define a multiply-accumulate instruction for the LEON
> processor, a Subtarget of the Sparc target.
>
>
>
> The documentation for the processor is as follows:
>
>
>
> ===
>
> To accelerate DSP algorithms, two multiply&accumulate instructions
> are implemented: UMAC and SMAC. The UMAC performs an unsigned 16-bit
> multiply, producing a 32-bit result, and adds the result to a 40-bit
> accumulator made up by the 8 lsb bits from the %y register and the
> %asr18 register. The least significant 32 bits are also written to
> the destination register. SMAC works similarly but performs signed
> multiply and accumulate. The MAC instructions execute in one clock
> but have two clocks latency, meaning that one pipeline stall cycle
> will be inserted if the following instruction uses the destination
> register of the MAC as a source operand.
>
>
>
> Assembler syntax:
>
> smac rs1, reg_imm, rd
>
>
>
> Operation:
>
> prod[31:0] = rs1[15:0] * reg_imm[15:0]
>
> result[39:0] = (Y[7:0] & %asr18[31:0]) + prod[31:0]
>
> (Y[7:0] & %asr18[31:0]) = result[39:0]
>
> rd = result[31:0]
>
>
>
> %asr18 can be read and written using the rdasr and wrasr
> instructions.
>
> ===
>
>
>
> I have the following in SparcInstrInfo to define the lowering rules
> for this instruction, but I feel that this isn’t likely to work as I
> need to somehow tie together the fact that %Y, %ASR18 and %rd are
> all related to each other in the output.
>
>
>
> let Predicates = [HasLeon3, HasLeon4], Defs = [Y, ASR18], Uses = [Y, ASR18] in
> def SMACrr : F3_1<3, 0b111110,
>                   (outs IntRegs:$rd), (ins IntRegs:$rs1, IntRegs:$rs2, ASRRegs:$asr18),
>                   "smac $rs1, $rs2, $rd",
>                   [(set i32:$rd,
>                         (add i32:$asr18, (mul i32:$rs1, i32:$rs2)))]>;
>
>
>
> Perhaps a well-chosen “let Constraints=” might be used here? If so,
> I’m not sure what to put in there. If not, can anyone help me work
> out how I might define the lowering rules for this instruction, please?
>
>
>
> Chris Dewhurst, University of Limerick.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory