[llvm-dev] multiply-accumulate instruction
    Hal Finkel via llvm-dev 
    llvm-dev at lists.llvm.org
       
    Tue Sep 29 16:07:14 PDT 2015
    
    
  
----- Original Message -----
> From: "Chris.Dewhurst" <Chris.Dewhurst at lero.ie>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: llvm-dev at lists.llvm.org, "James Y Knight" <jyknight at google.com>
> Sent: Monday, September 21, 2015 8:16:10 AM
> Subject: RE: [llvm-dev] multiply-accumulate instruction
> 
> I think the canonical example is likely to be matrix multiplication,
> but any kind of "sum of products"-type method would, I expect, be
> improved by using this instruction.
> 
> e.g. (and I haven't syntax-checked this, so apologies for any
> errors):
> 
> int SumOfProducts(int xs[], int ys[], int count)
> {
>    int sum = 0; // Use ASR18 & Y for the SMAC / UMAC instructions.
> 
>    for (int index=0 ; index<count ; index++)
>    {
>       sum += xs[index] * ys[index]; // Could all be done with SMAC
>                                     // in one instruction.
>    }
> 
>    return sum; // Needs retrieving from ASR18 & Y
> }
Using a late target-specific IR-level pass (such as lib/Target/PowerPC/PPCCTRLoops.cpp) to recognize this pattern and produce some appropriate target intrinsics should definitely work for this. Also, we already have code to recognize these kinds of reductions in the loop vectorizer that you should find useful.
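
To make the target of such a pass concrete, here is a minimal sketch in C of the SMAC semantics as described in the processor documentation quoted further down, together with the clear / accumulate / read-back shape the rewritten loop would take. All names (mac_state, mac_clear, smac, sum_of_products16, sext16) are hypothetical helpers for illustration only; they are not LLVM or LEON APIs. The sketch also assumes the signed product is sign-extended to 40 bits before the add, which the quoted documentation does not state explicitly. Note that SMAC only multiplies the low 16 bits of each operand, so the int-based loop above could only be matched when the inputs are known to fit in 16 bits.

```c
#include <stdint.h>

/* Hypothetical software model of the accumulator state: the low
 * 8 bits of %y concatenated with %asr18 form a 40-bit accumulator. */
typedef struct {
    uint32_t y;
    uint32_t asr18;
} mac_state;

/* Sign-extend the low 16 bits of v to 32 bits (portable form). */
static int32_t sext16(uint32_t v)
{
    int32_t lo = (int32_t)(v & 0xFFFFu);
    return (lo & 0x8000) ? lo - 0x10000 : lo;
}

/* Clear both registers, as would be needed before the loop. */
static void mac_clear(mac_state *s)
{
    s->y = 0;
    s->asr18 = 0;
}

/* Model of "smac rs1, rs2, rd"; returns the value written to rd.
 * Assumption: the product is sign-extended to 40 bits. */
static uint32_t smac(mac_state *s, uint32_t rs1, uint32_t rs2)
{
    int32_t prod = sext16(rs1) * sext16(rs2);
    uint64_t acc = ((uint64_t)(s->y & 0xFFu) << 32) | s->asr18;
    uint64_t result = (acc + (uint64_t)(int64_t)prod) & 0xFFFFFFFFFFull;

    s->y = (s->y & ~0xFFu) | (uint32_t)(result >> 32);
    s->asr18 = (uint32_t)result;
    return (uint32_t)result;
}

/* Sum of products over 16-bit inputs; returns the low 32 bits of
 * the accumulator, i.e. the rd value of the last SMAC. */
static uint32_t sum_of_products16(const int16_t *xs, const int16_t *ys,
                                  int count)
{
    mac_state s;
    uint32_t low = 0;

    mac_clear(&s);                      /* clear %y and %asr18 */
    for (int i = 0; i < count; i++)
        low = smac(&s, (uint16_t)xs[i], (uint16_t)ys[i]);
    return low;                         /* read back the result */
}
```

An IR-level pass would need to prove the 16-bit range of both operands (for example, from sign-extending 16-bit loads) before replacing the multiply-add in the loop body with an intrinsic that maps to SMAC.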
 -Hal
> ________________________________________
> From: Hal Finkel [hfinkel at anl.gov]
> Sent: 21 September 2015 13:48
> To: Chris.Dewhurst
> Cc: llvm-dev at lists.llvm.org; James Y Knight
> Subject: Re: [llvm-dev] multiply-accumulate instruction
> 
> ----- Original Message -----
> > From: "Chris.Dewhurst via llvm-dev" <llvm-dev at lists.llvm.org>
> > To: "James Y Knight" <jyknight at google.com>
> > Cc: llvm-dev at lists.llvm.org
> > Sent: Monday, September 21, 2015 2:43:30 AM
> > Subject: Re: [llvm-dev] multiply-accumulate instruction
> >
> > I've been looking to see if there's a way to get the instruction
> > below (SMAC) emitted from a higher-level construct, but I'm starting
> > to think this is unrealistic.
> >
> > To do so, I'd have to tie in two other instructions: firstly,
> > clearing the ASR18 and Y registers somewhere near the start of the
> > method, then copying out the value of these registers somewhere near
> > the end of the method, or wherever the value needs to be used.
> >
> > In addition, it would only make sense to use the construct inside a
> > loop of some form; otherwise, some variation on MUL would be better.
> > That would either require detecting the loop, or optimising further
> > down the line to convert the above construct *into* a simple MUL.
> >
> > This now feels to me to be unrealistic and likely to be prone to
> > bugs.
> >
> > On that basis, I'm going to go with the simple "assembler-only
> > support" recommended below, unless anyone can recommend a simple way
> > of achieving the above (and direct me to a suitable reference). I
> > can't find anything sufficiently similar in any of the other
> > processors supported by LLVM.
> 
> Can you provide an example or two (written in C is fine) showing the
> kinds of loops or sequences of operations you're trying to pattern
> match to use this instruction. I don't know of anything that works
> exactly like this, but some targets do have IR-level preprocessing
> passes to use certain kinds of intrinsics
> (lib/Target/PowerPC/PPCCTRLoops.cpp for an example involving loops).
> There may be other ways to do this as well. I'd not give up so
> easily, but I need to see some concrete examples in order to provide
> advice.
> 
>  -Hal
> 
> >
> >
> > Thanks for the feedback
> > Chris Dewhurst
> > University of Limerick.
> >
> >
> >
> > From: James Y Knight [jyknight at google.com]
> > Sent: 18 September 2015 16:39
> > To: Chris.Dewhurst
> > Cc: llvm-dev at lists.llvm.org
> > Subject: Re: [llvm-dev] multiply-accumulate instruction
> >
> >
> >
> >
> > Do you only want to define assembler syntax for this, or do you need
> > to be able to automatically emit it from some higher-level
> > construct? I'd expect the former would be entirely sufficient, in
> > which case this should do:
> >
> >
> > let Predicates = [HasLeon3, HasLeon4], Defs = [Y, ASR18],
> >     Uses = [Y, ASR18] in
> > def SMACrr : F3_1<3, 0b111110,
> >                   (outs IntRegs:$rd),
> >                   (ins IntRegs:$rs1, IntRegs:$rs2),
> >                   "smac $rs1, $rs2, $rd",
> >                   []>;
> >
> >
> >
> >
> > If you want the latter, I'm not sure how you'd go about being able
> > to pattern-match it, because of the unusual 40-bit accumulate input
> > and output, and the unusual (for SPARC) 16-bit inputs. Hopefully you
> > don't really need that. :)
> >
> >
> > On Fri, Sep 18, 2015 at 10:19 AM, Chris.Dewhurst via llvm-dev <
> > llvm-dev at lists.llvm.org > wrote:
> >
> >
> >
> >
> >
> >
> > I’m trying to define a multiply-accumulate instruction for the LEON
> > processor, a Subtarget of the Sparc target.
> >
> >
> >
> > The documentation for the processor is as follows:
> >
> >
> >
> > ===
> >
> > To accelerate DSP algorithms, two multiply&accumulate instructions
> > are implemented: UMAC and SMAC. The UMAC performs an unsigned 16-bit
> > multiply, producing a 32-bit result, and adds the result to a 40-bit
> > accumulator made up by the 8 lsb bits from the %y register and the
> > %asr18 register. The least significant 32 bits are also written to
> > the destination register. SMAC works similarly but performs signed
> > multiply and accumulate. The MAC instructions execute in one clock
> > but have two clocks latency, meaning that one pipeline stall cycle
> > will be inserted if the following instruction uses the destination
> > register of the MAC as a source operand.
> >
> >
> >
> > Assembler syntax:
> >
> > smac rs1, reg_imm, rd
> >
> >
> >
> > Operation:
> >
> > prod[31:0] = rs1[15:0] * reg_imm[15:0]
> >
> > result[39:0] = (Y[7:0] & %asr18[31:0]) + prod[31:0]
> >
> > (Y[7:0] & %asr18[31:0]) = result[39:0]
> >
> > rd = result[31:0]
> >
> >
> >
> > %asr18 can be read and written using the rdasr and wrasr
> > instructions.
> >
> > ===
> >
> >
> >
> > I have the following in SparcInstrInfo to define the lowering rules
> > for this instruction, but I feel that this isn’t likely to work as I
> > need to somehow tie together the fact that %Y, %ASR18 and %rd are
> > all related to each other in the output.
> >
> >
> >
> > let Predicates = [HasLeon3, HasLeon4], Defs = [Y, ASR18],
> >     Uses = [Y, ASR18] in
> > def SMACrr : F3_1<3, 0b111110,
> >                   (outs IntRegs:$rd),
> >                   (ins IntRegs:$rs1, IntRegs:$rs2, ASRRegs:$asr18),
> >                   "smac $rs1, $rs2, $rd",
> >                   [(set i32:$rd,
> >                      (add i32:$asr18, (mul i32:$rs1, i32:$rs2)))]>;
> >
> >
> >
> > Perhaps a well-chosen “let Constraints=” might be used here? If so,
> > I’m not sure I know what to put in there. If not, can anyone help me
> > work out how I might define the lowering rules for this instruction,
> > please?
> >
> >
> >
> > Chris Dewhurst, University of Limerick.
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
> >
> >
> >
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
    
    