[llvm-dev] Adding new vector instructions to LLVM Sparc backend
shivam gupta via llvm-dev
llvm-dev at lists.llvm.org
Tue Feb 25 10:18:34 PST 2020
Hello all,
As my major degree project, I have started adding vector instructions
to the LLVM Sparc backend (modified for the AJIT processor).
My task is to implement the VADDD, VSUBD, VUMULD and VSMULD instructions.
Their instruction format is as follows:
31-30 op (always 10)
29-25 rd
24-19 op3
18-14 rs1
13 i (always 0)
12-10 unused
9-7 datatype (8-bit: 001, 16-bit: 010, 32-bit: 100)
6-5 (always 10)
4-0 rs2
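A minimal C sketch of how these fields pack into a 32-bit instruction word. The helper name `encode_vector` and the enum names are hypothetical. The op3 values are the standard Sparc-V8 ones for ADD, SUB, UMUL and SMUL, which the attached ISA document says the vector forms reuse, and i is taken as 0 because that document specifies Instr[13]=0 for these instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Sparc-V8 op3 values reused by the vector forms (assumption based on
   "same as ADDD/SUBD/UMULD/SMULD" in the attached ISA text). */
enum {
    OP3_ADD  = 0x00, /* 000000 */
    OP3_SUB  = 0x04, /* 000100 */
    OP3_UMUL = 0x0a, /* 001010 */
    OP3_SMUL = 0x0b  /* 001011 */
};

/* Datatype field, Instr[9:7]. */
enum { VT_8 = 0x1, VT_16 = 0x2, VT_32 = 0x4 };

/* Hypothetical helper: build one vector instruction word.
   rd, rs1 and rs2 name 64-bit register pairs, so they must be even. */
static uint32_t encode_vector(unsigned op3, unsigned rd, unsigned rs1,
                              unsigned rs2, unsigned vt)
{
    assert(rd < 32 && rs1 < 32 && rs2 < 32);
    assert(((rd | rs1 | rs2) & 1u) == 0);
    assert(vt == VT_8 || vt == VT_16 || vt == VT_32);
    return (2u << 30)            /* op = 10                        */
         | (rd << 25)            /* Instr[29:25]                   */
         | ((op3 & 0x3fu) << 19) /* Instr[24:19]                   */
         | (rs1 << 14)           /* Instr[18:14]                   */
                                 /* i = 0, Instr[12:10] unused = 0 */
         | (vt << 7)             /* Instr[9:7] datatype            */
         | (2u << 5)             /* Instr[6:5] = 10                */
         | rs2;                  /* Instr[4:0]                     */
}
```

For example, `encode_vector(OP3_ADD, 2, 4, 6, VT_8)` would encode a byte-wise VADDD of pairs 4 and 6 into pair 2.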
The guide at https://llvm.org/docs/ExtendingLLVM.html suggests using a custom
LLVM intrinsic to represent the VADDD operation. Is there any detailed example
code for other architectures that I could look at?
Do I need to define a new class in SparcInstrFormats.td
<https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Sparc/SparcInstrFormats.td#L106>,
given that these instructions can't use the predefined format-3 class of the
other arithmetic instructions (the 8-bit asi field is repurposed to specify
the vector datatype)?
Does the implementation of Sparc VIS
<https://github.com/llvm/llvm-project/blob/master/llvm/lib/Target/Sparc/SparcInstrVIS.td>
resemble these instructions?
Could some LLVM backend experts give me an initial idea of the steps I
should take to add these instructions?
I have gone through the LLVM target-independent code generator documentation.
The AJIT processor ISA extensions are attached to this mail, and the SPARC
architecture manual is available at:
https://www.gaisler.com/doc/sparcv8.pdf
Thanks and Regards,
Shivam
-------------- next part --------------
64-bit ISA extensions to the AJIT processor
Madhav Desai
1. Overview
--------
The AJIT processor implements the Sparc-V8 ISA. We propose
to extend this ISA to provide support for a native 64-bit
integer datatype. The proposed extensions use the existing
instruction encodings to the maximum extent possible.
All proposed extensions are instructions of the form
Register x Register -> Register, Condition-codes.
The load/store instructions are not modified.
We list the additional instructions in the subsequent sections.
In each case, only the differences in the encoding relative
to an existing Sparc-V8 instruction are provided.
2. Integer-unit extensions: Arithmetic-logic instructions
-------------------------------------------------------
These instructions provide 64-bit arithmetic/logic support
in the integer unit. The instructions work on 64-bit register
pairs in most cases. Register-pairs are identified by a 5-bit
even number (lowest bit must be 0).
ADDD
encoding: same as ADD, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) + rs2(pair)
ADDDCC
encoding: same as ADDCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) + rs2(pair), set Z,N
SUBD
encoding: same as SUB, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) - rs2(pair)
SUBDCC
encoding: same as SUBCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) - rs2(pair), set Z,N
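The register-pair convention can be modelled in a few lines of C. This is a hypothetical reference model, not part of the proposal: it assumes the even register of a pair holds the most significant word (as Sparc-V8 LDD/STD do), treats 64-bit arithmetic as wrapping modulo 2^64, and ignores %g0 being hard-wired to zero. All function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: even register = most significant word of the pair. */
static uint64_t read_pair(const uint32_t regs[32], unsigned r)
{
    assert(r < 32 && (r & 1u) == 0);  /* pair number must be even */
    return ((uint64_t)regs[r] << 32) | regs[r + 1];
}

static void write_pair(uint32_t regs[32], unsigned r, uint64_t v)
{
    assert(r < 32 && (r & 1u) == 0);
    regs[r] = (uint32_t)(v >> 32);
    regs[r + 1] = (uint32_t)v;
}

struct icc { bool z, n; };

/* Z and N are computed from the full 64-bit result. */
static struct icc flags64(uint64_t v)
{
    struct icc cc = { v == 0, (v >> 63) != 0 };
    return cc;
}

/* ADDDCC: rd(pair) <- rs1(pair) + rs2(pair), set Z,N. */
static struct icc adddcc(uint32_t regs[32], unsigned rd, unsigned rs1, unsigned rs2)
{
    uint64_t v = read_pair(regs, rs1) + read_pair(regs, rs2);
    write_pair(regs, rd, v);
    return flags64(v);
}

/* SUBDCC: rd(pair) <- rs1(pair) - rs2(pair), set Z,N. */
static struct icc subdcc(uint32_t regs[32], unsigned rd, unsigned rs1, unsigned rs2)
{
    uint64_t v = read_pair(regs, rs1) - read_pair(regs, rs2);
    write_pair(regs, rd, v);
    return flags64(v);
}
```

The point of the model is that the carry out of the low word propagates into the high word, which a pair of 32-bit ADD instructions would only achieve with ADDCC followed by ADDX.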
// shifts
SLLD
encoding: same as SLL, but with Instr[6:5]=2.
if imm bit (Instr[13]) is 1, then Instr[5:0] is the shift-amount.
else shift-amount is the lowest 5 bits of rs2. Note that rs2
is a 32-bit register.
rd(pair) <- rs1(pair) << shift-amount
SRLD
encoding: same as SRL, but with Instr[6:5]=2.
if imm bit (Instr[13]) is 1, then Instr[5:0] is the shift-amount.
else shift-amount is the lowest 5 bits of rs2. Note that rs2
is a 32-bit register.
rd(pair) <- rs1(pair) >> shift-amount
SRAD
encoding: same as SRA, but with Instr[6:5]=2.
if imm bit (Instr[13]) is 1, then Instr[5:0] is the shift-amount.
else shift-amount is the lowest 5 bits of rs2. Note that rs2
is a 32-bit register.
rd(pair) <- rs1(pair) >> shift-amount (with sign extension).
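A sketch of the shift-amount selection and the three 64-bit shifts, taking the text above literally (Instr[5:0] when i=1, the lowest 5 bits of rs2 otherwise). The function names are hypothetical; `srad` fills the vacated bits with the sign bit explicitly instead of relying on C's implementation-defined right shift of negative signed values.

```c
#include <assert.h>
#include <stdint.h>

/* Shift amount as described above: Instr[5:0] when i (Instr[13]) = 1,
   otherwise the lowest 5 bits of the 32-bit register rs2. */
static unsigned shift_amount(uint32_t instr, uint32_t rs2_value)
{
    if ((instr >> 13) & 1u)
        return instr & 0x3fu;
    return rs2_value & 0x1fu;
}

static uint64_t slld(uint64_t v, unsigned sh) { assert(sh < 64); return v << sh; }
static uint64_t srld(uint64_t v, unsigned sh) { assert(sh < 64); return v >> sh; }

/* Arithmetic right shift: replicate bit 63 into the vacated positions. */
static uint64_t srad(uint64_t v, unsigned sh)
{
    assert(sh < 64);
    uint64_t r = v >> sh;
    if (sh != 0 && (v >> 63))
        r |= ~UINT64_C(0) << (64 - sh);
    return r;
}
```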
// mul/div
UMULD
encoding: same as UMUL, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) * rs2(pair)
UMULDCC
encoding: same as UMULCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) * rs2(pair), sets Z, Ovflow
SMULD
encoding: same as SMUL, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) * rs2(pair) (signed)
SMULDCC
encoding: same as SMULCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) * rs2(pair) (signed)
sets condition codes Z,N,Ovflow
UDIVD
encoding: same as UDIV, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) / rs2(pair)
note: can generate div-by-zero trap.
UDIVDCC
encoding: same as UDIVCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) / rs2(pair)
sets condition codes Z,Ovflow
note: can generate div-by-zero trap.
SDIVD
encoding: same as SDIV, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) / rs2(pair) (signed)
SDIVDCC
encoding: same as SDIVCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) / rs2(pair) (signed)
sets condition codes Z,N,Ovflow
note: can generate div-by-zero trap.
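The multiply/divide descriptions leave the width of the product open. The sketch below covers only the unsigned forms and assumes rd(pair) receives the low 64 bits of the product and that Ovflow means the full product did not fit in 64 bits; the division-by-zero trap is represented by a false return value. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mul_result {
    uint64_t value;   /* low 64 bits of the product (assumed) */
    bool z;           /* result is zero */
    bool ovflow;      /* full product needs more than 64 bits (assumed) */
};

/* UMULDCC: unsigned multiply of two register-pair values. */
static struct mul_result umuldcc(uint64_t a, uint64_t b)
{
    struct mul_result r;
    r.value = a * b;                            /* wraps modulo 2^64 */
    r.ovflow = (a != 0) && (r.value / a != b);  /* detects the wrap */
    r.z = (r.value == 0);
    return r;
}

/* UDIVD: returns false where the hardware would take the
   division-by-zero trap; otherwise stores the quotient. */
static bool udivd(uint64_t a, uint64_t b, uint64_t *quotient)
{
    assert(quotient != NULL);
    if (b == 0)
        return false;
    *quotient = a / b;
    return true;
}
```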
// 64-bit logical.
ORD
encoding: same as OR, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) | rs2(pair)
ORDCC
encoding: same as ORCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) | rs2(pair), sets Z.
ORDN
encoding: same as ORN, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) | (~rs2(pair))
ORDNCC
encoding: same as ORNCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) | (~rs2(pair)), sets Z.
XORD
encoding: same as XOR, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) ^ rs2(pair)
XORDCC
encoding: same as XORCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) ^ rs2(pair), sets Z.
XNORD
encoding: same as XNOR, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) ^ (~rs2(pair))
XNORDCC
encoding: same as XNORCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) ^ (~rs2(pair)), sets Z
ANDD
encoding: same as AND, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) . rs2(pair)
ANDDCC
encoding: same as ANDCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) . rs2(pair), sets Z
ANDDN
encoding: same as ANDN, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) . (~rs2(pair))
ANDDNCC
encoding: same as ANDNCC, but with Instr[13]=0 (i=0), and Instr[5]=1.
rd(pair) <- rs1(pair) . (~rs2(pair)), sets Z
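Following the Sparc-V8 meaning of ORN, XNOR and ANDN (the second operand is inverted before the main operation), each 64-bit logical operation reduces to one C expression on register-pair values; the CC forms additionally set Z when the 64-bit result is zero. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "." in the text above denotes bitwise AND. */
static uint64_t ord(uint64_t a, uint64_t b)   { return a | b; }
static uint64_t ordn(uint64_t a, uint64_t b)  { return a | ~b; }
static uint64_t xord(uint64_t a, uint64_t b)  { return a ^ b; }
static uint64_t xnord(uint64_t a, uint64_t b) { return a ^ ~b; }
static uint64_t andd(uint64_t a, uint64_t b)  { return a & b; }
static uint64_t anddn(uint64_t a, uint64_t b) { return a & ~b; }

/* Z flag of the ...CC variants: set when the whole 64-bit result is zero. */
static bool z_flag(uint64_t result) { return result == 0; }
```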
3. Integer-unit extensions: SIMD instructions
-------------------------------------------------------
These instructions are vector instructions which work on
two source registers (each a 64 bit register pair), and
produce a 64-bit vector result. The vector elements can
be 8-bit/16-bit/32-bit.
VADDD8, VADDD16, VADDD32
encoding: same as ADDD, but with Instr[13]=0 (i=0), and Instr[6:5]=2.
bits Instr[9:7] are a 3-bit field, which specify the data
type
001 byte (VADDD8)
010 half-word (16-bits) (VADDD16)
100 word (32-bits) (VADDD32)
performs a vector operation by considering the 64-bit operands as
a vector of objects with specified data-type.
vadd8 rs1, rs2, rd
vadd16 rs1, rs2, rd
vadd32 rs1, rs2, rd
VSUBD8, VSUBD16, VSUBD32
encoding: same as SUBD, but with Instr[13]=0 (i=0), and Instr[6:5]=2.
bits Instr[9:7] are a 3-bit field, which specify the data
type
001 byte (VSUBD8)
010 half-word (16-bits) (VSUBD16)
100 word (32-bits) (VSUBD32)
performs a vector operation by considering the 64-bit operands as
a vector of objects with specified data-type.
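A lane-wise model of VADDD/VSUBD for the three element widths. It assumes each element wraps around independently, so no carry or borrow crosses a lane boundary; the text does not state this explicitly. `vec_addsub` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Add or subtract two 64-bit operands viewed as vectors of
   `width`-bit elements (8, 16 or 32), each lane wrapping on its own. */
static uint64_t vec_addsub(uint64_t a, uint64_t b, unsigned width, int subtract)
{
    assert(width == 8 || width == 16 || width == 32);
    uint64_t lane_mask = (UINT64_C(1) << width) - 1;
    uint64_t result = 0;
    for (unsigned shift = 0; shift < 64; shift += width) {
        uint64_t x = (a >> shift) & lane_mask;
        uint64_t y = (b >> shift) & lane_mask;
        uint64_t r = subtract ? x - y : x + y;
        result |= (r & lane_mask) << shift;  /* drop the lane's carry/borrow */
    }
    return result;
}
```

The same operands give different results per datatype: adding 0x0001000100010001 to 0x00FF00FF00FF00FF yields 0 with byte lanes but 0x0100010001000100 with half-word lanes.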
VUMULD8, VUMULD16, VUMULD32
encoding: same as UMULD, but with Instr[13]=0 (i=0), and Instr[6:5]=2.
bits Instr[9:7] are a 3-bit field, which specify the data
type
001 byte (VUMULD8)
010 half-word (16-bits) (VUMULD16)
100 word (32-bits) (VUMULD32)
performs a vector operation by considering the 64-bit operands as
a vector of objects with specified data-type.
VSMULD8, VSMULD16, VSMULD32
encoding: same as SMULD, but with Instr[13]=0 (i=0), and Instr[6:5]=2.
bits Instr[9:7] are a 3-bit field, which specify the data
type
001 byte (VSMULD8)
010 half-word (16-bits) (VSMULD16)
100 word (32-bits) (VSMULD32)
performs a vector operation by considering the 64-bit operands as
a vector of objects with specified data-type.
4. Integer-unit extensions: byte-reduction instructions
-------------------------------------------------------
These instructions reduce the bytes of a source register pair,
selected by an 8-bit mask, to a single result in a 32-bit register.
// byte-reduce add
ADDDBYTER
op=2, op3[3:0]=0xd, op3[5:4]=0x2, contents[7:0] of rs2 specify a mask.
encoding
Instr[31:30] (op) = 0x2
Instr[29:25] (rd) rd is a 32-bit register.
Instr[24:19] (op3) = 101101
Instr[18:14] (rs1) lowest bit assumed 0.
Instr[13] (i) = 0 (ignored)
Instr[12:5] (zero)
Instr[4:0] (rs2) 32-bit register is read.
rd <- (rs1_7.m7 + rs1_6.m6 + rs1_5.m5 ... + rs1_0.m0)
(The final sum is at most 8 x 255 = 2040, so it fits in 11 bits.
It is stored in the least significant bytes, and it is up to
software to decide which byte(s) to use.)
addbyter %rs1, %rs2/imm, rd
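A reference model for ADDDBYTER, assuming rs1_0 is the least significant byte of the register-pair value and mask bit m_k is bit k of rs2. The names `byte_at` and `adddbyter` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte k of the register-pair value; byte 0 is assumed least significant. */
static uint32_t byte_at(uint64_t v, unsigned k)
{
    assert(k < 8);
    return (uint32_t)(v >> (8 * k)) & 0xffu;
}

/* ADDDBYTER: sum of the bytes of rs1(pair) whose mask bit (rs2[k]) is set. */
static uint32_t adddbyter(uint64_t rs1, uint32_t rs2)
{
    uint32_t sum = 0;
    for (unsigned k = 0; k < 8; k++)
        if ((rs2 >> k) & 1u)
            sum += byte_at(rs1, k);
    return sum;  /* at most 8 * 255 = 2040 */
}
```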
// byte-reduce or
ORDBYTER
op=2, op3[3:0]=0xe, op3[5:4]=0x2, contents[7:0] of rs2 specify a mask.
encoding
Instr[31:30] (op) = 0x2
Instr[29:25] (rd) rd is a 32-bit register.
Instr[24:19] (op3) = 101110
Instr[18:14] (rs1) lowest bit assumed 0.
Instr[13] (i) = 0 (ignored)
Instr[12:5] (zero)
Instr[4:0] (rs2) 32-bit register is read.
rd <- (rs1_7.m7 | rs1_6.m6 | rs1_5.m5 ... | rs1_0.m0)
// byte-reduce and
ANDDBYTER
op=2, op3[3:0]=0xf, op3[5:4]=0x2, contents[7:0] of rs2 specify a mask.
encoding
Instr[31:30] (op) = 0x2
Instr[29:25] (rd) rd is a 32-bit register.
Instr[24:19] (op3) = 101111
Instr[18:14] (rs1) lowest bit assumed 0.
Instr[13] (i) = 0 (ignored)
Instr[12:5] (zero)
Instr[4:0] (rs2) 32-bit register is read.
rd <- ( (m7 ? rs1_7 : 0xff) . (m6 ? rs1_6 : 0xff) .... (m0 ? rs1_0 : 0xff))
// byte-reduce xor
XORDBYTER
op=2, op3[3:0]=0xe, op3[5:4]=0x3, contents[7:0] of rs2 specify a mask.
encoding
Instr[31:30] (op) = 0x2
Instr[29:25] (rd) rd is a 32-bit register.
Instr[24:19] (op3) = 111110
Instr[18:14] (rs1) lowest bit assumed 0.
Instr[13] (i) = 0 (ignored)
Instr[12:5] (zero)
Instr[4:0] (rs2) 32-bit register is read.
rd <- (rs1_7.m7 ^ rs1_6.m6 ^ rs1_5.m5 ... ^ rs1_0.m0)
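The OR, AND and XOR reductions differ only in what a masked-out byte contributes: 0 for OR and XOR, 0xff for AND (the identity element in each case), as the formulas above show. A sketch under the same assumption that byte 0 is the least significant byte and mask bit k is bit k of rs2; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte k of the register-pair value; byte 0 is assumed least significant. */
static uint32_t byte_at(uint64_t v, unsigned k)
{
    assert(k < 8);
    return (uint32_t)(v >> (8 * k)) & 0xffu;
}

/* ORDBYTER: masked-out bytes contribute 0. */
static uint32_t ordbyter(uint64_t rs1, uint32_t mask)
{
    uint32_t r = 0;
    for (unsigned k = 0; k < 8; k++)
        if ((mask >> k) & 1u)
            r |= byte_at(rs1, k);
    return r;
}

/* ANDDBYTER: masked-out bytes contribute 0xff. */
static uint32_t anddbyter(uint64_t rs1, uint32_t mask)
{
    uint32_t r = 0xffu;
    for (unsigned k = 0; k < 8; k++)
        r &= ((mask >> k) & 1u) ? byte_at(rs1, k) : 0xffu;
    return r;
}

/* XORDBYTER: masked-out bytes contribute 0. */
static uint32_t xordbyter(uint64_t rs1, uint32_t mask)
{
    uint32_t r = 0;
    for (unsigned k = 0; k < 8; k++)
        if ((mask >> k) & 1u)
            r ^= byte_at(rs1, k);
    return r;
}
```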
// positions-of-zero-bytes in d-word.
ZBYTEDPOS
op=2, op3[3:0]=0xf, op3[5:4]=0x3, contents[7:0] of rs2/imm-value specify a mask.
encoding
Instr[31:30] (op) = 0x2
Instr[29:25] (rd) rd is a 32-bit register.
Instr[24:19] (op3) = 111111
Instr[18:14] (rs1) lowest bit assumed 0.
Instr[13] (i) = if 0, use rs2, else Instr[7:0]
Instr[12:5] = 0 (ignored if i=0)
Instr[4:0] (rs2, if i=0) 32-bit register is read.
rd <- [b7_zero b6_zero b5_zero b4_zero .. b0_zero]
(if mask-bit is zero then b*_zero is zero)
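A sketch of ZBYTEDPOS, assuming b_k_zero lands in bit k of rd and byte 0 is the least significant byte of rs1(pair). The mask comes from Instr[7:0] when i=1 and from rs2[7:0] otherwise. The names `zbytedpos` and `zbytedpos_mask` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte k of the register-pair value; byte 0 is assumed least significant. */
static uint32_t byte_at(uint64_t v, unsigned k)
{
    assert(k < 8);
    return (uint32_t)(v >> (8 * k)) & 0xffu;
}

/* Mask source: Instr[7:0] when i (Instr[13]) = 1, otherwise rs2[7:0]. */
static uint32_t zbytedpos_mask(uint32_t instr, uint32_t rs2_value)
{
    return ((instr >> 13) & 1u) ? (instr & 0xffu) : (rs2_value & 0xffu);
}

/* Bit k of the result is 1 when byte k of rs1(pair) is zero
   and mask bit k is set; otherwise it is 0. */
static uint32_t zbytedpos(uint64_t rs1, uint32_t mask)
{
    uint32_t r = 0;
    for (unsigned k = 0; k < 8; k++)
        if (((mask >> k) & 1u) && byte_at(rs1, k) == 0)
            r |= 1u << k;
    return r;
}
```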
5. Vector floating point instructions
---------------------------------------
These are vector float operations which work
on two single precision operand pairs to
produce two single precision results.
// SIMD float ops.
// NaN propagated, but no traps.
// For each of these, rs1,rs2,rd are
// considered even numbers pointing to
// a floating point register-pair.
//
VFADD
op=2, op3=0x34, opf=0x142
vfadd %f0, %f2, %f4
VFSUB
op=2, op3=0x34, opf=0x146
VFMUL
op=2, op3=0x34, opf=0x14a
VFDIV
op=2, op3=0x34, opf=0x14e
VFSQRT
op=2, op3=0x34, opf=0x12a
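These opcodes sit in the standard Sparc-V8 FPop format (op=2, rd, op3=0x34, rs1, a 9-bit opf in Instr[13:5], rs2). Below is a hypothetical encoder using the opf values listed above; it also checks that the register numbers are even, as the comment block requires. For VFSQRT the sketch simply takes rs1 as an argument; by analogy with FSQRTs it is presumably unused, which is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* opf values from the list above. */
enum { OPF_VFADD = 0x142, OPF_VFSUB = 0x146, OPF_VFMUL = 0x14a,
       OPF_VFDIV = 0x14e, OPF_VFSQRT = 0x12a };

/* op=2 | rd | op3=0x34 | rs1 | opf | rs2; registers name FP pairs. */
static uint32_t encode_vfop(unsigned opf, unsigned rd, unsigned rs1, unsigned rs2)
{
    assert(rd < 32 && rs1 < 32 && rs2 < 32);
    assert(((rd | rs1 | rs2) & 1u) == 0);  /* even register-pair numbers */
    assert(opf < 0x200);                   /* opf is 9 bits wide */
    return (2u << 30) | (rd << 25) | (0x34u << 19)
         | (rs1 << 14) | (opf << 5) | rs2;
}
```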
6. CSWAP instruction
---------------------------------------
The Sparc-V8 ISA does not include a compare-and-swap (CAS) instruction,
which is very useful for achieving consensus among distributed agents
when there are more than two agents.
We introduce a CSWAP instruction in two flavours
CSWAPD rs1, rs2-pair/immediate, rd-pair
op=3
op3= 10 1111
(rest of instruction similar to SWAP)
CSWAPDA rs1, rs2-pair/immediate, rd-pair, asi
op=3
op3= 11 1111
(rest of instruction similar to SWAPA)
The semantics of the instruction are as follows (the entire sequence is atomic):
    TMPVAL = mem[rs1]              (load double, lock system bus)
    if <rs2-pair/immediate> == TMPVAL
        mem[rs1] = <rd-pair>       (store double, unlock)
        <rd-pair> = TMPVAL
    else
        mem[rs1] = TMPVAL          (store double, unlock)
The write under else is redundant but is required in order to unlock the bus.
Similar to SWAP,
- mem[rs1] is left either with its value prior to the instruction or
with the value in rd-pair.
- <rd-pair> is left either with its value prior to the instruction or
with the value in mem[rs1].
The processor can check rd-pair after execution to confirm if the swap
succeeded.
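A sequential C model of the CSWAPD semantics above, with the hypothetical name `cswapd`. It is not atomic by itself (in hardware the bus lock provides atomicity), and its boolean return value is a convenience of the model, not something the instruction produces. The contract is close to, but not identical with, LLVM IR's `cmpxchg`, which also returns the loaded value when the comparison fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* mem     : the double word at address rs1
   cmp     : the rs2-pair/immediate comparison value
   rd_pair : value to store on entry, old memory value on success */
static bool cswapd(uint64_t *mem, uint64_t cmp, uint64_t *rd_pair)
{
    uint64_t tmpval = *mem;          /* load double, lock system bus */
    if (cmp == tmpval) {
        *mem = *rd_pair;             /* store double, unlock */
        *rd_pair = tmpval;
        return true;
    }
    *mem = tmpval;                   /* redundant store that unlocks the bus */
    return false;
}
```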