R600/SI Patchset: Initial support for compute shaders

Tom Stellard tom at stellard.net
Mon Mar 11 06:59:09 PDT 2013


On Mon, Mar 11, 2013 at 10:03:53AM +0100, Christian K?nig wrote:
> Am 08.03.2013 22:29, schrieb Tom Stellard:
> > Hi,
> >
> > The attached patches add support for the BUFFER_STORE instruction and
> > make it possible to run simple compute shaders on Southern Islands
> > hardware with the open source radeonsi GPU driver in
> > Mesa (http://www.mesa3d.org).
> >
> > -Tom
> >
> > 0001-R600-SI-Avoid-generating-S_MOVs-with-64-bit-immediat.patch
> >
> >
> > >From aa3e075e2f0df66e2fe50018704aee28904155f0 Mon Sep 17 00:00:00 2001
> > From: Tom Stellard<thomas.stellard at amd.com>
> > Date: Wed, 6 Mar 2013 17:28:55 +0000
> > Subject: [PATCH 1/5] R600/SI: Avoid generating S_MOVs with 64-bit immediates
> >
> > SITargetLowering::analyzeImmediate() was converting the 64-bit values
> > to 32-bit and then checking if they were an inline immediate.  Some
> > of these conversions caused this check to succeed and produced
> > S_MOV instructions with 64-bit immediates, which are illegal.
> > ---
> >   lib/Target/R600/SIISelLowering.cpp |   11 +++++++++--
> >   test/CodeGen/R600/imm.ll           |   34 ++++++++++++++++++++++++++++++++++
> >   2 files changed, 43 insertions(+), 2 deletions(-)
> >   create mode 100644 test/CodeGen/R600/imm.ll
> >
> > diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
> > index fead115..74b1dc5 100644
> > --- a/lib/Target/R600/SIISelLowering.cpp
> > +++ b/lib/Target/R600/SIISelLowering.cpp
> > @@ -426,9 +426,16 @@ int32_t SITargetLowering::analyzeImmediate(const SDNode *N) const {
> >       float F;
> >     } Imm;
> >   
> > -  if (const ConstantSDNode *Node = dyn_cast<ConstantSDNode>(N))
> > +  if (const ConstantSDNode *Node = dyn_cast<ConstantSDNode>(N)) {
> > +    if (Node->getValueType(0) == MVT::i64) {
> > +      int64_t Imm64 = Node->getZExtValue();
> > +      if (Imm64 >> 32) {
> > +        return -1;
> > +      }
> > +      Imm.I = Imm64;
> > +    }
> >       Imm.I = Node->getSExtValue();
> 
> I wouldn't write it like this, setting Imm.I twice is a bit superfluous. 
> Also I don't think we should look at the ValueType, maybe something like 
> this should be more appropriate:
> 
> if (const ConstantSDNode *Node = dyn_cast<ConstantSDNode>(N)) {
>    int64_t Imm64 = Node->getSExtValue();
>    if (Imm64 >> 32)
>      return -1;
>    Imm.I = Imm64;
> } else...
>

This make sense, but what if this function were used on a 32-bit
immediate?  Wouldn't if(Imm64 >> 32) be true for all negative integers?
If so, then we would miss some opportunities for using inline
immediates.

> That should also work if anybody ever has the stupid idea to put an i64 immediate onto a *_F64 operation.
> 
> > -  else if (const ConstantFPSDNode *Node = dyn_cast<ConstantFPSDNode>(N))
> > +  } else if (const ConstantFPSDNode *Node = dyn_cast<ConstantFPSDNode>(N))
> >       Imm.F = Node->getValueAPF().convertToFloat();
> >     else
> >       return -1; // It isn't an immediate
> > diff --git a/test/CodeGen/R600/imm.ll b/test/CodeGen/R600/imm.ll
> > new file mode 100644
> > index 0000000..20f9dec
> > --- /dev/null
> > +++ b/test/CodeGen/R600/imm.ll
> > @@ -0,0 +1,34 @@
> > +; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s
> > +
> > +; XXX: Enable once SI supports buffer stores
> > +; XFAIL: *
> > +
> > +; CHECK: @i64_imm
> > +; CHECK-NOT: S_MOV_B64
> > +define void @i64_imm(i64 addrspace(1)* %out) {
> > +entry:
> > +  store i64 1311768465173141112, i64 addrspace(1) *%out  ; 0x1234567812345678
> > +  ret void
> > +}
> > +
> > +; Use a 64-bit value with lo bits that can be represented as an inline constant
> > +; CHECK: @i64_imm_inline_lo
> > +; CHECK: S_MOV_B32 [[LO:SGPR[0-9]+]], 5
> > +; CHECK: V_MOV_B32_e32 [[LO_VGPR:VGPR[0-9]+]], [[LO]]
> > +; CHECK: BUFFER_STORE_DWORDX2 [[LO_VGPR]]_
> > +define void @i64_imm_inline_lo(i64 addrspace(1) *%out) {
> > +entry:
> > +  store i64 1311768464867721221, i64 addrspace(1) *%out ; 0x1234567800000005
> > +  ret void
> > +}
> > +
> > +; Use a 64-bit value with hi bits that can be represented as an inline constant
> > +; CHECK: @i64_imm_inline_hi
> > +; CHECK: S_MOV_B32 [[HI:SGPR[0-9]+]], 5
> > +; CHECK: V_MOV_B32_e32 [[HI_VGPR:VGPR[0-9]+]], [[HI]]
> > +; CHECK: BUFFER_STORE_DWORDX2 {{VGPR[0-9]+}}_[[HI_VGPR]]
> > +define void @i64_imm_inline_hi(i64 addrspace(1) *%out) {
> > +entry:
> > +  store i64 21780256376, i64 addrspace(1) *%out ; 0x0000000512345678
> > +  ret void
> > +}
> > -- 1.7.3.4
> >
> > 0002-R600-SI-Add-processor-types-for-each-SI-variant.patch
> >
> >
> > >From 018c05ab8a7e9f6da5a6f7d2547da61671f82c3e Mon Sep 17 00:00:00 2001
> > From: Tom Stellard<thomas.stellard at amd.com>
> > Date: Fri, 8 Mar 2013 13:28:18 -0500
> > Subject: [PATCH 2/5] R600/SI: Add processor types for each SI variant
> >
> > ---
> >   lib/Target/R600/AMDILDeviceInfo.cpp |    3 ++-
> >   lib/Target/R600/Processors.td       |    6 ++++--
> >   2 files changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/Target/R600/AMDILDeviceInfo.cpp b/lib/Target/R600/AMDILDeviceInfo.cpp
> > index 9605fbe..9cc2083 100644
> > --- a/lib/Target/R600/AMDILDeviceInfo.cpp
> > +++ b/lib/Target/R600/AMDILDeviceInfo.cpp
> > @@ -79,7 +79,8 @@ AMDGPUDevice* getDeviceFromName(const std::string &deviceName,
> >             " on 32bit pointers!");
> >   #endif
> >       return new AMDGPUNIDevice(ptr);
> > -  } else if (deviceName == "SI") {
> > +  } else if (deviceName == "tahiti" || deviceName == "pitcairn" ||
> > +             deviceName == "verde"  || deviceName == "oland") {
> >       return new AMDGPUSIDevice(ptr);
> >     } else {
> >   #if DEBUG
> > diff --git a/lib/Target/R600/Processors.td b/lib/Target/R600/Processors.td
> > index 868810c..86534f6 100644
> > --- a/lib/Target/R600/Processors.td
> > +++ b/lib/Target/R600/Processors.td
> > @@ -26,5 +26,7 @@ def : Proc<"barts",      R600_EG_Itin, [FeatureByteAddress, FeatureImages]>;
> >   def : Proc<"turks",      R600_EG_Itin, [FeatureByteAddress, FeatureImages]>;
> >   def : Proc<"caicos",     R600_EG_Itin, [FeatureByteAddress, FeatureImages]>;
> >   def : Proc<"cayman",     R600_EG_Itin, [FeatureByteAddress, FeatureImages, FeatureFP64]>;
> > -def : Proc<"SI", SI_Itin, [Feature64BitPtr]>;
> > -
> > +def : Proc<"tahiti", SI_Itin, [Feature64BitPtr, FeatureFP64]>;
> > +def : Proc<"pitcairn", SI_Itin, [Feature64BitPtr, FeatureFP64]>;
> > +def : Proc<"verde", SI_Itin, [Feature64BitPtr, FeatureFP64]>;
> > +def : Proc<"oland", SI_Itin, [Feature64BitPtr, FeatureFP64]>;
> > -- 1.7.3.4
> >
> > 0003-R600-SI-Use-same-names-for-corresponding-MUBUF-opera.patch
> >
> >
> > >From f95ede73f1030b0f62ecfe9d7ffe51e6237fd442 Mon Sep 17 00:00:00 2001
> > From: Tom Stellard<thomas.stellard at amd.com>
> > Date: Wed, 24 Oct 2012 16:20:20 -0400
> > Subject: [PATCH 3/5] R600/SI: Use same names for corresponding MUBUF operands and encoding fields
> >
> > The code emitter knows how to encode operands whose name matches one of
> > the encoding fields.  If there is no match, the code emitter relies on
> > the order of the operand and field definitions to determine how operands
> > should be encoding.  Matching by order makes it easy to accidentally break
> > the instruction encodings, so we prefer to match by name.
> > ---
> >   lib/Target/R600/SIInstrFormats.td |   50 ++++++++++++++++++------------------
> >   lib/Target/R600/SIInstrInfo.td    |    4 +-
> >   2 files changed, 27 insertions(+), 27 deletions(-)
> >
> > diff --git a/lib/Target/R600/SIInstrFormats.td b/lib/Target/R600/SIInstrFormats.td
> > index 3891ddb..f737ddd 100644
> > --- a/lib/Target/R600/SIInstrFormats.td
> > +++ b/lib/Target/R600/SIInstrFormats.td
> > @@ -284,33 +284,33 @@ let Uses = [EXEC] in {
> >   class MUBUF <bits<7> op, dag outs, dag ins, string asm, list<dag> pattern> :
> >       Enc64<outs, ins, asm, pattern> {
> >   
> > -  bits<8> VDATA;
> > -  bits<12> OFFSET;
> > -  bits<1> OFFEN;
> > -  bits<1> IDXEN;
> > -  bits<1> GLC;
> > -  bits<1> ADDR64;
> > -  bits<1> LDS;
> > -  bits<8> VADDR;
> > -  bits<7> SRSRC;
> > -  bits<1> SLC;
> > -  bits<1> TFE;
> > -  bits<8> SOFFSET;
> > -
> > -  let Inst{11-0} = OFFSET;
> > -  let Inst{12} = OFFEN;
> > -  let Inst{13} = IDXEN;
> > -  let Inst{14} = GLC;
> > -  let Inst{15} = ADDR64;
> > -  let Inst{16} = LDS;
> > +  bits<12> offset;
> > +  bits<1> offen;
> > +  bits<1> idxen;
> > +  bits<1> glc;
> > +  bits<1> addr64;
> > +  bits<1> lds;
> > +  bits<8> vaddr;
> > +  bits<8> vdata;
> > +  bits<7> srsrc;
> > +  bits<1> slc;
> > +  bits<1> tfe;
> > +  bits<8> soffset;
> > +
> > +  let Inst{11-0} = offset;
> > +  let Inst{12} = offen;
> > +  let Inst{13} = idxen;
> > +  let Inst{14} = glc;
> > +  let Inst{15} = addr64;
> > +  let Inst{16} = lds;
> >     let Inst{24-18} = op;
> >     let Inst{31-26} = 0x38; //encoding
> > -  let Inst{39-32} = VADDR;
> > -  let Inst{47-40} = VDATA;
> > -  let Inst{52-48} = SRSRC{6-2};
> > -  let Inst{54} = SLC;
> > -  let Inst{55} = TFE;
> > -  let Inst{63-56} = SOFFSET;
> > +  let Inst{39-32} = vaddr;
> > +  let Inst{47-40} = vdata;
> > +  let Inst{52-48} = srsrc{6-2};
> > +  let Inst{54} = slc;
> > +  let Inst{55} = tfe;
> > +  let Inst{63-56} = soffset;
> >   
> >     let VM_CNT = 1;
> >     let EXP_CNT = 1;
> > diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
> > index 260c651..1b09ded 100644
> > --- a/lib/Target/R600/SIInstrInfo.td
> > +++ b/lib/Target/R600/SIInstrInfo.td
> > @@ -276,11 +276,11 @@ class MTBUF_Store_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBU
> >   
> >   class MUBUF_Load_Helper <bits<7> op, string asm, RegisterClass regClass> : MUBUF <
> >     op,
> > -  (outs regClass:$dst),
> > +  (outs regClass:$vdata),
> >     (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
> >          i1imm:$lds, VReg_32:$vaddr, SReg_128:$srsrc, i1imm:$slc,
> >          i1imm:$tfe, SSrc_32:$soffset),
> > -  asm#" $dst, $offset, $offen, $idxen, $glc, $addr64, "
> > +  asm#" $vdata, $offset, $offen, $idxen, $glc, $addr64, "
> >        #"$lds, $vaddr, $srsrc, $slc, $tfe, $soffset",
> >     []> {
> >     let mayLoad = 1;
> > -- 1.7.3.4
> >
> > 0004-R600-SI-Simplify-MUBUF-helper-class-for-loads.patch
> >
> >
> > >From 3a1807a4b8e0a17febbf623d05b833e0cbdda3d2 Mon Sep 17 00:00:00 2001
> > From: Tom Stellard<thomas.stellard at amd.com>
> > Date: Thu, 25 Oct 2012 10:36:05 -0400
> > Subject: [PATCH 4/5] R600/SI: Simplify MUBUF helper class for loads
> >
> > ---
> >   lib/Target/R600/SIInstrInfo.td    |   23 ++++++++++++++++-------
> >   lib/Target/R600/SIInstructions.td |   17 +++++++----------
> >   2 files changed, 23 insertions(+), 17 deletions(-)
> >
> > diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
> > index 1b09ded..d0cd468 100644
> > --- a/lib/Target/R600/SIInstrInfo.td
> > +++ b/lib/Target/R600/SIInstrInfo.td
> > @@ -274,17 +274,26 @@ class MTBUF_Store_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBU
> >     let mayLoad = 0;
> >   }
> >   
> > -class MUBUF_Load_Helper <bits<7> op, string asm, RegisterClass regClass> : MUBUF <
> > +class MUBUF_Load_Helper <bits<7> op, string name, RegisterClass regClass,
> > +                         list<dag> Pat> : MUBUF <
> >     op,
> >     (outs regClass:$vdata),
> > -  (ins i16imm:$offset, i1imm:$offen, i1imm:$idxen, i1imm:$glc, i1imm:$addr64,
> > -       i1imm:$lds, VReg_32:$vaddr, SReg_128:$srsrc, i1imm:$slc,
> > -       i1imm:$tfe, SSrc_32:$soffset),
> > -  asm#" $vdata, $offset, $offen, $idxen, $glc, $addr64, "
> > -     #"$lds, $vaddr, $srsrc, $slc, $tfe, $soffset",
> > -  []> {
> > +  (ins i16imm:$offset, SReg_128:$srsrc, VReg_32:$vaddr),
> > +  name#" $vdata, ($srsrc + $offset)[$vaddr]",
> > +  Pat> {
> > +
> >     let mayLoad = 1;
> >     let mayStore = 0;
> > +
> > +  // Encoding
> > +  let offen = 0;
> > +  let idxen = 1;
> > +  let glc = 0;
> > +  let addr64 = 0;
> > +  let lds = 0;
> > +  let slc = 0;
> > +  let tfe = 0;
> > +  let soffset = 128; // ZERO
> >   }
> >   
> >   class MTBUF_Load_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
> > diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
> > index 0ab9e4e..44f35b9 100644
> > --- a/lib/Target/R600/SIInstructions.td
> > +++ b/lib/Target/R600/SIInstructions.td
> > @@ -394,7 +394,13 @@ defm V_CMPX_CLASS_F64 : VOPC_64 <0x000000b8, "V_CMPX_CLASS_F64">;
> >   //def BUFFER_LOAD_FORMAT_X : MUBUF_ <0x00000000, "BUFFER_LOAD_FORMAT_X", []>;
> >   //def BUFFER_LOAD_FORMAT_XY : MUBUF_ <0x00000001, "BUFFER_LOAD_FORMAT_XY", []>;
> >   //def BUFFER_LOAD_FORMAT_XYZ : MUBUF_ <0x00000002, "BUFFER_LOAD_FORMAT_XYZ", []>;
> > -def BUFFER_LOAD_FORMAT_XYZW : MUBUF_Load_Helper <0x00000003, "BUFFER_LOAD_FORMAT_XYZW", VReg_128>;
> > +
> > +def BUFFER_LOAD_FORMAT_XYZW : MUBUF_Load_Helper <
> > +  0x00000003, "BUFFER_LOAD_FORMAT_XYZW", VReg_128,
> > +  [(set VReg_128:$vdata, (int_SI_vs_load_input SReg_128:$srsrc,
> > +                                               IMM12bit:$offset,
> > +                                               VReg_32:$vaddr))]>;
> > +
> >   //def BUFFER_STORE_FORMAT_X : MUBUF_ <0x00000004, "BUFFER_STORE_FORMAT_X", []>;
> >   //def BUFFER_STORE_FORMAT_XY : MUBUF_ <0x00000005, "BUFFER_STORE_FORMAT_XY", []>;
> >   //def BUFFER_STORE_FORMAT_XYZ : MUBUF_ <0x00000006, "BUFFER_STORE_FORMAT_XYZ", []>;
> > @@ -1145,15 +1151,6 @@ def : Pat <
> >     (SI_KILL (V_MOV_B32_e32 0xbf800000))
> >   >;
> >   
> > -/* int_SI_vs_load_input */
> > -def : Pat<
> > -  (int_SI_vs_load_input SReg_128:$tlst, IMM12bit:$attr_offset,
> > -                        VReg_32:$buf_idx_vgpr),
> > -  (BUFFER_LOAD_FORMAT_XYZW imm:$attr_offset, 0, 1, 0, 0, 0,
> > -                           VReg_32:$buf_idx_vgpr, SReg_128:$tlst,
> > -                           0, 0, 0)
> > ->;
> > -
> >   /* int_SI_export */
> >   def : Pat <
> >     (int_SI_export imm:$en, imm:$vm, imm:$done, imm:$tgt, imm:$compr,
> > -- 1.7.3.4
> >
> > 0005-R600-SI-Add-support-for-buffer-stores.patch
> >
> >
> > >From b5391a90d06ae4e232fd63e440b891586e456045 Mon Sep 17 00:00:00 2001
> > From: Tom Stellard<thomas.stellard at amd.com>
> > Date: Mon, 4 Mar 2013 15:12:16 -0500
> > Subject: [PATCH 5/5] R600/SI: Add support for buffer stores
> >
> > ---
> >   lib/Target/R600/AMDGPUCallingConv.td  |    8 ++-
> >   lib/Target/R600/AMDGPUISelLowering.h  |    1 +
> >   lib/Target/R600/AMDILISelDAGToDAG.cpp |   23 +++++++
> >   lib/Target/R600/SIDefines.h           |  112 +++++++++++++++++++++++++++++++++
> >   lib/Target/R600/SIISelLowering.cpp    |   63 ++++++++++++++++++
> >   lib/Target/R600/SIISelLowering.h      |    1 +
> >   lib/Target/R600/SIInstrInfo.td        |   26 ++++++++
> >   lib/Target/R600/SIInstructions.td     |   10 +++-
> >   lib/Target/R600/SIRegisterInfo.td     |    2 +-
> >   test/CodeGen/R600/imm.ll              |    3 -
> >   test/CodeGen/R600/store.ll            |   11 +++
> >   11 files changed, 253 insertions(+), 7 deletions(-)
> >   create mode 100644 lib/Target/R600/SIDefines.h
> >   create mode 100644 test/CodeGen/R600/store.ll
> >
> > diff --git a/lib/Target/R600/AMDGPUCallingConv.td b/lib/Target/R600/AMDGPUCallingConv.td
> > index 45ae37e..9c30515 100644
> > --- a/lib/Target/R600/AMDGPUCallingConv.td
> > +++ b/lib/Target/R600/AMDGPUCallingConv.td
> > @@ -32,8 +32,14 @@ def CC_SI : CallingConv<[
> >       VGPR8, VGPR9, VGPR10, VGPR11, VGPR12, VGPR13, VGPR14, VGPR15,
> >       VGPR16, VGPR17, VGPR18, VGPR19, VGPR20, VGPR21, VGPR22, VGPR23,
> >       VGPR24, VGPR25, VGPR26, VGPR27, VGPR28, VGPR29, VGPR30, VGPR31
> > -  ]>>>
> > +  ]>>>,
> >   
> > +  // This is the default for i64 values.
> > +  // XXX: We should change this once clang understands the CC_AMDGPU.
> > +  CCIfType<[i64], CCAssignToRegWithShadow<
> > +   [ SGPR0, SGPR2, SGPR4, SGPR6, SGPR8, SGPR10, SGPR12, SGPR14 ],
> > +   [ SGPR1, SGPR3, SGPR5, SGPR7, SGPR9, SGPR11, SGPR13, SGPR15 ]
> > +  >>
> >   ]>;
> >   
> >   def CC_AMDGPU : CallingConv<[
> > diff --git a/lib/Target/R600/AMDGPUISelLowering.h b/lib/Target/R600/AMDGPUISelLowering.h
> > index f31b646..c2a79ea 100644
> > --- a/lib/Target/R600/AMDGPUISelLowering.h
> > +++ b/lib/Target/R600/AMDGPUISelLowering.h
> > @@ -116,6 +116,7 @@ enum {
> >     BRANCH_COND,
> >     // End AMDIL ISD Opcodes
> >     BITALIGN,
> > +  BUFFER_STORE,
> >     DWORDADDR,
> >     FRACT,
> >     FMAX,
> > diff --git a/lib/Target/R600/AMDILISelDAGToDAG.cpp b/lib/Target/R600/AMDILISelDAGToDAG.cpp
> > index e77b9dc..3e58881 100644
> > --- a/lib/Target/R600/AMDILISelDAGToDAG.cpp
> > +++ b/lib/Target/R600/AMDILISelDAGToDAG.cpp
> > @@ -162,6 +162,29 @@ SDNode *AMDGPUDAGToDAGISel::Select(SDNode *N) {
> >     }
> >     switch (Opc) {
> >     default: break;
> > +  case ISD::BUILD_PAIR: {
> > +    SDValue RC, SubReg0, SubReg1;
> > +    const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>();
> > +    if (ST.device()->getGeneration() <= AMDGPUDeviceInfo::HD6XXX) {
> > +      break;
> > +    }
> > +    if (N->getValueType(0) == MVT::i128) {
> > +      RC = CurDAG->getTargetConstant(AMDGPU::SReg_128RegClassID, MVT::i32);
> > +      SubReg0 = CurDAG->getTargetConstant(AMDGPU::sub0_sub1, MVT::i32);
> > +      SubReg1 = CurDAG->getTargetConstant(AMDGPU::sub2_sub3, MVT::i32);
> > +    } else if (N->getValueType(0) == MVT::i64) {
> > +      RC = CurDAG->getTargetConstant(AMDGPU::SReg_64RegClassID, MVT::i32);
> > +      SubReg0 = CurDAG->getTargetConstant(AMDGPU::sub0, MVT::i32);
> > +      SubReg1 = CurDAG->getTargetConstant(AMDGPU::sub1, MVT::i32);
> > +    } else {
> > +      llvm_unreachable("Unhandled value type for BUILD_PAIR");
> > +    }
> > +    const SDValue Ops[] = { RC, N->getOperand(0), SubReg0,
> > +                            N->getOperand(1), SubReg1 };
> > +    return CurDAG->getMachineNode(TargetOpcode::REG_SEQUENCE,
> > +                                  N->getDebugLoc(), N->getValueType(0), Ops, 5);
> > +  }
> > +
> >     case ISD::ConstantFP:
> >     case ISD::Constant: {
> >       const AMDGPUSubtarget &ST = TM.getSubtarget<AMDGPUSubtarget>();
> > diff --git a/lib/Target/R600/SIDefines.h b/lib/Target/R600/SIDefines.h
> > new file mode 100644
> > index 0000000..3e2a229
> > --- /dev/null
> > +++ b/lib/Target/R600/SIDefines.h
> > @@ -0,0 +1,112 @@
> > +//===-- SIDefines.h - SI Helper Macros --------------------------*- C++ -*-===//
> > +//
> > +//                     The LLVM Compiler Infrastructure
> > +//
> > +// This file is distributed under the University of Illinois Open Source
> > +// License. See LICENSE.TXT for details.
> > +//
> > +/// \file
> > +//===----------------------------------------------------------------------===//
> > +
> > +#ifndef SIDEFINES_H_
> > +#define SIDEFINES_H_
> > +
> > +#define R_008F0C_SQ_BUF_RSRC_WORD3                                      0x008F0C
> > +#define   S_008F0C_DST_SEL_X(x)                                       (((x) & 0x07) << 0)
> > +#define   G_008F0C_DST_SEL_X(x)                                       (((x) >> 0) & 0x07)
> > +#define   C_008F0C_DST_SEL_X                                          0xFFFFFFF8
> > +#define     V_008F0C_SQ_SEL_0                                       0x00
> > +#define     V_008F0C_SQ_SEL_1                                       0x01
> > +#define     V_008F0C_SQ_SEL_RESERVED_0                              0x02
> > +#define     V_008F0C_SQ_SEL_RESERVED_1                              0x03
> > +#define     V_008F0C_SQ_SEL_X                                       0x04
> > +#define     V_008F0C_SQ_SEL_Y                                       0x05
> > +#define     V_008F0C_SQ_SEL_Z                                       0x06
> > +#define     V_008F0C_SQ_SEL_W                                       0x07
> > +#define   S_008F0C_DST_SEL_Y(x)                                       (((x) & 0x07) << 3)
> > +#define   G_008F0C_DST_SEL_Y(x)                                       (((x) >> 3) & 0x07)
> > +#define   C_008F0C_DST_SEL_Y                                          0xFFFFFFC7
> > +#define     V_008F0C_SQ_SEL_0                                       0x00
> > +#define     V_008F0C_SQ_SEL_1                                       0x01
> > +#define     V_008F0C_SQ_SEL_RESERVED_0                              0x02
> > +#define     V_008F0C_SQ_SEL_RESERVED_1                              0x03
> > +#define     V_008F0C_SQ_SEL_X                                       0x04
> > +#define     V_008F0C_SQ_SEL_Y                                       0x05
> > +#define     V_008F0C_SQ_SEL_Z                                       0x06
> > +#define     V_008F0C_SQ_SEL_W                                       0x07
> > +#define   S_008F0C_DST_SEL_Z(x)                                       (((x) & 0x07) << 6)
> > +#define   G_008F0C_DST_SEL_Z(x)                                       (((x) >> 6) & 0x07)
> > +#define   C_008F0C_DST_SEL_Z                                          0xFFFFFE3F
> > +#define     V_008F0C_SQ_SEL_0                                       0x00
> > +#define     V_008F0C_SQ_SEL_1                                       0x01
> > +#define     V_008F0C_SQ_SEL_RESERVED_0                              0x02
> > +#define     V_008F0C_SQ_SEL_RESERVED_1                              0x03
> > +#define     V_008F0C_SQ_SEL_X                                       0x04
> > +#define     V_008F0C_SQ_SEL_Y                                       0x05
> > +#define     V_008F0C_SQ_SEL_Z                                       0x06
> > +#define     V_008F0C_SQ_SEL_W                                       0x07
> > +#define   S_008F0C_DST_SEL_W(x)                                       (((x) & 0x07) << 9)
> > +#define   G_008F0C_DST_SEL_W(x)                                       (((x) >> 9) & 0x07)
> > +#define   C_008F0C_DST_SEL_W                                          0xFFFFF1FF
> > +#define     V_008F0C_SQ_SEL_0                                       0x00
> > +#define     V_008F0C_SQ_SEL_1                                       0x01
> > +#define     V_008F0C_SQ_SEL_RESERVED_0                              0x02
> > +#define     V_008F0C_SQ_SEL_RESERVED_1                              0x03
> > +#define     V_008F0C_SQ_SEL_X                                       0x04
> > +#define     V_008F0C_SQ_SEL_Y                                       0x05
> > +#define     V_008F0C_SQ_SEL_Z                                       0x06
> > +#define     V_008F0C_SQ_SEL_W                                       0x07
> > +#define   S_008F0C_NUM_FORMAT(x)                                      (((x) & 0x07) << 12)
> > +#define   G_008F0C_NUM_FORMAT(x)                                      (((x) >> 12) & 0x07)
> > +#define   C_008F0C_NUM_FORMAT                                         0xFFFF8FFF
> > +#define     V_008F0C_BUF_NUM_FORMAT_UNORM                           0x00
> > +#define     V_008F0C_BUF_NUM_FORMAT_SNORM                           0x01
> > +#define     V_008F0C_BUF_NUM_FORMAT_USCALED                         0x02
> > +#define     V_008F0C_BUF_NUM_FORMAT_SSCALED                         0x03
> > +#define     V_008F0C_BUF_NUM_FORMAT_UINT                            0x04
> > +#define     V_008F0C_BUF_NUM_FORMAT_SINT                            0x05
> > +#define     V_008F0C_BUF_NUM_FORMAT_SNORM_OGL                       0x06
> > +#define     V_008F0C_BUF_NUM_FORMAT_FLOAT                           0x07
> > +#define   S_008F0C_DATA_FORMAT(x)                                     (((x) & 0x0F) << 15)
> > +#define   G_008F0C_DATA_FORMAT(x)                                     (((x) >> 15) & 0x0F)
> > +#define   C_008F0C_DATA_FORMAT                                        0xFFF87FFF
> > +#define     V_008F0C_BUF_DATA_FORMAT_INVALID                        0x00
> > +#define     V_008F0C_BUF_DATA_FORMAT_8                              0x01
> > +#define     V_008F0C_BUF_DATA_FORMAT_16                             0x02
> > +#define     V_008F0C_BUF_DATA_FORMAT_8_8                            0x03
> > +#define     V_008F0C_BUF_DATA_FORMAT_32                             0x04
> > +#define     V_008F0C_BUF_DATA_FORMAT_16_16                          0x05
> > +#define     V_008F0C_BUF_DATA_FORMAT_10_11_11                       0x06
> > +#define     V_008F0C_BUF_DATA_FORMAT_11_11_10                       0x07
> > +#define     V_008F0C_BUF_DATA_FORMAT_10_10_10_2                     0x08
> > +#define     V_008F0C_BUF_DATA_FORMAT_2_10_10_10                     0x09
> > +#define     V_008F0C_BUF_DATA_FORMAT_8_8_8_8                        0x0A
> > +#define     V_008F0C_BUF_DATA_FORMAT_32_32                          0x0B
> > +#define     V_008F0C_BUF_DATA_FORMAT_16_16_16_16                    0x0C
> > +#define     V_008F0C_BUF_DATA_FORMAT_32_32_32                       0x0D
> > +#define     V_008F0C_BUF_DATA_FORMAT_32_32_32_32                    0x0E
> > +#define     V_008F0C_BUF_DATA_FORMAT_RESERVED_15                    0x0F
> > +#define   S_008F0C_ELEMENT_SIZE(x)                                    (((x) & 0x03) << 19)
> > +#define   G_008F0C_ELEMENT_SIZE(x)                                    (((x) >> 19) & 0x03)
> > +#define   C_008F0C_ELEMENT_SIZE                                       0xFFE7FFFF
> > +#define   S_008F0C_INDEX_STRIDE(x)                                    (((x) & 0x03) << 21)
> > +#define   G_008F0C_INDEX_STRIDE(x)                                    (((x) >> 21) & 0x03)
> > +#define   C_008F0C_INDEX_STRIDE                                       0xFF9FFFFF
> > +#define   S_008F0C_ADD_TID_ENABLE(x)                                  (((x) & 0x1) << 23)
> > +#define   G_008F0C_ADD_TID_ENABLE(x)                                  (((x) >> 23) & 0x1)
> > +#define   C_008F0C_ADD_TID_ENABLE                                     0xFF7FFFFF
> > +#define   S_008F0C_HASH_ENABLE(x)                                     (((x) & 0x1) << 25)
> > +#define   G_008F0C_HASH_ENABLE(x)                                     (((x) >> 25) & 0x1)
> > +#define   C_008F0C_HASH_ENABLE                                        0xFDFFFFFF
> > +#define   S_008F0C_HEAP(x)                                            (((x) & 0x1) << 26)
> > +#define   G_008F0C_HEAP(x)                                            (((x) >> 26) & 0x1)
> > +#define   C_008F0C_HEAP                                               0xFBFFFFFF
> > +#define   S_008F0C_TYPE(x)                                            (((x) & 0x03) << 30)
> > +#define   G_008F0C_TYPE(x)                                            (((x) >> 30) & 0x03)
> > +#define   C_008F0C_TYPE                                               0x3FFFFFFF
> > +#define     V_008F0C_SQ_RSRC_BUF                                    0x00
> > +#define     V_008F0C_SQ_RSRC_BUF_RSVD_1                             0x01
> > +#define     V_008F0C_SQ_RSRC_BUF_RSVD_2                             0x02
> > +#define     V_008F0C_SQ_RSRC_BUF_RSVD_3                             0x03
> > +
> > +#endif // R600DEFINES_H_
> 
> Oh no, please not! Do we really need that? For the indirect stuff I put 
> quite allot of effort into keeping the resource descriptors out of the 
> LLVM backend.
> 
> My main reason was that even if we today only use v1i32 or v2i32 (or 
> v2f32, etc...) buffers that stuff really belongs into the driver and not 
> the compile, cause the driver is really setting up the buffers 
> (including handling all the restrictions for tilling, alignment, padding 
> etc...).
>

I'm not really too familiar with tiling, which resource fields are used
for tiling?

The reasons I wanted to construct the resource constant in the compiler
are:

1. The ability to use the same virtual address for different typed loads.  For
example if I have 5 i32 loads from a buffer and LLVM optimizes them to
1 v4i32 load and 1 i32 load there is really no good way to tell the
driver to send different resource descriptors for each loaded type.

2. Allows us to pass 64-bit pointer values to the shader and avoid
having to deal with arithmetic on 128-bit pointers.  This might not be
too hard to solve, but it would require some additional work.

-Tom

> > diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
> > index 74b1dc5..3f0e27d 100644
> > --- a/lib/Target/R600/SIISelLowering.cpp
> > +++ b/lib/Target/R600/SIISelLowering.cpp
> > @@ -16,6 +16,7 @@
> >   #include "AMDIL.h"
> >   #include "AMDGPU.h"
> >   #include "AMDILIntrinsicInfo.h"
> > +#include "SIDefines.h"
> >   #include "SIInstrInfo.h"
> >   #include "SIMachineFunctionInfo.h"
> >   #include "SIRegisterInfo.h"
> > @@ -49,6 +50,7 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
> >   
> >     addRegisterClass(MVT::v4i32, &AMDGPU::VReg_128RegClass);
> >     addRegisterClass(MVT::v4f32, &AMDGPU::VReg_128RegClass);
> > +  addRegisterClass(MVT::i128, &AMDGPU::SReg_128RegClass);
> >   
> >     addRegisterClass(MVT::v8i32, &AMDGPU::VReg_256RegClass);
> >     addRegisterClass(MVT::v8f32, &AMDGPU::VReg_256RegClass);
> > @@ -68,6 +70,9 @@ SITargetLowering::SITargetLowering(TargetMachine &TM) :
> >     setTargetDAGCombine(ISD::SELECT_CC);
> >   
> >     setTargetDAGCombine(ISD::SETCC);
> > +
> > +  setOperationAction(ISD::STORE, MVT::i32, Custom);
> > +  setOperationAction(ISD::STORE, MVT::i64, Custom);
> >   }
> >   
> >   SDValue SITargetLowering::LowerFormalArguments(
> > @@ -236,6 +241,7 @@ SDValue SITargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
> >     default: return AMDGPUTargetLowering::LowerOperation(Op, DAG);
> >     case ISD::BRCOND: return LowerBRCOND(Op, DAG);
> >     case ISD::SELECT_CC: return LowerSELECT_CC(Op, DAG);
> > +  case ISD::STORE: return LowerSTORE(Op, DAG);
> >     }
> >     return SDValue();
> >   }
> > @@ -334,6 +340,63 @@ SDValue SITargetLowering::LowerBRCOND(SDValue BRCOND,
> >     return Chain;
> >   }
> >   
> > +SDValue SITargetLowering::LowerSTORE(SDValue Op, SelectionDAG &DAG) const {
> > +  StoreSDNode *StoreNode = cast<StoreSDNode>(Op);
> > +  SDValue Chain = Op.getOperand(0);
> > +  SDValue Value = Op.getOperand(1);
> > +  SDValue VirtualAddress = Op.getOperand(2);
> > +  EVT VT = Value.getValueType();
> > +  DebugLoc DL = Op.getDebugLoc();
> > +
> > +  if (StoreNode->getAddressSpace() != AMDGPUAS::GLOBAL_ADDRESS) {
> > +    return SDValue();
> > +  }
> > +
> > +  // 128-bit resource constants are used to identify buffers for loads and
> > +  // stores.  The resource constant is composed of a 48-bit virtual address
> > +  // plus 80-bits of information about the buffer.
> > +
> > +  // Set NumRecords to the maximum allowed value.  This way we don't need to
> > +  // pass the number of records to he program and it should work for all
> > +  // buffer sizes.
> > +  //
> > +  // XXX: Is there any disadvantage to doing this?
> > +  //
> > +  uint64_t RsrcWord2 = 0xFFFFFFFF; //NUM_RECORDS
> > +
> > +  unsigned DataFormat;
> > +  switch (VT.getSizeInBits()) {
> > +  case 32: DataFormat = V_008F0C_BUF_DATA_FORMAT_32; break;
> > +  case 64: DataFormat = V_008F0C_BUF_DATA_FORMAT_32_32; break;
> > +  default: llvm_unreachable("Unhandle type size in store");
> > +  }
> > +
> > +  uint64_t RsrcWord3 =
> > +    // Always set the destination select values to XYZW.
> > +    // XXX: Is there any disadvantage to doing this?
> > +    S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) |
> > +    S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) |
> > +    S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) |
> > +    S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) |
> > +    // NUM_FORMAT_UINT appears to work even if the storage type is float,
> > +    // so we'll use it for everything.
> > +    S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_UINT) |
> > +    S_008F0C_DATA_FORMAT(DataFormat);
> > +
> > +  SDValue RsrcWord23 = DAG.getConstant(RsrcWord2 | (RsrcWord3 << 32), MVT::i64);
> > +
> > +  SDValue SrcSrc = DAG.getNode(ISD::BUILD_PAIR, DL, MVT::i128,
> > +                               VirtualAddress, RsrcWord23);
> > +
> > +  SDValue Ops[2];
> > +  Ops[0] = DAG.getNode(AMDGPUISD::BUFFER_STORE, DL, MVT::Other, Chain,
> > +                       Value, SrcSrc, DAG.getConstant(0, MVT::i32));
> > +  Ops[1] = Chain;
> > +
> > +  return DAG.getMergeValues(Ops, 2, DL);
> > +
> > +}
> > +
> >   SDValue SITargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
> >     SDValue LHS = Op.getOperand(0);
> >     SDValue RHS = Op.getOperand(1);
> > diff --git a/lib/Target/R600/SIISelLowering.h b/lib/Target/R600/SIISelLowering.h
> > index 0411565..ef0a8dc 100644
> > --- a/lib/Target/R600/SIISelLowering.h
> > +++ b/lib/Target/R600/SIISelLowering.h
> > @@ -27,6 +27,7 @@ class SITargetLowering : public AMDGPUTargetLowering {
> >     void LowerSI_WQM(MachineInstr *MI, MachineBasicBlock &BB,
> >                 MachineBasicBlock::iterator I, MachineRegisterInfo & MRI) const;
> >   
> > +  SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;
> >     SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
> >     SDValue LowerBRCOND(SDValue Op, SelectionDAG &DAG) const;
> >   
> > diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
> > index d0cd468..02452ca 100644
> > --- a/lib/Target/R600/SIInstrInfo.td
> > +++ b/lib/Target/R600/SIInstrInfo.td
> > @@ -26,6 +26,10 @@ def HI32 : SDNodeXForm<imm, [{
> >     return CurDAG->getTargetConstant(N->getZExtValue() >> 32, MVT::i32);
> >   }]>;
> >   
> > +def SIbuffer_store : SDNode<"AMDGPUISD::BUFFER_STORE",
> > +                           SDTypeProfile<0, 3, [SDTCisPtrTy<1>, SDTCisInt<2>]>,
> > +                           [SDNPHasChain, SDNPMayStore]>;
> > +
> >   def IMM8bitDWORD : ImmLeaf <
> >     i32, [{
> >       return (Imm & ~0x3FC) == 0;
> > @@ -296,6 +300,28 @@ class MUBUF_Load_Helper <bits<7> op, string name, RegisterClass regClass,
> >     let soffset = 128; // ZERO
> >   }
> >   
> > +class MUBUF_Store_Helper <bits<7> op, string name, RegisterClass vdataClass,
> > +                         ValueType VT> :
> > +    MUBUF <op, (outs), (ins vdataClass:$vdata, SReg_128:$srsrc, VReg_32:$vaddr),
> > +          name#" $vdata, $srsrc + $vaddr",
> > +          [(SIbuffer_store (VT vdataClass:$vdata), (i128 SReg_128:$srsrc),
> > +                                                    (i32 VReg_32:$vaddr))]> {
> > +
> > +  let mayLoad = 0;
> > +  let mayStore = 1;
> > +
> > +  // Encoding
> > +  let offset = 0;
> > +  let offen = 1;
> > +  let idxen = 0;
> > +  let glc = 0;
> > +  let addr64 = 0;
> > +  let lds = 0;
> > +  let slc = 0;
> > +  let tfe = 0;
> > +  let soffset = 128; // ZERO
> > +}
> > +
> >   class MTBUF_Load_Helper <bits<3> op, string asm, RegisterClass regClass> : MTBUF <
> >     op,
> >     (outs regClass:$dst),
> > diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
> > index 44f35b9..a01e096 100644
> > --- a/lib/Target/R600/SIInstructions.td
> > +++ b/lib/Target/R600/SIInstructions.td
> > @@ -414,8 +414,14 @@ def BUFFER_LOAD_FORMAT_XYZW : MUBUF_Load_Helper <
> >   //def BUFFER_LOAD_DWORDX4 : MUBUF_DWORDX4 <0x0000000e, "BUFFER_LOAD_DWORDX4", []>;
> >   //def BUFFER_STORE_BYTE : MUBUF_ <0x00000018, "BUFFER_STORE_BYTE", []>;
> >   //def BUFFER_STORE_SHORT : MUBUF_ <0x0000001a, "BUFFER_STORE_SHORT", []>;
> > -//def BUFFER_STORE_DWORD : MUBUF_ <0x0000001c, "BUFFER_STORE_DWORD", []>;
> > -//def BUFFER_STORE_DWORDX2 : MUBUF_DWORDX2 <0x0000001d, "BUFFER_STORE_DWORDX2", []>;
> > +
> > +def BUFFER_STORE_DWORD : MUBUF_Store_Helper <
> > +  0x0000001c, "BUFFER_STORE_DWORD", VReg_32, i32
> > +>;
> > +
> > +def BUFFER_STORE_DWORDX2 : MUBUF_Store_Helper <
> > +  0x0000001d, "BUFFER_STORE_DWORDX2", VReg_64, i64
> > +>;
> >   //def BUFFER_STORE_DWORDX4 : MUBUF_DWORDX4 <0x0000001e, "BUFFER_STORE_DWORDX4", []>;
> >   //def BUFFER_ATOMIC_SWAP : MUBUF_ <0x00000030, "BUFFER_ATOMIC_SWAP", []>;
> >   //def BUFFER_ATOMIC_CMPSWAP : MUBUF_ <0x00000031, "BUFFER_ATOMIC_CMPSWAP", []>;
> > diff --git a/lib/Target/R600/SIRegisterInfo.td b/lib/Target/R600/SIRegisterInfo.td
> > index 3dcad50..af74123 100644
> > --- a/lib/Target/R600/SIRegisterInfo.td
> > +++ b/lib/Target/R600/SIRegisterInfo.td
> > @@ -151,7 +151,7 @@ def SReg_64 : RegisterClass<"AMDGPU", [i64, i1], 64,
> >     (add SGPR_64, VCCReg, EXECReg)
> >   >;
> >   
> > -def SReg_128 : RegisterClass<"AMDGPU", [v16i8], 128, (add SGPR_128)>;
> > +def SReg_128 : RegisterClass<"AMDGPU", [v16i8, i128], 128, (add SGPR_128)>;
> >   
> >   def SReg_256 : RegisterClass<"AMDGPU", [v32i8], 256, (add SGPR_256)>;
> >   
> > diff --git a/test/CodeGen/R600/imm.ll b/test/CodeGen/R600/imm.ll
> > index 20f9dec..0e19e78 100644
> > --- a/test/CodeGen/R600/imm.ll
> > +++ b/test/CodeGen/R600/imm.ll
> > @@ -1,8 +1,5 @@
> >   ; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck %s
> >   
> > -; XXX: Enable once SI supports buffer stores
> > -; XFAIL: *
> > -
> >   ; CHECK: @i64_imm
> >   ; CHECK-NOT: S_MOV_B64
> >   define void @i64_imm(i64 addrspace(1)* %out) {
> > diff --git a/test/CodeGen/R600/store.ll b/test/CodeGen/R600/store.ll
> > new file mode 100644
> > index 0000000..e5811ea
> > --- /dev/null
> > +++ b/test/CodeGen/R600/store.ll
> > @@ -0,0 +1,11 @@
> > +; RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck --check-prefix=EG-CHECK %s
> > +; RUN: llc < %s -march=r600 -mcpu=SI | FileCheck --check-prefix=SI-CHECK %s
> > +
> > +; CHECK: @store_float
> > +; EG-CHECK: RAT_WRITE_CACHELESS_32_eg T{{[0-9]+\.X, T[0-9]+\.X}}, 1
> > +; SI-CHECK: BUFFER_STORE_DWORD
> > +
> > +define void @store_float(float addrspace(1)* %out, float %in) {
> > +  store float %in, float addrspace(1)* %out
> > +  ret void
> > +}
> > -- 1.7.3.4
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 



More information about the llvm-commits mailing list