[llvm-commits] RFC: Add R600/SI backend as an experimental target

Mon Sep 10 07:13:01 PDT 2012

On Sun, Sep 09, 2012 at 01:15:50PM -0400, Justin Holewinski wrote:
> The back-end builds fine for me!
> 
> I haven't done extensive testing, but it looks like there is some issue
> with the way registers are defined.  In text output, I am getting
> placeholders instead of actual numbers:
> 
> jholewinski at rapture [R600]$ llc -march=r600 -mcpu=redwood < llvm.AMDGPU.pow.ll
> .text
> ; BB#0:
> LOG_IEEE T#Index#.#Chan, T#Index#.#Chan (Pred_sel_off)
> MUL NON-IEEE T#Index#.#Chan, T#Index#.#Chan, T#Index#.#Chan
> EXP_IEEE T#Index#.#Chan, T#Index#.#Chan (Pred_sel_off)
> RETURN
> 
> 

Hi Justin,

Thanks for taking the time to test this out.  It looks like I made a
mistake when I updated the *RegisterInfo.td files.  I've attached a
few patches that should fix this issue and others I noticed while going
through the code.  I also changed the tests to check the register names.
These patches can also be found in my r600-review-v9 branch.
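
For anyone skimming the attachments, here is a rough C++ analogy of the
register-name mistake and of what the updated CHECK lines accept.  It is
only a sketch and not code from the backend: placeholderName, pastedName
and looksLikeTwoOperandLine are made-up names, and std::regex stands in
for FileCheck's own regex matching.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Analogue of the old .td code: the '#' markers sit inside the quotes,
// so they are ordinary characters and the loop values are never used.
std::string placeholderName(unsigned /*index*/, char /*chan*/) {
  return "T#Index#.#Chan";
}

// Analogue of the fixed .td code: the literal pieces are joined with the
// loop values, giving names such as "T3.X".
std::string pastedName(unsigned index, char chan) {
  assert(index < 128 && "the .td loop covers indices 0-127");
  return "T" + std::to_string(index) + "." + std::string(1, chan);
}

// Roughly the register part of the updated two-operand CHECK lines
// (hypothetical helper; FileCheck itself is not involved here).
bool looksLikeTwoOperandLine(const std::string &line) {
  static const std::regex pattern(R"(T[0-9]+\.[XYZW], T[0-9]+\.[XYZW])");
  return std::regex_search(line, pattern);
}
```

With names built the second way, a line like "LOG_IEEE T1.X, T0.X" is
accepted, while the placeholder output quoted above is rejected, which is
what the stricter tests are meant to catch.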

-Tom


> On Fri, Sep 7, 2012 at 4:35 PM, Tom Stellard <tom at stellard.net> wrote:
> 
> > Hi,
> >
> > Attached is a patch with an updated version of the R600/SI backend.  The
> > same code can also be found in the r600-review-v9 branch here:
> > http://cgit.freedesktop.org/~tstellar/llvm/
> >
> > Changes since last week's version include:
> >  +  Lots of fixes and new features for Southern Islands GPUs.
> >  +  The *RegisterInfo.td files have been updated to use newer features
> >     of tablegen and they no longer need to be generated with perl
> >     scripts. This change reduces the size of the patch by around 100 KB.
> >
> > I'm still hoping to get this accepted as an experimental backend, and I
> > look forward to your comments.
> >
> > Thanks,
> > Tom Stellard
> >
> >
> > On Fri, Aug 31, 2012 at 03:18:17PM -0400, Tom Stellard wrote:
> > > Hi,
> > >
> > > I've attached a patch with an updated version of the R600/SI backend.
> > > This time I've squashed all the changes into a single patch and used a
> > > format that is compatible with svn for easier testing/reviewing.
> > >
> > > Since the last version, I've fixed some build errors caused by recent
> > > tablegen changes, and I've also gone through and fixed all the
> > > errors caused by setting InstrInfo.guessInstructionProperties = 0.
> > >
> > > If you prefer git, this code can also be found in the r600-review-v8
> > > branch here: http://cgit.freedesktop.org/~tstellar/llvm/
> > >
> > > I'm still interested in getting this code into the main tree as an
> > > experimental target, so I'd appreciate any feedback on whether or not
> > > this backend is ready to be accepted.
> > >
> > > Thanks,
> > > Tom Stellard
> > >
> > > On Mon, Aug 27, 2012 at 02:46:06PM +0000, Tom Stellard wrote:
> > > >
> > > > Hi,
> > > >
> > > > Here is an updated version of the R600/SI backend that I would like to
> > > > be considered for inclusion in the LLVM main tree as an experimental[1]
> > > > target.
> > > >
> > > > Changes since the last version include:
> > > >   + Added AsmPrinter
> > > >   + Converted Tests to use AsmPrinter
> > > >   + Replaced CodeEmitters with MCCodeEmitters
> > > >   + Removed all uses of MachineOperands flags
> > > >   + Enabled the If Conversion pass on R600
> > > >   + Various clean ups and bug fixes.
> > > >
> > > > For more information about the backend, here are some frequently asked
> > > > questions:
> > > >
> > > > ++ What is the R600/SI backend?
> > > >
> > > >   The R600/SI backend is a code generator for AMD GPUs and supports
> > > >   the HD2XXX-HD7XXX GPU models.  It is currently being used by the
> > > >   Open Source graphics drivers for AMD GPUs in the Mesa[2] project as
> > > >   a compiler backend for graphics and compute shaders.
> > > >
> > > > ++ How will the R600/SI backend benefit LLVM?
> > > >
> > > >   The R600/SI backend is a native code generator for GPUs.  There are
> > > >   very few completely Open Source compiler backends for GPUs, so
> > > >   adding this backend to LLVM will give the project a leg up over other
> > > >   compiler projects and help attract the attention of researchers and
> > > >   new developers.
> > > >
> > > > ++ Why do the R600/SI developers want this code in the main LLVM tree?
> > > >
> > > >   The main reason for this is to reduce the maintenance burden on
> > > >   developers so they have more time to spend improving the backend and
> > > >   core LLVM code.  Supporting multiple versions of LLVM in the Mesa
> > > >   tree will require a lot of ugly ifdefs, and it will be very difficult
> > > >   to take advantage of new features in Tablegen.  Also, changes to the
> > > > >   backend will require testing against at least two versions of LLVM.
> > > >   Moving the backend code into the main LLVM tree will allow the
> > > >   backend to evolve with the rest of the project and remove these
> > > >   extra burdens from developers.
> > > >
> > > > ++ What is the planned development model for the R600/SI backend?
> > > >
> > > > >   If accepted into the main tree, all development on the R600/SI
> > > > >   backend will be based on the public LLVM tree.  The backend has
> > > > >   been Open Source from the beginning and there are no private
> > > > >   internal trees.  All code developed for this backend is pushed to
> > > > >   the public tree as soon as it is ready.
> > > >
> > > > > Looking forward to your comments.
> > > >
> > > > Thanks,
> > > > Tom Stellard
> > > >
> > > > [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-July/051929.html
> > > > [2] http://www.mesa3d.org
> > > > _______________________________________________
> > > > llvm-commits mailing list
> > > > llvm-commits at cs.uiuc.edu
> > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >
> >
> >
> >
> 
> 
> -- 
> 
> Thanks,
> 
> Justin Holewinski
-------------- next part --------------

diff --git test/CodeGen/R600/fadd.ll test/CodeGen/R600/fadd.ll
index 3d3d32b..d7d1b65 100644
--- test/CodeGen/R600/fadd.ll
+++ test/CodeGen/R600/fadd.ll
@@ -1,6 +1,6 @@
 ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 
-; CHECK: ADD
+; CHECK: ADD T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test() {
    %r0 = call float @llvm.R600.load.input(i32 0)
diff --git test/CodeGen/R600/fmul.ll test/CodeGen/R600/fmul.ll
index b59c578..eb1d523 100644
--- test/CodeGen/R600/fmul.ll
+++ test/CodeGen/R600/fmul.ll
@@ -1,6 +1,6 @@
 ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 
-; CHECK: MUL
+; CHECK: MUL_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test() {
    %r0 = call float @llvm.R600.load.input(i32 0)
diff --git test/CodeGen/R600/fsub.ll test/CodeGen/R600/fsub.ll
index bb819ce..e938df2 100644
--- test/CodeGen/R600/fsub.ll
+++ test/CodeGen/R600/fsub.ll
@@ -1,6 +1,6 @@
 ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 
-; CHECK: ADD
+; CHECK: ADD T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test() {
    %r0 = call float @llvm.R600.load.input(i32 0)
diff --git test/CodeGen/R600/llvm.AMDGPU.cos.ll test/CodeGen/R600/llvm.AMDGPU.cos.ll
index d12b8ea..5b41a40 100644
--- test/CodeGen/R600/llvm.AMDGPU.cos.ll
+++ test/CodeGen/R600/llvm.AMDGPU.cos.ll
@@ -1,6 +1,6 @@
 ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 
-;CHECK: COS
+;CHECK: COS T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test() {
    %r0 = call float @llvm.R600.load.input(i32 0)
diff --git test/CodeGen/R600/llvm.AMDGPU.floor.ll test/CodeGen/R600/llvm.AMDGPU.floor.ll
index 3a4640c..a96419d 100644
--- test/CodeGen/R600/llvm.AMDGPU.floor.ll
+++ test/CodeGen/R600/llvm.AMDGPU.floor.ll
@@ -1,6 +1,6 @@
 ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 
-;CHECK: FLOOR
+;CHECK: FLOOR T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test() {
    %r0 = call float @llvm.R600.load.input(i32 0)
diff --git test/CodeGen/R600/llvm.AMDGPU.mul.ll test/CodeGen/R600/llvm.AMDGPU.mul.ll
index 2f1eecc..693eb27 100644
--- test/CodeGen/R600/llvm.AMDGPU.mul.ll
+++ test/CodeGen/R600/llvm.AMDGPU.mul.ll
@@ -1,6 +1,6 @@
 ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 
-;CHECK: MUL
+;CHECK: MUL NON-IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test() {
    %r0 = call float @llvm.R600.load.input(i32 0)
diff --git test/CodeGen/R600/llvm.AMDGPU.pow.ll test/CodeGen/R600/llvm.AMDGPU.pow.ll
index edc7972..0e1867a 100644
--- test/CodeGen/R600/llvm.AMDGPU.pow.ll
+++ test/CodeGen/R600/llvm.AMDGPU.pow.ll
@@ -1,8 +1,8 @@
 ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 
-;CHECK: LOG_IEEE
-;CHECK-NEXT: MUL NON-IEEE
-;CHECK-NEXT: EXP_IEE
+;CHECK: LOG_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;CHECK-NEXT: MUL NON-IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
+;CHECK-NEXT: EXP_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test() {
    %r0 = call float @llvm.R600.load.input(i32 0)
diff --git test/CodeGen/R600/llvm.AMDGPU.rcp.ll test/CodeGen/R600/llvm.AMDGPU.rcp.ll
index 681a0fb..6327be2 100644
--- test/CodeGen/R600/llvm.AMDGPU.rcp.ll
+++ test/CodeGen/R600/llvm.AMDGPU.rcp.ll
@@ -1,6 +1,6 @@
 ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 
-;CHECK: RECIP_IEEE
+;CHECK: RECIP_IEEE T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test() {
    %r0 = call float @llvm.R600.load.input(i32 0)
diff --git test/CodeGen/R600/llvm.AMDGPU.sin.ll test/CodeGen/R600/llvm.AMDGPU.sin.ll
index e465964..d0e0df8 100644
--- test/CodeGen/R600/llvm.AMDGPU.sin.ll
+++ test/CodeGen/R600/llvm.AMDGPU.sin.ll
@@ -1,6 +1,6 @@
 ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 
-;CHECK: SIN
+;CHECK: SIN T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test() {
    %r0 = call float @llvm.R600.load.input(i32 0)
diff --git test/CodeGen/R600/llvm.AMDGPU.trunc.ll test/CodeGen/R600/llvm.AMDGPU.trunc.ll
index 4ce78cc..fac957f 100644
--- test/CodeGen/R600/llvm.AMDGPU.trunc.ll
+++ test/CodeGen/R600/llvm.AMDGPU.trunc.ll
@@ -1,6 +1,6 @@
 ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 
-;CHECK: TRUNC
+;CHECK: TRUNC T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test() {
    %r0 = call float @llvm.R600.load.input(i32 0)
diff --git test/CodeGen/R600/llvm.AMDIL.fabs..ll test/CodeGen/R600/llvm.AMDIL.fabs..ll
index 9cb5e48..a059d73 100644
--- test/CodeGen/R600/llvm.AMDIL.fabs..ll
+++ test/CodeGen/R600/llvm.AMDIL.fabs..ll
@@ -1,6 +1,6 @@
 ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 
-;CHECK: MOV
+;CHECK: MOV T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test() {
    %r0 = call float @llvm.R600.load.input(i32 0)
diff --git test/CodeGen/R600/llvm.AMDIL.max..ll test/CodeGen/R600/llvm.AMDIL.max..ll
index 12ce5be..1a7e7d9 100644
--- test/CodeGen/R600/llvm.AMDIL.max..ll
+++ test/CodeGen/R600/llvm.AMDIL.max..ll
@@ -1,6 +1,6 @@
 ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 
-;CHECK: MAX
+;CHECK: MAX T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test() {
    %r0 = call float @llvm.R600.load.input(i32 0)
diff --git test/CodeGen/R600/llvm.AMDIL.min..ll test/CodeGen/R600/llvm.AMDIL.min..ll
index 8d8d45c..7c5f2fc 100644
--- test/CodeGen/R600/llvm.AMDIL.min..ll
+++ test/CodeGen/R600/llvm.AMDIL.min..ll
@@ -1,6 +1,6 @@
 ;RUN: llc < %s -march=r600 -mcpu=redwood | FileCheck %s
 
-;CHECK: MIN
+;CHECK: MIN T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], T[0-9]+\.[XYZW]}}
 
 define void @test() {
    %r0 = call float @llvm.R600.load.input(i32 0)
-------------- next part --------------
diff --git lib/Target/AMDGPU/R600RegisterInfo.td lib/Target/AMDGPU/R600RegisterInfo.td
index e1ac4fc..37345c6 100644
--- lib/Target/AMDGPU/R600RegisterInfo.td
+++ lib/Target/AMDGPU/R600RegisterInfo.td
@@ -14,14 +14,14 @@ class R600Reg_128<string n, list<Register> subregs, bits<16> encoding> :
 foreach Index = 0-127 in {
   foreach Chan = [ "X", "Y", "Z", "W" ] in {
     // 32-bit Temporary Registers
-    def T#Index#_#Chan : R600Reg <"T#Index#.#Chan", !cast<bits<16>>(Index)>;
+    def T#Index#_#Chan : R600Reg <"T"#Index#"."#Chan, !cast<bits<16>>(Index)>;
 
     // 32-bit Constant Registers (There are more than 128, this the number
     // that is currently supported.
-    def C#Index#_#Chan : R600Reg <"C#Index#.#Chan", !cast<bits<16>>(Index)>;
+    def C#Index#_#Chan : R600Reg <"C"#Index#"."#Chan, !cast<bits<16>>(Index)>;
   }
   // 128-bit Temporary Registers
-  def T#Index#_XYZW : R600Reg_128 <"T#Index#.XYZW",
+  def T#Index#_XYZW : R600Reg_128 <"T"#Index#".XYZW",
                                    [!cast<Register>("T"#Index#"_X"),
                                     !cast<Register>("T"#Index#"_Y"),
                                     !cast<Register>("T"#Index#"_Z"),
-------------- next part --------------
diff --git lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp
index b6ab9b2..391c0e0 100644
--- lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp
+++ lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp
@@ -1,5 +1,6 @@
 
 #include "AMDGPUInstPrinter.h"
+#include "MCTargetDesc/AMDGPUMCTargetDesc.h"
 #include "llvm/MC/MCInst.h"
 
 using namespace llvm;
@@ -16,7 +17,11 @@ void AMDGPUInstPrinter::printOperand(const MCInst *MI, unsigned OpNo,
 
   const MCOperand &Op = MI->getOperand(OpNo);
   if (Op.isReg()) {
-    O << getRegisterName(Op.getReg());
+    switch (Op.getReg()) {
+    // This is the default predicate state, so we don't need to print it.
+    case AMDGPU::PRED_SEL_OFF: break;
+    default: O << getRegisterName(Op.getReg()); break;
+    }
   } else if (Op.isImm()) {
     O << Op.getImm();
   } else if (Op.isFPImm()) {
diff --git lib/Target/AMDGPU/R600Instructions.td lib/Target/AMDGPU/R600Instructions.td
index c1a0ed7..75eb3ec 100644
--- lib/Target/AMDGPU/R600Instructions.td
+++ lib/Target/AMDGPU/R600Instructions.td
@@ -84,7 +84,7 @@ class R600_1OP <bits<32> inst, string opName, list<dag> pattern,
   InstR600 <inst,
           (outs R600_Reg32:$dst),
           (ins R600_Reg32:$src, R600_Pred:$p, variable_ops),
-          !strconcat(opName, " $dst, $src ($p)"),
+          !strconcat(opName, " $dst, $src $p"),
           pattern,
           itin
   >;
-------------- next part --------------
diff --git lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp
index 391c0e0..5e30255 100644
--- lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp
+++ lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp
@@ -1,3 +1,11 @@
+//===-- AMDGPUInstPrinter.cpp - AMDGPU MC Inst -> ASM ---------------------===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
 
 #include "AMDGPUInstPrinter.h"
 #include "MCTargetDesc/AMDGPUMCTargetDesc.h"
diff --git lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.h lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.h
index 62c1a5e..b4efa55 100644
--- lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.h
+++ lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.h
@@ -1,3 +1,11 @@
+//===-- AMDGPUInstPrinter.h - AMDGPU MC Inst -> ASM interface ---*- C++ -*-===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
 
 #ifndef AMDGPUINSTPRINTER_H
 #define AMDGPUINSTPRINTER_H
-------------- next part --------------
diff --git lib/Target/AMDGPU/SIRegisterInfo.td lib/Target/AMDGPU/SIRegisterInfo.td
index 8888c79..d99d017 100644
--- lib/Target/AMDGPU/SIRegisterInfo.td
+++ lib/Target/AMDGPU/SIRegisterInfo.td
@@ -66,7 +66,7 @@ def POS_FIXED_PT : SIReg <"POS_FIXED_PT">;
 
 // SGPR 32-bit registers
 foreach Index = 0-103 in {
-  def SGPR#Index : SGPR_32 <!cast<bits<16>>(Index), "SGPR#Index">;
+  def SGPR#Index : SGPR_32 <!cast<bits<16>>(Index), "SGPR"#Index>;
 }
 
 def SGPR_32 : RegisterClass<"AMDGPU", [f32, i32], 32,
@@ -97,7 +97,7 @@ def SGPR_256 : RegisterTuples<[sub0, sub1, sub2, sub3, sub4, sub5, sub6, sub7],
 
 // VGPR 32-bit registers
 foreach Index = 0-255 in {
-  def VGPR#Index : VGPR_32 <!cast<bits<16>>(Index), "VGPR#Index">;
+  def VGPR#Index : VGPR_32 <!cast<bits<16>>(Index), "VGPR"#Index>;
 }
 
 def VGPR_32 : RegisterClass<"AMDGPU", [f32, i32], 32,