[PATCH] R600/SI: Fix various operand legalization bugs

Thu Sep 25 12:40:14 PDT 2014

> From 50bc914204a37c7f314d65ca8a50581f3935e390 Mon Sep 17 00:00:00 2001
> From: Matt Arsenault <Matthew.Arsenault at amd.com>
> Date: Mon, 22 Sep 2014 19:30:31 -0400
> Subject: [PATCH 01/10] R600/SI: Fix using wrong operand indices when commuting
> 
> No test since the current SIISelLowering::legalizeOperands
> effectively hides this, and the general uses seem to only fire
> on SALU instructions which don't have modifiers between
> the operands.
> 
> When trying to use legalizeOperands immediately after
> instruction selection, it now sees a lot more patterns
> it did not see before which break on this.

I have a few comments, I would also like to test the final version of
these before they are committed.

> ---
>  lib/Target/R600/SIInstrInfo.cpp        | 31 ++++++++++++++++++++-----------
>  test/CodeGen/R600/llvm.AMDGPU.clamp.ll |  4 ++--
>  2 files changed, 22 insertions(+), 13 deletions(-)
> 

LGTM.

> From f3967d9b265e3e80a376d72799abe556d190bced Mon Sep 17 00:00:00 2001
> From: Matt Arsenault <Matthew.Arsenault at amd.com>
> Date: Tue, 23 Sep 2014 01:32:56 -0700
> Subject: [PATCH 02/10] R600/SI: Don't assert on exotic operand types
> 
> This needs a test, but I'm not exactly sure how I am getting
> the MO_GlobalAddress operand that hit here.

I think these come from constant address space variables.

LGTM.

> ---
>  lib/Target/R600/SIInstrInfo.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
> index d9f7146..d68858e 100644
> --- a/lib/Target/R600/SIInstrInfo.cpp
> +++ b/lib/Target/R600/SIInstrInfo.cpp
> @@ -705,7 +705,7 @@ MachineInstr *SIInstrInfo::commuteInstruction(MachineInstr *MI,
>  
>    if (Src1Idx != -1 && !MI->getOperand(Src1Idx).isReg()) {
>      // XXX: Commute instructions with FPImm operands
> -    if (NewMI || MI->getOperand(Src1Idx).isFPImm() ||
> +    if (NewMI || !MI->getOperand(Src1Idx).isImm() ||
>         (!isVOP2(MI->getOpcode()) && !isVOP3(MI->getOpcode()))) {
>        return nullptr;
>      }
> -- 
> 1.8.4.3
> 

> From 679655e6f32ce0091bce473eb8251c41ad76a7d8 Mon Sep 17 00:00:00 2001
> From: Matt Arsenault <Matthew.Arsenault at amd.com>
> Date: Mon, 22 Sep 2014 20:29:12 -0400
> Subject: [PATCH 03/10] R600/SI: Don't move operands that are required to be
>  SGPRs
> 
> e.g. v_cndmask_b32 requires the condition operand be an SGPR.
> If one of the source operands were an SGPR, that would be considered
> the one SGPR use and the condition operand would be illegally moved.
> ---
>  lib/Target/R600/SIInstrInfo.cpp | 21 ++++++++++++++++++++-
>  test/CodeGen/R600/v_cndmask.ll  | 42 ++++++++++++++++++++++++++++++++---------
>  2 files changed, 53 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
> index d68858e..2f26861 100644
> --- a/lib/Target/R600/SIInstrInfo.cpp
> +++ b/lib/Target/R600/SIInstrInfo.cpp
> @@ -1358,8 +1358,27 @@ void SIInstrInfo::legalizeOperands(MachineInstr *MI) const {
>    // XXX - Do any VOP3 instructions read VCC?
>    // Legalize VOP3
>    if (isVOP3(MI->getOpcode())) {
> -    int VOP3Idx[3] = {Src0Idx, Src1Idx, Src2Idx};
> +    const MCInstrDesc &Desc = get(MI->getOpcode());
>      unsigned SGPRReg = AMDGPU::NoRegister;
> +
> +    int VOP3Idx[3] = { Src0Idx, Src1Idx, Src2Idx };
> +
> +    // First we need to consider the instruction's operand requirements before
> +    // legalizing. Some operands are required to be SGPRs, but we are still
> +    // bound by the constant bus requirement to only use one.
> +    //
> +    // If the operand's class is an SGPR, we can never move it.
> +    for (unsigned i = 0; i < 3; ++i) {
> +      int Idx = VOP3Idx[i];
> +      if (Idx == -1)
> +        break;
> +
> +      if (RI.isSGPRClassID(Desc.OpInfo[Idx].RegClass)) {
> +        SGPRReg = MI->getOperand(Idx).getReg();
> +        break;
> +      }
> +    }
> +

I think it may be possible to move some of this logic into
isOperandLegal.  You might want to look at patch #3 from here
first:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140915/235726.html

>      for (unsigned i = 0; i < 3; ++i) {
>        int Idx = VOP3Idx[i];
>        if (Idx == -1)
> diff --git a/test/CodeGen/R600/v_cndmask.ll b/test/CodeGen/R600/v_cndmask.ll
> index 84087ee..51e3d8b 100644
> --- a/test/CodeGen/R600/v_cndmask.ll
> +++ b/test/CodeGen/R600/v_cndmask.ll
> @@ -1,14 +1,38 @@
> -; RUN: llc < %s -march=r600 -mcpu=SI -verify-machineinstrs | FileCheck --check-prefix=SI %s
> +; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s
>  
> -; SI: @v_cnd_nan
> -; SI: V_CNDMASK_B32_e64 v{{[0-9]}},
> +declare i32 @llvm.r600.read.tidig.x() #1
> +
> +; SI-LABEL: @v_cnd_nan_nosgpr
> +; SI: V_CNDMASK_B32_e64 v{{[0-9]}}, v{{[0-9]}}, -1, s{{\[[0-9]+:[0-9]+\]}}
>  ; SI-DAG: v{{[0-9]}}
>  ; All nan values are converted to 0xffffffff
> -; SI-DAG: -1
> -define void @v_cnd_nan(float addrspace(1)* %out, i32 %c, float %f) {
> -entry:
> -  %0 = icmp ne i32 %c, 0
> -  %1 = select i1 %0, float 0xFFFFFFFFE0000000, float %f
> -  store float %1, float addrspace(1)* %out
> +; SI: S_ENDPGM
> +define void @v_cnd_nan_nosgpr(float addrspace(1)* %out, i32 %c, float addrspace(1)* %fptr) #0 {
> +  %idx = call i32 @llvm.r600.read.tidig.x() #1
> +  %f.gep = getelementptr float addrspace(1)* %fptr, i32 %idx
> +  %f = load float addrspace(1)* %fptr
> +  %setcc = icmp ne i32 %c, 0
> +  %select = select i1 %setcc, float 0xFFFFFFFFE0000000, float %f
> +  store float %select, float addrspace(1)* %out
>    ret void
>  }
> +
> +
> +; This requires slightly trickier SGPR operand legalization since the
> +; single constant bus SGPR usage is the last operand, and it should
> +; never be moved.
> +
> +; SI-LABEL: @v_cnd_nan
> +; SI: V_CNDMASK_B32_e64 v{{[0-9]}}, v{{[0-9]}}, -1, s{{\[[0-9]+:[0-9]+\]}}
> +; SI-DAG: v{{[0-9]}}
> +; All nan values are converted to 0xffffffff
> +; SI: S_ENDPGM
> +define void @v_cnd_nan(float addrspace(1)* %out, i32 %c, float %f) #0 {
> +  %setcc = icmp ne i32 %c, 0
> +  %select = select i1 %setcc, float 0xFFFFFFFFE0000000, float %f
> +  store float %select, float addrspace(1)* %out
> +  ret void
> +}
> +
> +attributes #0 = { nounwind }
> +attributes #1 = { nounwind readnone }
> -- 
> 1.8.4.3
> 

> From 6a9bcc4481581e507c431419240810a1ec6f9b5b Mon Sep 17 00:00:00 2001
> From: Matt Arsenault <Matthew.Arsenault at amd.com>
> Date: Mon, 22 Sep 2014 21:25:56 -0400
> Subject: [PATCH 04/10] R600/SI: Implement findCommutedOpIndices
> 
> The base implementation of commuteInstruction is used
> in some cases, but it turns out this has been broken for a
> long time since modifiers were inserted between the real operands.
> 
> The base implementation of commuteInstruction also fails on immediates,
> which also needs to be fixed.
> ---
>  lib/Target/R600/SIInstrInfo.cpp | 32 ++++++++++++++++++++++++++++++++
>  lib/Target/R600/SIInstrInfo.h   |  5 ++++-
>  2 files changed, 36 insertions(+), 1 deletion(-)
> 

LGTM.

> From 2debce9120a68d02c8fba9b8ef56b85fff8e05d1 Mon Sep 17 00:00:00 2001
> From: Matt Arsenault <Matthew.Arsenault at amd.com>
> Date: Mon, 22 Sep 2014 19:12:24 -0400
> Subject: [PATCH 05/10] R600/SI: Partially move operand legalization to
>  post-isel hook.
> 
> Disable the SGPR usage restriction parts of the DAG legalizeOperands.
> It now should only be doing immediate folding until it can be replaced
> later. The real legalization work is now done by the other
> SIInstrInfo::legalizeOperands

Nice, thanks for doing this.

> ---
>  lib/Target/R600/SIISelLowering.cpp | 60 ++++----------------------------------
>  lib/Target/R600/SIISelLowering.h   |  2 --
>  lib/Target/R600/SIInstrFormats.td  |  2 ++
>  lib/Target/R600/SIInstrInfo.cpp    | 39 ++++++++++++++++++-------
>  test/CodeGen/R600/fneg.f64.ll      |  2 +-
>  test/CodeGen/R600/fneg.ll          |  2 +-
>  test/CodeGen/R600/rotl.ll          |  2 +-
>  test/CodeGen/R600/seto.ll          |  8 ++---
>  test/CodeGen/R600/setuo.ll         |  8 ++---
>  9 files changed, 48 insertions(+), 77 deletions(-)
> 
> diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
> index 8a3ab46..2055208 100644
> --- a/lib/Target/R600/SIISelLowering.cpp
> +++ b/lib/Target/R600/SIISelLowering.cpp
> @@ -1648,57 +1648,6 @@ bool SITargetLowering::fitsRegClass(SelectionDAG &DAG, const SDValue &Op,
>    return TRI->getRegClass(RegClass)->hasSubClassEq(RC);
>  }
>  
> -/// \brief Make sure that we don't exeed the number of allowed scalars
> -void SITargetLowering::ensureSRegLimit(SelectionDAG &DAG, SDValue &Operand,
> -                                       unsigned RegClass,
> -                                       bool &ScalarSlotUsed) const {
> -
> -  if (!isVSrc(RegClass))
> -    return;
> -
> -  // First map the operands register class to a destination class
> -  switch (RegClass) {
> -    case AMDGPU::VSrc_32RegClassID:
> -    case AMDGPU::VCSrc_32RegClassID:
> -      RegClass = AMDGPU::VReg_32RegClassID;
> -      break;
> -    case AMDGPU::VSrc_64RegClassID:
> -    case AMDGPU::VCSrc_64RegClassID:
> -      RegClass = AMDGPU::VReg_64RegClassID;
> -      break;
> -   default:
> -    llvm_unreachable("Unknown vsrc reg class");
> -  }
> -
> -  // Nothing to do if they fit naturally
> -  if (fitsRegClass(DAG, Operand, RegClass))
> -    return;
> -
> -  // If the scalar slot isn't used yet use it now
> -  if (!ScalarSlotUsed) {
> -    ScalarSlotUsed = true;
> -    return;
> -  }
> -
> -  // This is a conservative aproach. It is possible that we can't determine the
> -  // correct register class and copy too often, but better safe than sorry.
> -
> -  SDNode *Node;
> -  // We can't use COPY_TO_REGCLASS with FrameIndex arguments.
> -  if (isa<FrameIndexSDNode>(Operand) ||
> -      isa<GlobalAddressSDNode>(Operand)) {
> -    unsigned Opcode = Operand.getValueType() == MVT::i32 ?
> -                      AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
> -    Node = DAG.getMachineNode(Opcode, SDLoc(), Operand.getValueType(),
> -                              Operand);
> -  } else {
> -    SDValue RC = DAG.getTargetConstant(RegClass, MVT::i32);
> -    Node = DAG.getMachineNode(TargetOpcode::COPY_TO_REGCLASS, SDLoc(),
> -                              Operand.getValueType(), Operand, RC);
> -  }
> -  Operand = SDValue(Node, 0);
> -}
> -
>  /// \returns true if \p Node's operands are different from the SDValue list
>  /// \p Ops
>  static bool isNodeChanged(const SDNode *Node, const std::vector<SDValue> &Ops) {
> @@ -1710,8 +1659,9 @@ static bool isNodeChanged(const SDNode *Node, const std::vector<SDValue> &Ops) {
>    return false;
>  }
>  
> -/// \brief Try to commute instructions and insert copies in order to satisfy the
> -/// operand constraints.
> +/// TODO: This needs to be removed. It's current primary purpose is to fold
> +/// immediates into operands when legal. The legalization parts are redundant
> +/// with SIInstrInfo::legalizeOperands which is called in a post-isel hook.
>  SDNode *SITargetLowering::legalizeOperands(MachineSDNode *Node,
>                                             SelectionDAG &DAG) const {
>    // Original encoding (either e32 or e64)
> @@ -1787,7 +1737,7 @@ SDNode *SITargetLowering::legalizeOperands(MachineSDNode *Node,
>        // Try to fold the immediates
>        if (!foldImm(Ops[i], Immediate, ScalarSlotUsed)) {
>          // Folding didn't work, make sure we don't hit the SReg limit.
> -        ensureSRegLimit(DAG, Ops[i], RegClass, ScalarSlotUsed);
> +        //ensureSRegLimit(DAG, Ops[i], RegClass, ScalarSlotUsed);

Can this be deleted?
>        }
>        continue;
>      }
> @@ -1938,6 +1888,8 @@ void SITargetLowering::AdjustInstrPostInstrSelection(MachineInstr *MI,
>    const SIInstrInfo *TII = static_cast<const SIInstrInfo *>(
>        getTargetMachine().getSubtargetImpl()->getInstrInfo());
>  
> +  TII->legalizeOperands(MI);
> +
>    if (TII->isMIMG(MI->getOpcode())) {
>      unsigned VReg = MI->getOperand(0).getReg();
>      unsigned Writemask = MI->getOperand(1).getImm();
> diff --git a/lib/Target/R600/SIISelLowering.h b/lib/Target/R600/SIISelLowering.h
> index f953b48..9cf4dbc 100644
> --- a/lib/Target/R600/SIISelLowering.h
> +++ b/lib/Target/R600/SIISelLowering.h
> @@ -47,8 +47,6 @@ class SITargetLowering : public AMDGPUTargetLowering {
>                                                  const SDValue &Op) const;
>    bool fitsRegClass(SelectionDAG &DAG, const SDValue &Op,
>                      unsigned RegClass) const;
> -  void ensureSRegLimit(SelectionDAG &DAG, SDValue &Operand,
> -                       unsigned RegClass, bool &ScalarSlotUsed) const;
>  
>    SDNode *legalizeOperands(MachineSDNode *N, SelectionDAG &DAG) const;
>    void adjustWritemask(MachineSDNode *&N, SelectionDAG &DAG) const;
> diff --git a/lib/Target/R600/SIInstrFormats.td b/lib/Target/R600/SIInstrFormats.td
> index 8369a0c..acf9b55 100644
> --- a/lib/Target/R600/SIInstrFormats.td
> +++ b/lib/Target/R600/SIInstrFormats.td
> @@ -42,6 +42,8 @@ class InstSI <dag outs, dag ins, string asm, list<dag> pattern> :
>    let TSFlags{10} = MUBUF;
>    let TSFlags{11} = MTBUF;
>    let TSFlags{12} = FLAT;
> +
> +  let hasPostISelHook = 1; // Adjusted to no return version.

This comment seems wrong.

>  }
>  
>  class Enc32 {
> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
> index f8ccecd..76a9413 100644
> --- a/lib/Target/R600/SIInstrInfo.cpp
> +++ b/lib/Target/R600/SIInstrInfo.cpp
> @@ -1395,22 +1395,41 @@ void SIInstrInfo::legalizeOperands(MachineInstr *MI) const {
>  
>      int VOP3Idx[3] = { Src0Idx, Src1Idx, Src2Idx };
>  
> -    // First we need to consider the instruction's operand requirements before
> -    // legalizing. Some operands are required to be SGPRs, but we are still
> -    // bound by the constant bus requirement to only use one.
> -    //
> -    // If the operand's class is an SGPR, we can never move it.
> -    for (unsigned i = 0; i < 3; ++i) {
> -      int Idx = VOP3Idx[i];
> -      if (Idx == -1)
> +    for (const MachineOperand &MO : MI->implicit_operands()) {
> +      // We only care about reads.
> +      if (MO.isDef())
> +        continue;
> +
> +      if (MO.getReg() == AMDGPU::VCC) {
> +        SGPRReg = AMDGPU::VCC;
>          break;
> +      }
>  
> -      if (RI.isSGPRClassID(Desc.OpInfo[Idx].RegClass)) {
> -        SGPRReg = MI->getOperand(Idx).getReg();
> +      if (MO.getReg() == AMDGPU::FLAT_SCR) {
> +        SGPRReg = AMDGPU::FLAT_SCR;
>          break;
>        }
>      }
>  
> +
> +    if (SGPRReg == AMDGPU::NoRegister) {
> +      // First we need to consider the instruction's operand requirements before
> +      // legalizing. Some operands are required to be SGPRs, but we are still
> +      // bound by the constant bus requirement to only use one.
> +      //
> +      // If the operand's class is an SGPR, we can never move it.
> +      for (unsigned i = 0; i < 3; ++i) {
> +        int Idx = VOP3Idx[i];
> +        if (Idx == -1)
> +          break;
> +
> +        if (RI.isSGPRClassID(Desc.OpInfo[Idx].RegClass)) {
> +          SGPRReg = MI->getOperand(Idx).getReg();
> +          break;
> +        }
> +      }
> +    }
> +
>      for (unsigned i = 0; i < 3; ++i) {
>        int Idx = VOP3Idx[i];
>        if (Idx == -1)
> diff --git a/test/CodeGen/R600/fneg.f64.ll b/test/CodeGen/R600/fneg.f64.ll
> index 61d9513..f0b341b 100644
> --- a/test/CodeGen/R600/fneg.f64.ll
> +++ b/test/CodeGen/R600/fneg.f64.ll
> @@ -50,7 +50,7 @@ define void @fneg_free_f64(double addrspace(1)* %out, i64 %in) {
>  ; SI-LABEL: @fneg_fold
>  ; SI: S_LOAD_DWORDX2 [[NEG_VALUE:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0xb
>  ; SI-NOT: XOR
> -; SI: V_MUL_F64 {{v\[[0-9]+:[0-9]+\]}}, -[[NEG_VALUE]], {{v\[[0-9]+:[0-9]+\]}}
> +; SI: V_MUL_F64 {{v\[[0-9]+:[0-9]+\]}}, -[[NEG_VALUE]], [[NEG_VALUE]]
>  define void @fneg_fold_f64(double addrspace(1)* %out, double %in) {
>    %fsub = fsub double -0.0, %in
>    %fmul = fmul double %fsub, %in
> diff --git a/test/CodeGen/R600/fneg.ll b/test/CodeGen/R600/fneg.ll
> index 72cd15c..8631301 100644
> --- a/test/CodeGen/R600/fneg.ll
> +++ b/test/CodeGen/R600/fneg.ll
> @@ -59,7 +59,7 @@ define void @fneg_free_f32(float addrspace(1)* %out, i32 %in) {
>  ; FUNC-LABEL: @fneg_fold
>  ; SI: S_LOAD_DWORD [[NEG_VALUE:s[0-9]+]], s[{{[0-9]+:[0-9]+}}], 0xb
>  ; SI-NOT: XOR
> -; SI: V_MUL_F32_e64 v{{[0-9]+}}, -[[NEG_VALUE]], v{{[0-9]+}}
> +; SI: V_MUL_F32_e64 v{{[0-9]+}}, -[[NEG_VALUE]], [[NEG_VALUE]]
>  define void @fneg_fold_f32(float addrspace(1)* %out, float %in) {
>    %fsub = fsub float -0.0, %in
>    %fmul = fmul float %fsub, %in
> diff --git a/test/CodeGen/R600/rotl.ll b/test/CodeGen/R600/rotl.ll
> index 8c86fb5..a9dee8c 100644
> --- a/test/CodeGen/R600/rotl.ll
> +++ b/test/CodeGen/R600/rotl.ll
> @@ -8,7 +8,7 @@
>  
>  ; SI: S_SUB_I32 [[SDST:s[0-9]+]], 32, {{[s][0-9]+}}
>  ; SI: V_MOV_B32_e32 [[VDST:v[0-9]+]], [[SDST]]
> -; SI: V_ALIGNBIT_B32 {{v[0-9]+, [s][0-9]+, v[0-9]+}}, [[VDST]]
> +; SI: V_ALIGNBIT_B32 {{v[0-9]+, [s][0-9]+, s[0-9]+}}, [[VDST]]
>  define void @rotl_i32(i32 addrspace(1)* %in, i32 %x, i32 %y) {
>  entry:
>    %0 = shl i32 %x, %y
> diff --git a/test/CodeGen/R600/seto.ll b/test/CodeGen/R600/seto.ll
> index cc942c1..eb1176f 100644
> --- a/test/CodeGen/R600/seto.ll
> +++ b/test/CodeGen/R600/seto.ll
> @@ -1,8 +1,8 @@
> -;RUN: llc < %s -march=r600 -mcpu=verde -verify-machineinstrs | FileCheck %s
> -
> -;CHECK-LABEL: @main
> -;CHECK: V_CMP_O_F32_e32 vcc, {{[sv][0-9]+, v[0-9]+}}
> +; RUN: llc -march=r600 -mcpu=verde -verify-machineinstrs < %s | FileCheck %s
>  
> +; CHECK-LABEL: @main
> +; CHECK: V_CMP_O_F32_e64 [[CMP:s\[[0-9]+:[0-9]+\]]], [[SREG:s[0-9]+]], [[SREG]]
> +; CHECK-NEXT: V_CNDMASK_B32_e64 {{v[0-9]+}}, 0, 1.0, [[CMP]]
>  define void @main(float %p) {
>  main_body:
>    %c = fcmp oeq float %p, %p
> diff --git a/test/CodeGen/R600/setuo.ll b/test/CodeGen/R600/setuo.ll
> index 33007fc..a78e8e6 100644
> --- a/test/CodeGen/R600/setuo.ll
> +++ b/test/CodeGen/R600/setuo.ll
> @@ -1,8 +1,8 @@
> -;RUN: llc < %s -march=r600 -mcpu=verde -verify-machineinstrs | FileCheck %s
> -
> -;CHECK-LABEL: @main
> -;CHECK: V_CMP_U_F32_e32 vcc, {{[sv][0-9]+, v[0-9]+}}
> +; RUN: llc -march=r600 -mcpu=verde -verify-machineinstrs < %s | FileCheck %s
>  
> +; CHECK-LABEL: @main
> +; CHECK: V_CMP_U_F32_e64 [[CMP:s\[[0-9]+:[0-9]+\]]], [[SREG:s[0-9]+]], [[SREG]]
> +; CHECK-NEXT: V_CNDMASK_B32_e64 {{v[0-9]+}}, 0, 1.0, [[CMP]]
>  define void @main(float %p) {
>  main_body:
>    %c = fcmp une float %p, %p
> -- 
> 1.8.4.3
> 

> From 31fc32bd9656c37101895b36b537b61dd79d3793 Mon Sep 17 00:00:00 2001
> From: Matt Arsenault <Matthew.Arsenault at amd.com>
> Date: Mon, 22 Sep 2014 23:50:40 -0400
> Subject: [PATCH 06/10] R600/SI Allow same SGPR to be used for multiple
>  operands
> 
> Instead of moving the first SGPR that is different than the first,
> legalize the operand that requires the fewest moves if one
> SGPR is used for multiple operands.
> 
> This saves extra moves and is also required for some instructions
> which require that the same operand be used for multiple operands.
> ---
>  lib/Target/R600/SIInstrInfo.cpp              | 38 +++++++++--
>  test/CodeGen/R600/use-sgpr-multiple-times.ll | 96 ++++++++++++++++++++++++++++
>  2 files changed, 129 insertions(+), 5 deletions(-)
>  create mode 100644 test/CodeGen/R600/use-sgpr-multiple-times.ll
> 
> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
> index 76a9413..e31947a 100644
> --- a/lib/Target/R600/SIInstrInfo.cpp
> +++ b/lib/Target/R600/SIInstrInfo.cpp
> @@ -1391,10 +1391,12 @@ void SIInstrInfo::legalizeOperands(MachineInstr *MI) const {
>    // Legalize VOP3
>    if (isVOP3(MI->getOpcode())) {
>      const MCInstrDesc &Desc = get(MI->getOpcode());
> -    unsigned SGPRReg = AMDGPU::NoRegister;
>  
>      int VOP3Idx[3] = { Src0Idx, Src1Idx, Src2Idx };
>  
> +    // Find the one SGPR operand we are allowed to use.
> +    unsigned SGPRReg = AMDGPU::NoRegister;
> +
>      for (const MachineOperand &MO : MI->implicit_operands()) {
>        // We only care about reads.
>        if (MO.isDef())
> @@ -1411,8 +1413,9 @@ void SIInstrInfo::legalizeOperands(MachineInstr *MI) const {
>        }
>      }
>  
> -
>      if (SGPRReg == AMDGPU::NoRegister) {
> +      unsigned UsedSGPRs[3] = { AMDGPU::NoRegister };
> +
>        // First we need to consider the instruction's operand requirements before
>        // legalizing. Some operands are required to be SGPRs, but we are still
>        // bound by the constant bus requirement to only use one.
> @@ -1423,9 +1426,33 @@ void SIInstrInfo::legalizeOperands(MachineInstr *MI) const {
>          if (Idx == -1)
>            break;
>  
> -        if (RI.isSGPRClassID(Desc.OpInfo[Idx].RegClass)) {
> -          SGPRReg = MI->getOperand(Idx).getReg();
> -          break;
> +        const MachineOperand &MO = MI->getOperand(Idx);
> +        if (RI.isSGPRClassID(Desc.OpInfo[Idx].RegClass))
> +          SGPRReg = MO.getReg();
> +
> +        if (MO.isReg() && RI.isSGPRClass(MRI.getRegClass(MO.getReg())))
> +          UsedSGPRs[i] = MO.getReg();
> +      }
> +
> +      if (SGPRReg == AMDGPU::NoRegister) {
> +        // We don't have a required SGPR operand, so we have a bit more freedom in
> +        // selecting operands to move.
> +
> +        // Try to select the most used SGPR. If an SGPR is equal to one of the
> +        // others, we choose that.
> +        //
> +        // e.g.
> +        // V_FMA_F32 v0, s0, s0, s0 -> No moves
> +        // V_FMA_F32 v0, s0, s1, s0 -> Move s1
> +
> +        if (UsedSGPRs[0] != AMDGPU::NoRegister) {
> +          if (UsedSGPRs[0] == UsedSGPRs[1] || UsedSGPRs[0] == UsedSGPRs[2])
> +            SGPRReg = UsedSGPRs[0];
> +        }
> +
> +        if (SGPRReg == AMDGPU::NoRegister && UsedSGPRs[1] != AMDGPU::NoRegister) {
> +          if (UsedSGPRs[1] == UsedSGPRs[2])
> +            SGPRReg = UsedSGPRs[1];
>          }
>        }
>      }
> @@ -1452,6 +1479,7 @@ void SIInstrInfo::legalizeOperands(MachineInstr *MI) const {
>          // an inline constant which is always legal.
>          continue;
>        }
> +

Extra whitespace.

>        // If we make it this far, then the operand is not legal and we must
>        // legalize it.
>        legalizeOpWithMove(MI, Idx);
> diff --git a/test/CodeGen/R600/use-sgpr-multiple-times.ll b/test/CodeGen/R600/use-sgpr-multiple-times.ll
> new file mode 100644
> index 0000000..5c00718
> --- /dev/null
> +++ b/test/CodeGen/R600/use-sgpr-multiple-times.ll
> @@ -0,0 +1,96 @@
> +; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s
> +
> +declare float @llvm.fma.f32(float, float, float) #1
> +declare float @llvm.fmuladd.f32(float, float, float) #1
> +declare i32 @llvm.AMDGPU.imad24(i32, i32, i32) #1
> +
> +
> +; SI-LABEL: @test_sgpr_use_twice_binop:
> +; SI: S_LOAD_DWORD [[SGPR:s[0-9]+]],
> +; SI: V_ADD_F32_e64 [[RESULT:v[0-9]+]], [[SGPR]], [[SGPR]]
> +; SI: BUFFER_STORE_DWORD [[RESULT]]
> +define void @test_sgpr_use_twice_binop(float addrspace(1)* %out, float %a) #0 {
> +  %dbl = fadd float %a, %a
> +  store float %dbl, float addrspace(1)* %out, align 4
> +  ret void
> +}
> +
> +; SI-LABEL: @test_sgpr_use_three_ternary_op:
> +; SI: S_LOAD_DWORD [[SGPR:s[0-9]+]],
> +; SI: V_FMA_F32 [[RESULT:v[0-9]+]], [[SGPR]], [[SGPR]], [[SGPR]]
> +; SI: BUFFER_STORE_DWORD [[RESULT]]
> +define void @test_sgpr_use_three_ternary_op(float addrspace(1)* %out, float %a) #0 {
> +  %fma = call float @llvm.fma.f32(float %a, float %a, float %a) #1
> +  store float %fma, float addrspace(1)* %out, align 4
> +  ret void
> +}
> +
> +; SI-LABEL: @test_sgpr_use_twice_ternary_op_a_a_b:
> +; SI: S_LOAD_DWORD [[SGPR0:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
> +; SI: S_LOAD_DWORD [[SGPR1:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xc
> +; SI: V_MOV_B32_e32 [[VGPR1:v[0-9]+]], [[SGPR1]]
> +; SI: V_FMA_F32 [[RESULT:v[0-9]+]], [[SGPR0]], [[SGPR0]], [[VGPR1]]
> +; SI: BUFFER_STORE_DWORD [[RESULT]]
> +define void @test_sgpr_use_twice_ternary_op_a_a_b(float addrspace(1)* %out, float %a, float %b) #0 {
> +  %fma = call float @llvm.fma.f32(float %a, float %a, float %b) #1
> +  store float %fma, float addrspace(1)* %out, align 4
> +  ret void
> +}
> +
> +; SI-LABEL: @test_sgpr_use_twice_ternary_op_a_b_a:
> +; SI: S_LOAD_DWORD [[SGPR0:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
> +; SI: S_LOAD_DWORD [[SGPR1:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xc
> +; SI: V_MOV_B32_e32 [[VGPR1:v[0-9]+]], [[SGPR1]]
> +; SI: V_FMA_F32 [[RESULT:v[0-9]+]], [[SGPR0]], [[VGPR1]], [[SGPR0]]
> +; SI: BUFFER_STORE_DWORD [[RESULT]]
> +define void @test_sgpr_use_twice_ternary_op_a_b_a(float addrspace(1)* %out, float %a, float %b) #0 {
> +  %fma = call float @llvm.fma.f32(float %a, float %b, float %a) #1
> +  store float %fma, float addrspace(1)* %out, align 4
> +  ret void
> +}
> +
> +; SI-LABEL: @test_sgpr_use_twice_ternary_op_b_a_a:
> +; SI: S_LOAD_DWORD [[SGPR0:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
> +; SI: S_LOAD_DWORD [[SGPR1:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xc
> +; SI: V_MOV_B32_e32 [[VGPR1:v[0-9]+]], [[SGPR1]]
> +; SI: V_FMA_F32 [[RESULT:v[0-9]+]], [[VGPR1]], [[SGPR0]], [[SGPR0]]
> +; SI: BUFFER_STORE_DWORD [[RESULT]]
> +define void @test_sgpr_use_twice_ternary_op_b_a_a(float addrspace(1)* %out, float %a, float %b) #0 {
> +  %fma = call float @llvm.fma.f32(float %b, float %a, float %a) #1
> +  store float %fma, float addrspace(1)* %out, align 4
> +  ret void
> +}
> +
> +; SI-LABEL: @test_sgpr_use_twice_ternary_op_a_a_imm:
> +; SI: S_LOAD_DWORD [[SGPR:s[0-9]+]]
> +; SI: V_FMA_F32 [[RESULT:v[0-9]+]], [[SGPR]], [[SGPR]], 2.0
> +; SI: BUFFER_STORE_DWORD [[RESULT]]
> +define void @test_sgpr_use_twice_ternary_op_a_a_imm(float addrspace(1)* %out, float %a) #0 {
> +  %fma = call float @llvm.fma.f32(float %a, float %a, float 2.0) #1
> +  store float %fma, float addrspace(1)* %out, align 4
> +  ret void
> +}
> +
> +; SI-LABEL: @test_sgpr_use_twice_ternary_op_a_imm_a:
> +; SI: S_LOAD_DWORD [[SGPR:s[0-9]+]]
> +; SI: V_FMA_F32 [[RESULT:v[0-9]+]], [[SGPR]], 2.0, [[SGPR]]
> +; SI: BUFFER_STORE_DWORD [[RESULT]]
> +define void @test_sgpr_use_twice_ternary_op_a_imm_a(float addrspace(1)* %out, float %a) #0 {
> +  %fma = call float @llvm.fma.f32(float %a, float 2.0, float %a) #1
> +  store float %fma, float addrspace(1)* %out, align 4
> +  ret void
> +}
> +
> +; Don't use fma since fma c, x, y is canonicalized to fma x, c, y
> +; SI-LABEL: @test_sgpr_use_twice_ternary_op_imm_a_a:
> +; SI: S_LOAD_DWORD [[SGPR:s[0-9]+]]
> +; SI: V_MAD_I32_I24 [[RESULT:v[0-9]+]], 2, [[SGPR]], [[SGPR]]
> +; SI: BUFFER_STORE_DWORD [[RESULT]]
> +define void @test_sgpr_use_twice_ternary_op_imm_a_a(i32 addrspace(1)* %out, i32 %a) #0 {
> +  %fma = call i32 @llvm.AMDGPU.imad24(i32 2, i32 %a, i32 %a) #1
> +  store i32 %fma, i32 addrspace(1)* %out, align 4
> +  ret void
> +}
> +
> +attributes #0 = { nounwind }
> +attributes #1 = { nounwind readnone }
> -- 
> 1.8.4.3
> 

> From f87915da2e01e4d44f0b8d6f5a4b3a1e1f5812e4 Mon Sep 17 00:00:00 2001
> From: Matt Arsenault <Matthew.Arsenault at amd.com>
> Date: Tue, 23 Sep 2014 18:22:58 -0700
> Subject: [PATCH 07/10] R600/SI: Move finding SGPR operand to move to separate
>  function
> 
> ---
>  lib/Target/R600/SIInstrInfo.cpp | 132 +++++++++++++++++++++-------------------
>  lib/Target/R600/SIInstrInfo.h   |   2 +
>  2 files changed, 71 insertions(+), 63 deletions(-)
> 

LGTM.

> From 0f0180485cfd44ec5b459010053ba9c0f1858ac1 Mon Sep 17 00:00:00 2001
> From: Matt Arsenault <Matthew.Arsenault at amd.com>
> Date: Tue, 23 Sep 2014 00:23:30 -0400
> Subject: [PATCH 08/10] R600/SI: Add a note about the order of the operands to
>  div_scale
> 
> ---
>  lib/Target/R600/AMDGPUISelLowering.cpp | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
>

LGTM.

> From 8898b9e1f8c8e87285e7f2f5b5d58ef8325b6b69 Mon Sep 17 00:00:00 2001
> From: Matt Arsenault <Matthew.Arsenault at amd.com>
> Date: Tue, 23 Sep 2014 00:30:35 -0400
> Subject: [PATCH 09/10] R600/SI: Add strict check lines to div_scale tests.
> 
> This has weird operand requirements so it's worthwhile
> to have very strict checks for its operands.
> 
> Add different combinations of SGPR operands.
> ---
>  test/CodeGen/R600/llvm.AMDGPU.div_scale.ll | 271 +++++++++++++++++++++++++++--
>  1 file changed, 255 insertions(+), 16 deletions(-)
> 

LGTM.

> From 37f4ce315b7a6a513c7c9e22eb20aa4e671c4980 Mon Sep 17 00:00:00 2001
> From: Matt Arsenault <Matthew.Arsenault at amd.com>
> Date: Tue, 23 Sep 2014 17:39:25 -0700
> Subject: [PATCH 10/10] R600/SI: Use break instead of continue
> 
> If an instruction doesn't have src1, it doesn't have src2
> ---
>  lib/Target/R600/SIInstrInfo.cpp | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

LGTM.