[PATCH] R600/SI: Fix various operand legalization bugs

Thu Sep 25 17:36:41 PDT 2014

On Sep 25, 2014, at 12:40 PM, Tom Stellard <tom at stellard.net> wrote:

>> From 50bc914204a37c7f314d65ca8a50581f3935e390 Mon Sep 17 00:00:00 2001
>> From: Matt Arsenault <Matthew.Arsenault at amd.com>
>> Date: Mon, 22 Sep 2014 19:30:31 -0400
>> Subject: [PATCH 01/10] R600/SI: Fix using wrong operand indices when commuting
>> 
>> No test since the current SIISelLowering::legalizeOperands
>> effectively hides this, and the general uses seem to only fire
>> on SALU instructions which don't have modifiers between
>> the operands.
>> 
>> When trying to use legalizeOperands immediately after
>> instruction selection, it now sees a lot more patterns
>> it did not see before which break on this.
> 
> I have a few comments, I would also like to test the final version of
> these before they are committed.

New versions attached, although I’ve only changed comments and things like that

> 
>> ---
>> lib/Target/R600/SIInstrInfo.cpp        | 31 ++++++++++++++++++++-----------
>> test/CodeGen/R600/llvm.AMDGPU.clamp.ll |  4 ++--
>> 2 files changed, 22 insertions(+), 13 deletions(-)
>> 
> 
> LGTM.
> 
>> From f3967d9b265e3e80a376d72799abe556d190bced Mon Sep 17 00:00:00 2001
>> From: Matt Arsenault <Matthew.Arsenault at amd.com>
>> Date: Tue, 23 Sep 2014 01:32:56 -0700
>> Subject: [PATCH 02/10] R600/SI: Don't assert on exotic operand types
>> 
>> This needs a test, but I'm not exactly sure how I am getting
>> the MO_GlobalAddress operand that hit here.
> 
> I think these come from constant address space variables.
> 
> LGTM.

I think the only reason I was hitting this was because of a bug in one of the other patches when I was working on it where I used the wrong indices. Right now I’m only managing to get here with a scalar add off the constant address that eventually becomes the add to after the text section, so that never gets past the isVOP2 / isVOP3 check, although I guess it’s theoretically possible it could happen.

> 
>> ---
>> lib/Target/R600/SIInstrInfo.cpp | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
>> index d9f7146..d68858e 100644
>> --- a/lib/Target/R600/SIInstrInfo.cpp
>> +++ b/lib/Target/R600/SIInstrInfo.cpp
>> @@ -705,7 +705,7 @@ MachineInstr *SIInstrInfo::commuteInstruction(MachineInstr *MI,
>> 
>>   if (Src1Idx != -1 && !MI->getOperand(Src1Idx).isReg()) {
>>     // XXX: Commute instructions with FPImm operands
>> -    if (NewMI || MI->getOperand(Src1Idx).isFPImm() ||
>> +    if (NewMI || !MI->getOperand(Src1Idx).isImm() ||
>>        (!isVOP2(MI->getOpcode()) && !isVOP3(MI->getOpcode()))) {
>>       return nullptr;
>>     }
>> -- 
>> 1.8.4.3
>> 
> 
>> From 679655e6f32ce0091bce473eb8251c41ad76a7d8 Mon Sep 17 00:00:00 2001
>> From: Matt Arsenault <Matthew.Arsenault at amd.com>
>> Date: Mon, 22 Sep 2014 20:29:12 -0400
>> Subject: [PATCH 03/10] R600/SI: Don't move operands that are required to be
>> SGPRs
>> 
>> e.g. v_cndmask_b32 requires the condition operand be an SGPR.
>> If one of the source operands were an SGPR, that would be considered
>> the one SGPR use and the condition operand would be illegally moved.
>> ---
>> lib/Target/R600/SIInstrInfo.cpp | 21 ++++++++++++++++++++-
>> test/CodeGen/R600/v_cndmask.ll  | 42 ++++++++++++++++++++++++++++++++---------
>> 2 files changed, 53 insertions(+), 10 deletions(-)
>> 
>> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
>> index d68858e..2f26861 100644
>> --- a/lib/Target/R600/SIInstrInfo.cpp
>> +++ b/lib/Target/R600/SIInstrInfo.cpp
>> @@ -1358,8 +1358,27 @@ void SIInstrInfo::legalizeOperands(MachineInstr *MI) const {
>>   // XXX - Do any VOP3 instructions read VCC?
>>   // Legalize VOP3
>>   if (isVOP3(MI->getOpcode())) {
>> -    int VOP3Idx[3] = {Src0Idx, Src1Idx, Src2Idx};
>> +    const MCInstrDesc &Desc = get(MI->getOpcode());
>>     unsigned SGPRReg = AMDGPU::NoRegister;
>> +
>> +    int VOP3Idx[3] = { Src0Idx, Src1Idx, Src2Idx };
>> +
>> +    // First we need to consider the instruction's operand requirements before
>> +    // legalizing. Some operands are required to be SGPRs, but we are still
>> +    // bound by the constant bus requirement to only use one.
>> +    //
>> +    // If the operand's class is an SGPR, we can never move it.
>> +    for (unsigned i = 0; i < 3; ++i) {
>> +      int Idx = VOP3Idx[i];
>> +      if (Idx == -1)
>> +        break;
>> +
>> +      if (RI.isSGPRClassID(Desc.OpInfo[Idx].RegClass)) {
>> +        SGPRReg = MI->getOperand(Idx).getReg();
>> +        break;
>> +      }
>> +    }
>> +
> 
> I think it may be possible to move some of this logic into
> isOperandLegal.  You might want to look at patch #3 from here
> first:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140915/235726.html

I’m not sure how. I think isOperandLegal should only consider one operand at a time, and this looks over all of the operands at once. The way isOperandLegal checks all operands of the instruction after looking at one that uses the constant bus is sort of confusing, since it makes it more of a areInstructionOperandsLegal

> 
> 
>>     for (unsigned i = 0; i < 3; ++i) {
>>       int Idx = VOP3Idx[i];
>>       if (Idx == -1)
>> diff --git a/test/CodeGen/R600/v_cndmask.ll b/test/CodeGen/R600/v_cndmask.ll
>> index 84087ee..51e3d8b 100644
>> --- a/test/CodeGen/R600/v_cndmask.ll
>> +++ b/test/CodeGen/R600/v_cndmask.ll
>> @@ -1,14 +1,38 @@
>> -; RUN: llc < %s -march=r600 -mcpu=SI -verify-machineinstrs | FileCheck --check-prefix=SI %s
>> +; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s
>> 
>> -; SI: @v_cnd_nan
>> -; SI: V_CNDMASK_B32_e64 v{{[0-9]}},
>> +declare i32 @llvm.r600.read.tidig.x() #1
>> +
>> +; SI-LABEL: @v_cnd_nan_nosgpr
>> +; SI: V_CNDMASK_B32_e64 v{{[0-9]}}, v{{[0-9]}}, -1, s{{\[[0-9]+:[0-9]+\]}}
>> ; SI-DAG: v{{[0-9]}}
>> ; All nan values are converted to 0xffffffff
>> -; SI-DAG: -1
>> -define void @v_cnd_nan(float addrspace(1)* %out, i32 %c, float %f) {
>> -entry:
>> -  %0 = icmp ne i32 %c, 0
>> -  %1 = select i1 %0, float 0xFFFFFFFFE0000000, float %f
>> -  store float %1, float addrspace(1)* %out
>> +; SI: S_ENDPGM
>> +define void @v_cnd_nan_nosgpr(float addrspace(1)* %out, i32 %c, float addrspace(1)* %fptr) #0 {
>> +  %idx = call i32 @llvm.r600.read.tidig.x() #1
>> +  %f.gep = getelementptr float addrspace(1)* %fptr, i32 %idx
>> +  %f = load float addrspace(1)* %fptr
>> +  %setcc = icmp ne i32 %c, 0
>> +  %select = select i1 %setcc, float 0xFFFFFFFFE0000000, float %f
>> +  store float %select, float addrspace(1)* %out
>>   ret void
>> }
>> +
>> +
>> +; This requires slightly trickier SGPR operand legalization since the
>> +; single constant bus SGPR usage is the last operand, and it should
>> +; never be moved.
>> +
>> +; SI-LABEL: @v_cnd_nan
>> +; SI: V_CNDMASK_B32_e64 v{{[0-9]}}, v{{[0-9]}}, -1, s{{\[[0-9]+:[0-9]+\]}}
>> +; SI-DAG: v{{[0-9]}}
>> +; All nan values are converted to 0xffffffff
>> +; SI: S_ENDPGM
>> +define void @v_cnd_nan(float addrspace(1)* %out, i32 %c, float %f) #0 {
>> +  %setcc = icmp ne i32 %c, 0
>> +  %select = select i1 %setcc, float 0xFFFFFFFFE0000000, float %f
>> +  store float %select, float addrspace(1)* %out
>> +  ret void
>> +}
>> +
>> +attributes #0 = { nounwind }
>> +attributes #1 = { nounwind readnone }
>> -- 
>> 1.8.4.3
>> 
> 
>> From 6a9bcc4481581e507c431419240810a1ec6f9b5b Mon Sep 17 00:00:00 2001
>> From: Matt Arsenault <Matthew.Arsenault at amd.com>
>> Date: Mon, 22 Sep 2014 21:25:56 -0400
>> Subject: [PATCH 04/10] R600/SI: Implement findCommutedOpIndices
>> 
>> The base implementation of commuteInstruction is used
>> in some cases, but it turns out this has been broken for a
>> long time since modifiers were inserted between the real operands.
>> 
>> The base implementation of commuteInstruction also fails on immediates,
>> which also needs to be fixed.
>> ---
>> lib/Target/R600/SIInstrInfo.cpp | 32 ++++++++++++++++++++++++++++++++
>> lib/Target/R600/SIInstrInfo.h   |  5 ++++-
>> 2 files changed, 36 insertions(+), 1 deletion(-)
>> 
> 
> LGTM.
> 
>> From 2debce9120a68d02c8fba9b8ef56b85fff8e05d1 Mon Sep 17 00:00:00 2001
>> From: Matt Arsenault <Matthew.Arsenault at amd.com>
>> Date: Mon, 22 Sep 2014 19:12:24 -0400
>> Subject: [PATCH 05/10] R600/SI: Partially move operand legalization to
>> post-isel hook.
>> 
>> Disable the SGPR usage restriction parts of the DAG legalizeOperands.
>> It now should only be doing immediate folding until it can be replaced
>> later. The real legalization work is now done by the other
>> SIInstrInfo::legalizeOperands
> 
> Nice, thanks for doing this.
> 
>> ---
>> lib/Target/R600/SIISelLowering.cpp | 60 ++++----------------------------------
>> lib/Target/R600/SIISelLowering.h   |  2 --
>> lib/Target/R600/SIInstrFormats.td  |  2 ++
>> lib/Target/R600/SIInstrInfo.cpp    | 39 ++++++++++++++++++-------
>> test/CodeGen/R600/fneg.f64.ll      |  2 +-
>> test/CodeGen/R600/fneg.ll          |  2 +-
>> test/CodeGen/R600/rotl.ll          |  2 +-
>> test/CodeGen/R600/seto.ll          |  8 ++---
>> test/CodeGen/R600/setuo.ll         |  8 ++---
>> 9 files changed, 48 insertions(+), 77 deletions(-)
>> 
>> diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
>> index 8a3ab46..2055208 100644
>> --- a/lib/Target/R600/SIISelLowering.cpp
>> +++ b/lib/Target/R600/SIISelLowering.cpp
>> @@ -1648,57 +1648,6 @@ bool SITargetLowering::fitsRegClass(SelectionDAG &DAG, const SDValue &Op,
>>   return TRI->getRegClass(RegClass)->hasSubClassEq(RC);
>> }
>> 
>> -/// \brief Make sure that we don't exeed the number of allowed scalars
>> -void SITargetLowering::ensureSRegLimit(SelectionDAG &DAG, SDValue &Operand,
>> -                                       unsigned RegClass,
>> -                                       bool &ScalarSlotUsed) const {
>> -
>> -  if (!isVSrc(RegClass))
>> -    return;
>> -
>> -  // First map the operands register class to a destination class
>> -  switch (RegClass) {
>> -    case AMDGPU::VSrc_32RegClassID:
>> -    case AMDGPU::VCSrc_32RegClassID:
>> -      RegClass = AMDGPU::VReg_32RegClassID;
>> -      break;
>> -    case AMDGPU::VSrc_64RegClassID:
>> -    case AMDGPU::VCSrc_64RegClassID:
>> -      RegClass = AMDGPU::VReg_64RegClassID;
>> -      break;
>> -   default:
>> -    llvm_unreachable("Unknown vsrc reg class");
>> -  }
>> -
>> -  // Nothing to do if they fit naturally
>> -  if (fitsRegClass(DAG, Operand, RegClass))
>> -    return;
>> -
>> -  // If the scalar slot isn't used yet use it now
>> -  if (!ScalarSlotUsed) {
>> -    ScalarSlotUsed = true;
>> -    return;
>> -  }
>> -
>> -  // This is a conservative aproach. It is possible that we can't determine the
>> -  // correct register class and copy too often, but better safe than sorry.
>> -
>> -  SDNode *Node;
>> -  // We can't use COPY_TO_REGCLASS with FrameIndex arguments.
>> -  if (isa<FrameIndexSDNode>(Operand) ||
>> -      isa<GlobalAddressSDNode>(Operand)) {
>> -    unsigned Opcode = Operand.getValueType() == MVT::i32 ?
>> -                      AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
>> -    Node = DAG.getMachineNode(Opcode, SDLoc(), Operand.getValueType(),
>> -                              Operand);
>> -  } else {
>> -    SDValue RC = DAG.getTargetConstant(RegClass, MVT::i32);
>> -    Node = DAG.getMachineNode(TargetOpcode::COPY_TO_REGCLASS, SDLoc(),
>> -                              Operand.getValueType(), Operand, RC);
>> -  }
>> -  Operand = SDValue(Node, 0);
>> -}
>> -
>> /// \returns true if \p Node's operands are different from the SDValue list
>> /// \p Ops
>> static bool isNodeChanged(const SDNode *Node, const std::vector<SDValue> &Ops) {
>> @@ -1710,8 +1659,9 @@ static bool isNodeChanged(const SDNode *Node, const std::vector<SDValue> &Ops) {
>>   return false;
>> }
>> 
>> -/// \brief Try to commute instructions and insert copies in order to satisfy the
>> -/// operand constraints.
>> +/// TODO: This needs to be removed. It's current primary purpose is to fold
>> +/// immediates into operands when legal. The legalization parts are redundant
>> +/// with SIInstrInfo::legalizeOperands which is called in a post-isel hook.
>> SDNode *SITargetLowering::legalizeOperands(MachineSDNode *Node,
>>                                            SelectionDAG &DAG) const {
>>   // Original encoding (either e32 or e64)
>> @@ -1787,7 +1737,7 @@ SDNode *SITargetLowering::legalizeOperands(MachineSDNode *Node,
>>       // Try to fold the immediates
>>       if (!foldImm(Ops[i], Immediate, ScalarSlotUsed)) {
>>         // Folding didn't work, make sure we don't hit the SReg limit.
>> -        ensureSRegLimit(DAG, Ops[i], RegClass, ScalarSlotUsed);
>> +        //ensureSRegLimit(DAG, Ops[i], RegClass, ScalarSlotUsed);
> 
> Can this be deleted?

Yes

>>       }
>>       continue;
>>     }
>> @@ -1938,6 +1888,8 @@ void SITargetLowering::AdjustInstrPostInstrSelection(MachineInstr *MI,
>>   const SIInstrInfo *TII = static_cast<const SIInstrInfo *>(
>>       getTargetMachine().getSubtargetImpl()->getInstrInfo());
>> 
>> +  TII->legalizeOperands(MI);
>> +
>>   if (TII->isMIMG(MI->getOpcode())) {
>>     unsigned VReg = MI->getOperand(0).getReg();
>>     unsigned Writemask = MI->getOperand(1).getImm();
>> diff --git a/lib/Target/R600/SIISelLowering.h b/lib/Target/R600/SIISelLowering.h
>> index f953b48..9cf4dbc 100644
>> --- a/lib/Target/R600/SIISelLowering.h
>> +++ b/lib/Target/R600/SIISelLowering.h
>> @@ -47,8 +47,6 @@ class SITargetLowering : public AMDGPUTargetLowering {
>>                                                 const SDValue &Op) const;
>>   bool fitsRegClass(SelectionDAG &DAG, const SDValue &Op,
>>                     unsigned RegClass) const;
>> -  void ensureSRegLimit(SelectionDAG &DAG, SDValue &Operand,
>> -                       unsigned RegClass, bool &ScalarSlotUsed) const;
>> 
>>   SDNode *legalizeOperands(MachineSDNode *N, SelectionDAG &DAG) const;
>>   void adjustWritemask(MachineSDNode *&N, SelectionDAG &DAG) const;
>> diff --git a/lib/Target/R600/SIInstrFormats.td b/lib/Target/R600/SIInstrFormats.td
>> index 8369a0c..acf9b55 100644
>> --- a/lib/Target/R600/SIInstrFormats.td
>> +++ b/lib/Target/R600/SIInstrFormats.td
>> @@ -42,6 +42,8 @@ class InstSI <dag outs, dag ins, string asm, list<dag> pattern> :
>>   let TSFlags{10} = MUBUF;
>>   let TSFlags{11} = MTBUF;
>>   let TSFlags{12} = FLAT;
>> +
>> +  let hasPostISelHook = 1; // Adjusted to no return version.
> 
> This comment seems wrong.

Fixed

> 
>> }
>> 
>> class Enc32 {
>> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
>> index f8ccecd..76a9413 100644
>> --- a/lib/Target/R600/SIInstrInfo.cpp
>> +++ b/lib/Target/R600/SIInstrInfo.cpp
>> @@ -1395,22 +1395,41 @@ void SIInstrInfo::legalizeOperands(MachineInstr *MI) const {
>> 
>>     int VOP3Idx[3] = { Src0Idx, Src1Idx, Src2Idx };
>> 
>> -    // First we need to consider the instruction's operand requirements before
>> -    // legalizing. Some operands are required to be SGPRs, but we are still
>> -    // bound by the constant bus requirement to only use one.
>> -    //
>> -    // If the operand's class is an SGPR, we can never move it.
>> -    for (unsigned i = 0; i < 3; ++i) {
>> -      int Idx = VOP3Idx[i];
>> -      if (Idx == -1)
>> +    for (const MachineOperand &MO : MI->implicit_operands()) {
>> +      // We only care about reads.
>> +      if (MO.isDef())
>> +        continue;
>> +
>> +      if (MO.getReg() == AMDGPU::VCC) {
>> +        SGPRReg = AMDGPU::VCC;
>>         break;
>> +      }
>> 
>> -      if (RI.isSGPRClassID(Desc.OpInfo[Idx].RegClass)) {
>> -        SGPRReg = MI->getOperand(Idx).getReg();
>> +      if (MO.getReg() == AMDGPU::FLAT_SCR) {
>> +        SGPRReg = AMDGPU::FLAT_SCR;
>>         break;
>>       }
>>     }
>> 
>> +
>> +    if (SGPRReg == AMDGPU::NoRegister) {
>> +      // First we need to consider the instruction's operand requirements before
>> +      // legalizing. Some operands are required to be SGPRs, but we are still
>> +      // bound by the constant bus requirement to only use one.
>> +      //
>> +      // If the operand's class is an SGPR, we can never move it.
>> +      for (unsigned i = 0; i < 3; ++i) {
>> +        int Idx = VOP3Idx[i];
>> +        if (Idx == -1)
>> +          break;
>> +
>> +        if (RI.isSGPRClassID(Desc.OpInfo[Idx].RegClass)) {
>> +          SGPRReg = MI->getOperand(Idx).getReg();
>> +          break;
>> +        }
>> +      }
>> +    }
>> +
>>     for (unsigned i = 0; i < 3; ++i) {
>>       int Idx = VOP3Idx[i];
>>       if (Idx == -1)
>> diff --git a/test/CodeGen/R600/fneg.f64.ll b/test/CodeGen/R600/fneg.f64.ll
>> index 61d9513..f0b341b 100644
>> --- a/test/CodeGen/R600/fneg.f64.ll
>> +++ b/test/CodeGen/R600/fneg.f64.ll
>> @@ -50,7 +50,7 @@ define void @fneg_free_f64(double addrspace(1)* %out, i64 %in) {
>> ; SI-LABEL: @fneg_fold
>> ; SI: S_LOAD_DWORDX2 [[NEG_VALUE:s\[[0-9]+:[0-9]+\]]], {{s\[[0-9]+:[0-9]+\]}}, 0xb
>> ; SI-NOT: XOR
>> -; SI: V_MUL_F64 {{v\[[0-9]+:[0-9]+\]}}, -[[NEG_VALUE]], {{v\[[0-9]+:[0-9]+\]}}
>> +; SI: V_MUL_F64 {{v\[[0-9]+:[0-9]+\]}}, -[[NEG_VALUE]], [[NEG_VALUE]]
>> define void @fneg_fold_f64(double addrspace(1)* %out, double %in) {
>>   %fsub = fsub double -0.0, %in
>>   %fmul = fmul double %fsub, %in
>> diff --git a/test/CodeGen/R600/fneg.ll b/test/CodeGen/R600/fneg.ll
>> index 72cd15c..8631301 100644
>> --- a/test/CodeGen/R600/fneg.ll
>> +++ b/test/CodeGen/R600/fneg.ll
>> @@ -59,7 +59,7 @@ define void @fneg_free_f32(float addrspace(1)* %out, i32 %in) {
>> ; FUNC-LABEL: @fneg_fold
>> ; SI: S_LOAD_DWORD [[NEG_VALUE:s[0-9]+]], s[{{[0-9]+:[0-9]+}}], 0xb
>> ; SI-NOT: XOR
>> -; SI: V_MUL_F32_e64 v{{[0-9]+}}, -[[NEG_VALUE]], v{{[0-9]+}}
>> +; SI: V_MUL_F32_e64 v{{[0-9]+}}, -[[NEG_VALUE]], [[NEG_VALUE]]
>> define void @fneg_fold_f32(float addrspace(1)* %out, float %in) {
>>   %fsub = fsub float -0.0, %in
>>   %fmul = fmul float %fsub, %in
>> diff --git a/test/CodeGen/R600/rotl.ll b/test/CodeGen/R600/rotl.ll
>> index 8c86fb5..a9dee8c 100644
>> --- a/test/CodeGen/R600/rotl.ll
>> +++ b/test/CodeGen/R600/rotl.ll
>> @@ -8,7 +8,7 @@
>> 
>> ; SI: S_SUB_I32 [[SDST:s[0-9]+]], 32, {{[s][0-9]+}}
>> ; SI: V_MOV_B32_e32 [[VDST:v[0-9]+]], [[SDST]]
>> -; SI: V_ALIGNBIT_B32 {{v[0-9]+, [s][0-9]+, v[0-9]+}}, [[VDST]]
>> +; SI: V_ALIGNBIT_B32 {{v[0-9]+, [s][0-9]+, s[0-9]+}}, [[VDST]]
>> define void @rotl_i32(i32 addrspace(1)* %in, i32 %x, i32 %y) {
>> entry:
>>   %0 = shl i32 %x, %y
>> diff --git a/test/CodeGen/R600/seto.ll b/test/CodeGen/R600/seto.ll
>> index cc942c1..eb1176f 100644
>> --- a/test/CodeGen/R600/seto.ll
>> +++ b/test/CodeGen/R600/seto.ll
>> @@ -1,8 +1,8 @@
>> -;RUN: llc < %s -march=r600 -mcpu=verde -verify-machineinstrs | FileCheck %s
>> -
>> -;CHECK-LABEL: @main
>> -;CHECK: V_CMP_O_F32_e32 vcc, {{[sv][0-9]+, v[0-9]+}}
>> +; RUN: llc -march=r600 -mcpu=verde -verify-machineinstrs < %s | FileCheck %s
>> 
>> +; CHECK-LABEL: @main
>> +; CHECK: V_CMP_O_F32_e64 [[CMP:s\[[0-9]+:[0-9]+\]]], [[SREG:s[0-9]+]], [[SREG]]
>> +; CHECK-NEXT: V_CNDMASK_B32_e64 {{v[0-9]+}}, 0, 1.0, [[CMP]]
>> define void @main(float %p) {
>> main_body:
>>   %c = fcmp oeq float %p, %p
>> diff --git a/test/CodeGen/R600/setuo.ll b/test/CodeGen/R600/setuo.ll
>> index 33007fc..a78e8e6 100644
>> --- a/test/CodeGen/R600/setuo.ll
>> +++ b/test/CodeGen/R600/setuo.ll
>> @@ -1,8 +1,8 @@
>> -;RUN: llc < %s -march=r600 -mcpu=verde -verify-machineinstrs | FileCheck %s
>> -
>> -;CHECK-LABEL: @main
>> -;CHECK: V_CMP_U_F32_e32 vcc, {{[sv][0-9]+, v[0-9]+}}
>> +; RUN: llc -march=r600 -mcpu=verde -verify-machineinstrs < %s | FileCheck %s
>> 
>> +; CHECK-LABEL: @main
>> +; CHECK: V_CMP_U_F32_e64 [[CMP:s\[[0-9]+:[0-9]+\]]], [[SREG:s[0-9]+]], [[SREG]]
>> +; CHECK-NEXT: V_CNDMASK_B32_e64 {{v[0-9]+}}, 0, 1.0, [[CMP]]
>> define void @main(float %p) {
>> main_body:
>>   %c = fcmp une float %p, %p
>> -- 
>> 1.8.4.3
>> 
> 
>> From 31fc32bd9656c37101895b36b537b61dd79d3793 Mon Sep 17 00:00:00 2001
>> From: Matt Arsenault <Matthew.Arsenault at amd.com>
>> Date: Mon, 22 Sep 2014 23:50:40 -0400
>> Subject: [PATCH 06/10] R600/SI Allow same SGPR to be used for multiple
>> operands
>> 
>> Instead of moving the first SGPR that is different than the first,
>> legalize the operand that requires the fewest moves if one
>> SGPR is used for multiple operands.
>> 
>> This saves extra moves and is also required for some instructions
>> which require that the same operand be used for multiple operands.
>> ---
>> lib/Target/R600/SIInstrInfo.cpp              | 38 +++++++++--
>> test/CodeGen/R600/use-sgpr-multiple-times.ll | 96 ++++++++++++++++++++++++++++
>> 2 files changed, 129 insertions(+), 5 deletions(-)
>> create mode 100644 test/CodeGen/R600/use-sgpr-multiple-times.ll
>> 
>> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
>> index 76a9413..e31947a 100644
>> --- a/lib/Target/R600/SIInstrInfo.cpp
>> +++ b/lib/Target/R600/SIInstrInfo.cpp
>> @@ -1391,10 +1391,12 @@ void SIInstrInfo::legalizeOperands(MachineInstr *MI) const {
>>   // Legalize VOP3
>>   if (isVOP3(MI->getOpcode())) {
>>     const MCInstrDesc &Desc = get(MI->getOpcode());
>> -    unsigned SGPRReg = AMDGPU::NoRegister;
>> 
>>     int VOP3Idx[3] = { Src0Idx, Src1Idx, Src2Idx };
>> 
>> +    // Find the one SGPR operand we are allowed to use.
>> +    unsigned SGPRReg = AMDGPU::NoRegister;
>> +
>>     for (const MachineOperand &MO : MI->implicit_operands()) {
>>       // We only care about reads.
>>       if (MO.isDef())
>> @@ -1411,8 +1413,9 @@ void SIInstrInfo::legalizeOperands(MachineInstr *MI) const {
>>       }
>>     }
>> 
>> -
>>     if (SGPRReg == AMDGPU::NoRegister) {
>> +      unsigned UsedSGPRs[3] = { AMDGPU::NoRegister };
>> +
>>       // First we need to consider the instruction's operand requirements before
>>       // legalizing. Some operands are required to be SGPRs, but we are still
>>       // bound by the constant bus requirement to only use one.
>> @@ -1423,9 +1426,33 @@ void SIInstrInfo::legalizeOperands(MachineInstr *MI) const {
>>         if (Idx == -1)
>>           break;
>> 
>> -        if (RI.isSGPRClassID(Desc.OpInfo[Idx].RegClass)) {
>> -          SGPRReg = MI->getOperand(Idx).getReg();
>> -          break;
>> +        const MachineOperand &MO = MI->getOperand(Idx);
>> +        if (RI.isSGPRClassID(Desc.OpInfo[Idx].RegClass))
>> +          SGPRReg = MO.getReg();
>> +
>> +        if (MO.isReg() && RI.isSGPRClass(MRI.getRegClass(MO.getReg())))
>> +          UsedSGPRs[i] = MO.getReg();
>> +      }
>> +
>> +      if (SGPRReg == AMDGPU::NoRegister) {
>> +        // We don't have a required SGPR operand, so we have a bit more freedom in
>> +        // selecting operands to move.
>> +
>> +        // Try to select the most used SGPR. If an SGPR is equal to one of the
>> +        // others, we choose that.
>> +        //
>> +        // e.g.
>> +        // V_FMA_F32 v0, s0, s0, s0 -> No moves
>> +        // V_FMA_F32 v0, s0, s1, s0 -> Move s1
>> +
>> +        if (UsedSGPRs[0] != AMDGPU::NoRegister) {
>> +          if (UsedSGPRs[0] == UsedSGPRs[1] || UsedSGPRs[0] == UsedSGPRs[2])
>> +            SGPRReg = UsedSGPRs[0];
>> +        }
>> +
>> +        if (SGPRReg == AMDGPU::NoRegister && UsedSGPRs[1] != AMDGPU::NoRegister) {
>> +          if (UsedSGPRs[1] == UsedSGPRs[2])
>> +            SGPRReg = UsedSGPRs[1];
>>         }
>>       }
>>     }
>> @@ -1452,6 +1479,7 @@ void SIInstrInfo::legalizeOperands(MachineInstr *MI) const {
>>         // an inline constant which is always legal.
>>         continue;
>>       }
>> +
> 
> Extra whitespace.
> 
>>       // If we make it this far, then the operand is not legal and we must
>>       // legalize it.
>>       legalizeOpWithMove(MI, Idx);
>> diff --git a/test/CodeGen/R600/use-sgpr-multiple-times.ll b/test/CodeGen/R600/use-sgpr-multiple-times.ll
>> new file mode 100644
>> index 0000000..5c00718
>> --- /dev/null
>> +++ b/test/CodeGen/R600/use-sgpr-multiple-times.ll
>> @@ -0,0 +1,96 @@
>> +; RUN: llc -march=r600 -mcpu=SI -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s
>> +
>> +declare float @llvm.fma.f32(float, float, float) #1
>> +declare float @llvm.fmuladd.f32(float, float, float) #1
>> +declare i32 @llvm.AMDGPU.imad24(i32, i32, i32) #1
>> +
>> +
>> +; SI-LABEL: @test_sgpr_use_twice_binop:
>> +; SI: S_LOAD_DWORD [[SGPR:s[0-9]+]],
>> +; SI: V_ADD_F32_e64 [[RESULT:v[0-9]+]], [[SGPR]], [[SGPR]]
>> +; SI: BUFFER_STORE_DWORD [[RESULT]]
>> +define void @test_sgpr_use_twice_binop(float addrspace(1)* %out, float %a) #0 {
>> +  %dbl = fadd float %a, %a
>> +  store float %dbl, float addrspace(1)* %out, align 4
>> +  ret void
>> +}
>> +
>> +; SI-LABEL: @test_sgpr_use_three_ternary_op:
>> +; SI: S_LOAD_DWORD [[SGPR:s[0-9]+]],
>> +; SI: V_FMA_F32 [[RESULT:v[0-9]+]], [[SGPR]], [[SGPR]], [[SGPR]]
>> +; SI: BUFFER_STORE_DWORD [[RESULT]]
>> +define void @test_sgpr_use_three_ternary_op(float addrspace(1)* %out, float %a) #0 {
>> +  %fma = call float @llvm.fma.f32(float %a, float %a, float %a) #1
>> +  store float %fma, float addrspace(1)* %out, align 4
>> +  ret void
>> +}
>> +
>> +; SI-LABEL: @test_sgpr_use_twice_ternary_op_a_a_b:
>> +; SI: S_LOAD_DWORD [[SGPR0:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
>> +; SI: S_LOAD_DWORD [[SGPR1:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xc
>> +; SI: V_MOV_B32_e32 [[VGPR1:v[0-9]+]], [[SGPR1]]
>> +; SI: V_FMA_F32 [[RESULT:v[0-9]+]], [[SGPR0]], [[SGPR0]], [[VGPR1]]
>> +; SI: BUFFER_STORE_DWORD [[RESULT]]
>> +define void @test_sgpr_use_twice_ternary_op_a_a_b(float addrspace(1)* %out, float %a, float %b) #0 {
>> +  %fma = call float @llvm.fma.f32(float %a, float %a, float %b) #1
>> +  store float %fma, float addrspace(1)* %out, align 4
>> +  ret void
>> +}
>> +
>> +; SI-LABEL: @test_sgpr_use_twice_ternary_op_a_b_a:
>> +; SI: S_LOAD_DWORD [[SGPR0:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
>> +; SI: S_LOAD_DWORD [[SGPR1:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xc
>> +; SI: V_MOV_B32_e32 [[VGPR1:v[0-9]+]], [[SGPR1]]
>> +; SI: V_FMA_F32 [[RESULT:v[0-9]+]], [[SGPR0]], [[VGPR1]], [[SGPR0]]
>> +; SI: BUFFER_STORE_DWORD [[RESULT]]
>> +define void @test_sgpr_use_twice_ternary_op_a_b_a(float addrspace(1)* %out, float %a, float %b) #0 {
>> +  %fma = call float @llvm.fma.f32(float %a, float %b, float %a) #1
>> +  store float %fma, float addrspace(1)* %out, align 4
>> +  ret void
>> +}
>> +
>> +; SI-LABEL: @test_sgpr_use_twice_ternary_op_b_a_a:
>> +; SI: S_LOAD_DWORD [[SGPR0:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
>> +; SI: S_LOAD_DWORD [[SGPR1:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xc
>> +; SI: V_MOV_B32_e32 [[VGPR1:v[0-9]+]], [[SGPR1]]
>> +; SI: V_FMA_F32 [[RESULT:v[0-9]+]], [[VGPR1]], [[SGPR0]], [[SGPR0]]
>> +; SI: BUFFER_STORE_DWORD [[RESULT]]
>> +define void @test_sgpr_use_twice_ternary_op_b_a_a(float addrspace(1)* %out, float %a, float %b) #0 {
>> +  %fma = call float @llvm.fma.f32(float %b, float %a, float %a) #1
>> +  store float %fma, float addrspace(1)* %out, align 4
>> +  ret void
>> +}
>> +
>> +; SI-LABEL: @test_sgpr_use_twice_ternary_op_a_a_imm:
>> +; SI: S_LOAD_DWORD [[SGPR:s[0-9]+]]
>> +; SI: V_FMA_F32 [[RESULT:v[0-9]+]], [[SGPR]], [[SGPR]], 2.0
>> +; SI: BUFFER_STORE_DWORD [[RESULT]]
>> +define void @test_sgpr_use_twice_ternary_op_a_a_imm(float addrspace(1)* %out, float %a) #0 {
>> +  %fma = call float @llvm.fma.f32(float %a, float %a, float 2.0) #1
>> +  store float %fma, float addrspace(1)* %out, align 4
>> +  ret void
>> +}
>> +
>> +; SI-LABEL: @test_sgpr_use_twice_ternary_op_a_imm_a:
>> +; SI: S_LOAD_DWORD [[SGPR:s[0-9]+]]
>> +; SI: V_FMA_F32 [[RESULT:v[0-9]+]], [[SGPR]], 2.0, [[SGPR]]
>> +; SI: BUFFER_STORE_DWORD [[RESULT]]
>> +define void @test_sgpr_use_twice_ternary_op_a_imm_a(float addrspace(1)* %out, float %a) #0 {
>> +  %fma = call float @llvm.fma.f32(float %a, float 2.0, float %a) #1
>> +  store float %fma, float addrspace(1)* %out, align 4
>> +  ret void
>> +}
>> +
>> +; Don't use fma since fma c, x, y is canonicalized to fma x, c, y
>> +; SI-LABEL: @test_sgpr_use_twice_ternary_op_imm_a_a:
>> +; SI: S_LOAD_DWORD [[SGPR:s[0-9]+]]
>> +; SI: V_MAD_I32_I24 [[RESULT:v[0-9]+]], 2, [[SGPR]], [[SGPR]]
>> +; SI: BUFFER_STORE_DWORD [[RESULT]]
>> +define void @test_sgpr_use_twice_ternary_op_imm_a_a(i32 addrspace(1)* %out, i32 %a) #0 {
>> +  %fma = call i32 @llvm.AMDGPU.imad24(i32 2, i32 %a, i32 %a) #1
>> +  store i32 %fma, i32 addrspace(1)* %out, align 4
>> +  ret void
>> +}
>> +
>> +attributes #0 = { nounwind }
>> +attributes #1 = { nounwind readnone }
>> -- 
>> 1.8.4.3
>> 
> 
>> From f87915da2e01e4d44f0b8d6f5a4b3a1e1f5812e4 Mon Sep 17 00:00:00 2001
>> From: Matt Arsenault <Matthew.Arsenault at amd.com>
>> Date: Tue, 23 Sep 2014 18:22:58 -0700
>> Subject: [PATCH 07/10] R600/SI: Move finding SGPR operand to move to separate
>> function
>> 
>> ---
>> lib/Target/R600/SIInstrInfo.cpp | 132 +++++++++++++++++++++-------------------
>> lib/Target/R600/SIInstrInfo.h   |   2 +
>> 2 files changed, 71 insertions(+), 63 deletions(-)
>> 
> 
> LGTM.
> 
>> From 0f0180485cfd44ec5b459010053ba9c0f1858ac1 Mon Sep 17 00:00:00 2001
>> From: Matt Arsenault <Matthew.Arsenault at amd.com>
>> Date: Tue, 23 Sep 2014 00:23:30 -0400
>> Subject: [PATCH 08/10] R600/SI: Add a note about the order of the operands to
>> div_scale
>> 
>> ---
>> lib/Target/R600/AMDGPUISelLowering.cpp | 6 ++++++
>> 1 file changed, 6 insertions(+)
>> 
>> 
> 
> LGTM.
> 
>> From 8898b9e1f8c8e87285e7f2f5b5d58ef8325b6b69 Mon Sep 17 00:00:00 2001
>> From: Matt Arsenault <Matthew.Arsenault at amd.com>
>> Date: Tue, 23 Sep 2014 00:30:35 -0400
>> Subject: [PATCH 09/10] R600/SI: Add strict check lines to div_scale tests.
>> 
>> This has weird operand requirements so it's worthwhile
>> to have very strict checks for its operands.
>> 
>> Add different combinations of SGPR operands.
>> ---
>> test/CodeGen/R600/llvm.AMDGPU.div_scale.ll | 271 +++++++++++++++++++++++++++--
>> 1 file changed, 255 insertions(+), 16 deletions(-)
>> 
> 
> LGTM.
> 
>> From 37f4ce315b7a6a513c7c9e22eb20aa4e671c4980 Mon Sep 17 00:00:00 2001
>> From: Matt Arsenault <Matthew.Arsenault at amd.com>
>> Date: Tue, 23 Sep 2014 17:39:25 -0700
>> Subject: [PATCH 10/10] R600/SI: Use break instead of continue
>> 
>> If an instruction doesn't have src1, it doesn't have src2
>> ---
>> lib/Target/R600/SIInstrInfo.cpp | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
> 
> LGTM.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-R600-SI-Don-t-assert-on-exotic-operand-types.patch
Type: application/octet-stream
Size: 1154 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/bc45b201/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-R600-SI-Don-t-move-operands-that-are-required-to-be-.patch
Type: application/octet-stream
Size: 3815 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/bc45b201/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-R600-SI-Implement-findCommutedOpIndices.patch
Type: application/octet-stream
Size: 2945 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/bc45b201/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-R600-SI-Partially-move-operand-legalization-to-post-.patch
Type: application/octet-stream
Size: 11241 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/bc45b201/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0005-R600-SI-Allow-same-SGPR-to-be-used-for-multiple-oper.patch
Type: application/octet-stream
Size: 7672 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/bc45b201/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0006-R600-SI-Move-finding-SGPR-operand-to-move-to-separat.patch
Type: application/octet-stream
Size: 5872 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/bc45b201/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0007-R600-SI-Add-a-note-about-the-order-of-the-operands-t.patch
Type: application/octet-stream
Size: 1229 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/bc45b201/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0008-R600-SI-Add-strict-check-lines-to-div_scale-tests.patch
Type: application/octet-stream
Size: 16508 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/bc45b201/attachment-0007.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0009-R600-SI-Use-break-instead-of-continue.patch
Type: application/octet-stream
Size: 854 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/bc45b201/attachment-0008.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0010-R600-SI-Fix-emitting-trailing-whitespace-after-s_wai.patch
Type: application/octet-stream
Size: 2480 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140925/bc45b201/attachment-0009.obj>