PATCHES: R600/SI: Remove SelectionDAG operand folding

Matt Arsenault Matthew.Arsenault at amd.com
Tue Jan 6 11:16:53 PST 2015


On 01/06/2015 01:56 PM, Tom Stellard wrote:
> On Fri, Dec 19, 2014 at 11:14:57AM -0500, Matt Arsenault wrote:
>>
>> On 12/18/2014 08:02 PM, Tom Stellard wrote:
>>> Hi,
>>>
>>> This series of patches removes the legacy operand folding that was done on
>>> the SelectionDAG.  The SIFoldOperands MachineInstr pass now provides the
>>> same functionality.
>>>
>>> -Tom
>>>
>>>
>>> 0006-R600-SI-Add-a-V_MOV_B64-pseudo-instruction.patch
>>>
>>>
>>> From 8ff850ac46ab082a06882e743946297c2791d6e3 Mon Sep 17 00:00:00 2001
>>> From: Tom Stellard<thomas.stellard at amd.com>
>>> Date: Wed, 17 Dec 2014 16:35:19 -0500
>>> Subject: [PATCH 6/7] R600/SI: Add a V_MOV_B64 pseudo instruction
>>>
>>> This is used to simplify the SIFoldOperands pass and make it easier to
>>> fold immediates.
>>> ---
>>>  lib/Target/R600/SIFoldOperands.cpp         |  3 +++
>>>  lib/Target/R600/SIInstrInfo.cpp            | 29 +++++++++++++++++++++++++++
>>>  lib/Target/R600/SIInstructions.td          |  6 ++++++
>>>  test/CodeGen/R600/atomic_cmp_swap_local.ll | 10 ++++------
>>>  test/CodeGen/R600/imm.ll                   |  7 ++-----
>>>  test/CodeGen/R600/local-atomics64.ll       | 32 ++++++++++++------------------
>>>  6 files changed, 57 insertions(+), 30 deletions(-)
>>
>> I thought you were avoiding this with the previous patch? Does the pass
>> handle constants specially and this is for SGPRs only?
>>
> This patch is required to handle this case:
>
> %vreg0 = S_MOV_B64 1  SReg_64:%vreg0
> %vreg1 = COPY %vreg0  VReg_64:%vreg1, SReg_64:%vreg0
> %vreg2 = V_ADD_I32 %vreg1:sub0, ...
> %vreg3 = V_ADDC_I32 %vreg1:sub1, ...
>
> We can't fold the immediate through the copy, so we need to replace the
> copy with V_MOV_B64.
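
If I'm reading the patch right, the COPY then becomes the new pseudo with the
immediate folded into it, roughly:

  %vreg1 = V_MOV_B64_PSEUDO 1  VReg_64:%vreg1
  %vreg2 = V_ADD_I32 %vreg1:sub0, ...
  %vreg3 = V_ADDC_I32 %vreg1:sub1, ...

and from there the lo/hi halves of the constant can be folded into the
add/addc.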
>
> Here are updated patches.
>
> -Tom
>
> 0001-R600-SI-Refactor-SIFoldOperands-to-simplify-immediat.patch
>
>
>  From 1adbfd810ac491e15b5defea79cc08841a5cd1e6 Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Wed, 17 Dec 2014 11:41:55 -0500
> Subject: [PATCH 1/8] R600/SI: Refactor SIFoldOperands to simplify immediate
>   folding
>
> This will make a future patch much less intrusive.
> ---
>   lib/Target/R600/SIFoldOperands.cpp | 79 ++++++++++++++++++++++++++------------
>   1 file changed, 54 insertions(+), 25 deletions(-)
>

LGTM

>
>
> 0002-R600-SI-Teach-SIFoldOperands-to-split-64-bit-constan.patch
>
>
>  From 413f5b31b115c2b08b690df00ed4de2d068e9670 Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Thu, 11 Dec 2014 18:49:19 -0500
> Subject: [PATCH 2/8] R600/SI: Teach SIFoldOperands to split 64-bit constants
>   when folding
>
> This allows folding of sequences like:
>
> s[0:1] = s_mov_b64 4
> v_add_i32 v0, s0, v0
> v_addc_u32 v1, s1, v1
>
> into
>
> v_add_i32 v0, 4, v0
> v_addc_u32 v1, 0, v1
> ---
>   lib/Target/R600/SIFoldOperands.cpp   | 62 +++++++++++++++++++++---------------
>   lib/Target/R600/SIInstrInfo.cpp      | 14 ++++++++
>   lib/Target/R600/SIInstrInfo.h        |  4 +++
>   test/CodeGen/R600/operand-folding.ll | 17 ++++++++++
>   test/CodeGen/R600/sint_to_fp.f64.ll  |  8 ++---
>   test/CodeGen/R600/uint_to_fp.f64.ll  |  8 ++---
>   6 files changed, 80 insertions(+), 33 deletions(-)

LGTM

> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
> index a58a46e..33ec312 100644
> --- a/lib/Target/R600/SIInstrInfo.cpp
> +++ b/lib/Target/R600/SIInstrInfo.cpp
> @@ -418,6 +418,20 @@ unsigned SIInstrInfo::commuteOpcode(unsigned Opcode) const {
>     return Opcode;
>   }
>   
> +unsigned SIInstrInfo::getMovOpcode(const TargetRegisterClass *DstRC) const {
> +
> +  if (DstRC->getSize() == 4) {
> +    if (RI.isSGPRClass(DstRC))
> +      return AMDGPU::S_MOV_B32;
> +    else
> +      return AMDGPU::V_MOV_B32_e32;
No return after else. This should be a ternary operator
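
Something like this, I think (sketch of just the 32-bit case):

  if (DstRC->getSize() == 4)
    return RI.isSGPRClass(DstRC) ? AMDGPU::S_MOV_B32 : AMDGPU::V_MOV_B32_e32;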

> +  } else if (DstRC->getSize() == 8 && RI.isSGPRClass(DstRC)) {
> +    return AMDGPU::S_MOV_B64;
> +  } else {
> +    return AMDGPU::COPY;
> +  }
> +}
> +
>   static bool shouldTryToSpillVGPRs(MachineFunction *MF) {
>   
>     SIMachineFunctionInfo *MFI = MF->getInfo<SIMachineFunctionInfo>();
> diff --git a/lib/Target/R600/SIInstrInfo.h b/lib/Target/R600/SIInstrInfo.h
> index 6d63816..f766dc8 100644
> --- a/lib/Target/R600/SIInstrInfo.h
> +++ b/lib/Target/R600/SIInstrInfo.h
> @@ -110,6 +110,10 @@ public:
>   
>     bool expandPostRAPseudo(MachineBasicBlock::iterator MI) const override;
>   
> +  // \brief Returns an opcode that can be used to move a value to a \p DstRC
> +  // register.  If there is no hardware instruction that can store to \p
> +  // DstRC, then AMDGPU::COPY is returned.
> +  unsigned getMovOpcode(const TargetRegisterClass *DstRC) const;
>     unsigned commuteOpcode(unsigned Opcode) const;
>   
>     MachineInstr *commuteInstruction(MachineInstr *MI,
> diff --git a/test/CodeGen/R600/operand-folding.ll b/test/CodeGen/R600/operand-folding.ll
> index 05177b4..ebe7f1a 100644
> --- a/test/CodeGen/R600/operand-folding.ll
> +++ b/test/CodeGen/R600/operand-folding.ll
> @@ -36,5 +36,22 @@ endif:
>     ret void
>   }
>   
> +; CHECK-LABEL: {{^}}fold_64bit_constant_add:
> +; CHECK-NOT: s_mov_b64
> +; FIXME: It would be better if we could use v_add here and drop the extra
> +; v_mov_b32 instructions.
> +; CHECK-DAG: s_add_u32 [[LO:s[0-9]+]], s{{[0-9]+}}, 1
> +; CHECK-DAG: s_addc_u32 [[HI:s[0-9]+]], s{{[0-9]+}}, 0
> +; CHECK-DAG: v_mov_b32_e32 v[[VLO:[0-9]+]], [[LO]]
> +; CHECK-DAG: v_mov_b32_e32 v[[VHI:[0-9]+]], [[HI]]
> +; CHECK: buffer_store_dwordx2 v{{\[}}[[VLO]]:[[VHI]]{{\]}},
> +
> +define void @fold_64bit_constant_add(i64 addrspace(1)* %out, i32 %cmp, i64 %val) {
> +entry:
> +  %tmp0 = add i64 %val, 1
> +  store i64 %tmp0, i64 addrspace(1)* %out
> +  ret void
> +}
> +
>   declare i32 @llvm.r600.read.tidig.x() #0
>   attributes #0 = { readnone }
> diff --git a/test/CodeGen/R600/sint_to_fp.f64.ll b/test/CodeGen/R600/sint_to_fp.f64.ll
> index 6e4f87c..efbdf25 100644
> --- a/test/CodeGen/R600/sint_to_fp.f64.ll
> +++ b/test/CodeGen/R600/sint_to_fp.f64.ll
> @@ -12,10 +12,10 @@ define void @sint_to_fp_i32_to_f64(double addrspace(1)* %out, i32 %in) {
>   
>   ; SI-LABEL: {{^}}sint_to_fp_i1_f64:
>   ; SI: v_cmp_eq_i32_e64 [[CMP:s\[[0-9]+:[0-9]\]]],
> -; FIXME: We should the VGPR sources for V_CNDMASK are copied from SGPRs,
> -; we should be able to fold the SGPRs into the V_CNDMASK instructions.
> -; SI: v_cndmask_b32_e64 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, [[CMP]]
> -; SI: v_cndmask_b32_e64 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, [[CMP]]
> +; We can't fold the SGPRs into v_cndmask_b32_e64, because it already
> +; uses an SGPR for [[CMP]]
> +; SI: v_cndmask_b32_e64 v{{[0-9]+}}, 0, v{{[0-9]+}}, [[CMP]]
> +; SI: v_cndmask_b32_e64 v{{[0-9]+}}, 0, v{{[0-9]+}}, [[CMP]]
>   ; SI: buffer_store_dwordx2
>   ; SI: s_endpgm
>   define void @sint_to_fp_i1_f64(double addrspace(1)* %out, i32 %in) {
> diff --git a/test/CodeGen/R600/uint_to_fp.f64.ll b/test/CodeGen/R600/uint_to_fp.f64.ll
> index d16872b..fa70bdf 100644
> --- a/test/CodeGen/R600/uint_to_fp.f64.ll
> +++ b/test/CodeGen/R600/uint_to_fp.f64.ll
> @@ -72,10 +72,10 @@ define void @s_uint_to_fp_v4i32_to_v4f64(<4 x double> addrspace(1)* %out, <4 x i
>   
>   ; SI-LABEL: {{^}}uint_to_fp_i1_to_f64:
>   ; SI: v_cmp_eq_i32_e64 [[CMP:s\[[0-9]+:[0-9]\]]],
> -; FIXME: We should the VGPR sources for V_CNDMASK are copied from SGPRs,
> -; we should be able to fold the SGPRs into the V_CNDMASK instructions.
> -; SI: v_cndmask_b32_e64 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, [[CMP]]
> -; SI: v_cndmask_b32_e64 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}, [[CMP]]
> +; We can't fold the SGPRs into v_cndmask_b32_e64, because it already
> +; uses an SGPR for [[CMP]]
> +; SI: v_cndmask_b32_e64 v{{[0-9]+}}, 0, v{{[0-9]+}}, [[CMP]]
> +; SI: v_cndmask_b32_e64 v{{[0-9]+}}, 0, v{{[0-9]+}}, [[CMP]]
>   ; SI: buffer_store_dwordx2
>   ; SI: s_endpgm
>   define void @uint_to_fp_i1_to_f64(double addrspace(1)* %out, i32 %in) {
> -- 1.8.5.5
>
> 0003-R600-SI-Add-a-V_MOV_B64-pseudo-instruction.patch
>
>
>  From 99516afa3817cab2299d77fbc2c230bcf15b95a0 Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Wed, 17 Dec 2014 16:35:19 -0500
> Subject: [PATCH 3/8] R600/SI: Add a V_MOV_B64 pseudo instruction
>
> This is used to simplify the SIFoldOperands pass and make it easier to
> fold immediates.
> ---
>   lib/Target/R600/SIFoldOperands.cpp         |  1 +
>   lib/Target/R600/SIInstrInfo.cpp            | 31 +++++++++++++++++++++++++++++
>   lib/Target/R600/SIInstructions.td          |  6 ++++++
>   test/CodeGen/R600/atomic_cmp_swap_local.ll | 10 ++++------
>   test/CodeGen/R600/imm.ll                   |  7 ++-----
>   test/CodeGen/R600/local-atomics64.ll       | 32 ++++++++++++------------------
>   6 files changed, 57 insertions(+), 30 deletions(-)
>
> diff --git a/lib/Target/R600/SIFoldOperands.cpp b/lib/Target/R600/SIFoldOperands.cpp
> index 23d4a4d..ddb2852 100644
> --- a/lib/Target/R600/SIFoldOperands.cpp
> +++ b/lib/Target/R600/SIFoldOperands.cpp
> @@ -86,6 +86,7 @@ static bool isSafeToFold(unsigned Opcode) {
>     switch(Opcode) {
>     case AMDGPU::V_MOV_B32_e32:
>     case AMDGPU::V_MOV_B32_e64:
> +  case AMDGPU::V_MOV_B64_PSEUDO:
>     case AMDGPU::S_MOV_B32:
>     case AMDGPU::S_MOV_B64:
>     case AMDGPU::COPY:
> diff --git a/lib/Target/R600/SIInstrInfo.cpp b/lib/Target/R600/SIInstrInfo.cpp
> index 33ec312..3a34903 100644
> --- a/lib/Target/R600/SIInstrInfo.cpp
> +++ b/lib/Target/R600/SIInstrInfo.cpp
> @@ -427,6 +427,8 @@ unsigned SIInstrInfo::getMovOpcode(const TargetRegisterClass *DstRC) const {
>         return AMDGPU::V_MOV_B32_e32;
>     } else if (DstRC->getSize() == 8 && RI.isSGPRClass(DstRC)) {
>       return AMDGPU::S_MOV_B64;
> +  } else if (DstRC->getSize() == 8 && !RI.isSGPRClass(DstRC)) {
> +    return  AMDGPU::V_MOV_B64_PSEUDO;
>     } else {
>       return AMDGPU::COPY;
>     }
> @@ -676,6 +678,35 @@ bool SIInstrInfo::expandPostRAPseudo(MachineBasicBlock::iterator MI) const {
>       // This is just a placeholder for register allocation.
>       MI->eraseFromParent();
>       break;
> +
> +  case AMDGPU::V_MOV_B64_PSEUDO: {
> +    unsigned Dst = MI->getOperand(0).getReg();
> +    unsigned DstLo = RI.getSubReg(Dst, AMDGPU::sub0);
> +    unsigned DstHi = RI.getSubReg(Dst, AMDGPU::sub1);
> +
> +    const MachineOperand &SrcOp = MI->getOperand(1);
> +    // FIXME: Will this work for 64-bit floating point immediates?
> +    assert(!SrcOp.isFPImm());
> +    if (SrcOp.isImm()) {
> +      APInt Imm(64, SrcOp.getImm());
> +      BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), DstLo)
> +              .addImm(Imm.getLoBits(32).getZExtValue())
> +              .addReg(Dst, RegState::Implicit);
> +      BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), DstHi)
> +              .addImm(Imm.getHiBits(32).getZExtValue())
> +              .addReg(Dst, RegState::Implicit);
> +    } else {
> +      assert(SrcOp.isReg());
> +      BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), DstLo)
> +              .addReg(RI.getSubReg(SrcOp.getReg(), AMDGPU::sub0))
> +              .addReg(Dst, RegState::Implicit);
> +      BuildMI(MBB, MI, DL, get(AMDGPU::V_MOV_B32_e32), DstHi)
> +              .addReg(RI.getSubReg(SrcOp.getReg(), AMDGPU::sub1))
> +              .addReg(Dst, RegState::Implicit);
> +    }
> +    MI->eraseFromParent();
> +    break;
> +  }
>     }
>     return true;
>   }
> diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
> index 463287e..fede819 100644
> --- a/lib/Target/R600/SIInstructions.td
> +++ b/lib/Target/R600/SIInstructions.td
> @@ -1742,6 +1742,12 @@ defm V_TRIG_PREOP_F64 : VOP3Inst <
>   //===----------------------------------------------------------------------===//
>   let isCodeGenOnly = 1, isPseudo = 1 in {
>   
> +let hasSideEffects = 0, mayLoad = 0, mayStore = 0 in {
> +// 64-bit vector move instruction.  This is mainly used by the SIFoldOperands
> +// pass to enable folding of inline immediates.
> +def V_MOV_B64_PSEUDO : InstSI <(outs VReg_64:$dst), (ins VSrc_64:$src0), "", []>;
> +} // end let hasSideEffects = 0, mayLoad = 0, mayStore = 0
> +
>   let hasSideEffects = 1 in {
>   def SGPR_USE : InstSI <(outs),(ins), "", []>;
>   }
> diff --git a/test/CodeGen/R600/atomic_cmp_swap_local.ll b/test/CodeGen/R600/atomic_cmp_swap_local.ll
> index 223f4d3..a77627c 100644
> --- a/test/CodeGen/R600/atomic_cmp_swap_local.ll
> +++ b/test/CodeGen/R600/atomic_cmp_swap_local.ll
> @@ -20,9 +20,8 @@ define void @lds_atomic_cmpxchg_ret_i32_offset(i32 addrspace(1)* %out, i32 addrs
>   ; FUNC-LABEL: {{^}}lds_atomic_cmpxchg_ret_i64_offset:
>   ; SI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
>   ; SI: s_load_dwordx2 s{{\[}}[[LOSWAP:[0-9]+]]:[[HISWAP:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xd
> -; SI: s_mov_b64  s{{\[}}[[LOSCMP:[0-9]+]]:[[HISCMP:[0-9]+]]{{\]}}, 7
> -; SI-DAG: v_mov_b32_e32 v[[LOVCMP:[0-9]+]], s[[LOSCMP]]
> -; SI-DAG: v_mov_b32_e32 v[[HIVCMP:[0-9]+]], s[[HISCMP]]
> +; SI-DAG: v_mov_b32_e32 v[[LOVCMP:[0-9]+]], 7
> +; SI-DAG: v_mov_b32_e32 v[[HIVCMP:[0-9]+]], 0
>   ; SI-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]
>   ; SI-DAG: v_mov_b32_e32 v[[LOSWAPV:[0-9]+]], s[[LOSWAP]]
>   ; SI-DAG: v_mov_b32_e32 v[[HISWAPV:[0-9]+]], s[[HISWAP]]
> @@ -69,9 +68,8 @@ define void @lds_atomic_cmpxchg_noret_i32_offset(i32 addrspace(3)* %ptr, i32 %sw
>   ; FUNC-LABEL: {{^}}lds_atomic_cmpxchg_noret_i64_offset:
>   ; SI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x9
>   ; SI: s_load_dwordx2 s{{\[}}[[LOSWAP:[0-9]+]]:[[HISWAP:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xb
> -; SI: s_mov_b64  s{{\[}}[[LOSCMP:[0-9]+]]:[[HISCMP:[0-9]+]]{{\]}}, 7
> -; SI-DAG: v_mov_b32_e32 v[[LOVCMP:[0-9]+]], s[[LOSCMP]]
> -; SI-DAG: v_mov_b32_e32 v[[HIVCMP:[0-9]+]], s[[HISCMP]]
> +; SI-DAG: v_mov_b32_e32 v[[LOVCMP:[0-9]+]], 7
> +; SI-DAG: v_mov_b32_e32 v[[HIVCMP:[0-9]+]], 0
>   ; SI-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]
>   ; SI-DAG: v_mov_b32_e32 v[[LOSWAPV:[0-9]+]], s[[LOSWAP]]
>   ; SI-DAG: v_mov_b32_e32 v[[HISWAPV:[0-9]+]], s[[HISWAP]]
> diff --git a/test/CodeGen/R600/imm.ll b/test/CodeGen/R600/imm.ll
> index 79f36b6..1cc03b8 100644
> --- a/test/CodeGen/R600/imm.ll
> +++ b/test/CodeGen/R600/imm.ll
> @@ -474,12 +474,9 @@ define void @add_inline_imm_64_f64(double addrspace(1)* %out, double %x) {
>   }
>   
>   
> -; FIXME: These shoudn't bother materializing in SGPRs
> -
>   ; CHECK-LABEL: {{^}}store_inline_imm_0.0_f64
> -; CHECK: s_mov_b64 s{{\[}}[[LO_SREG:[0-9]+]]:[[HI_SREG:[0-9]+]]{{\]}}, 0{{$}}
> -; CHECK-DAG: v_mov_b32_e32 v[[LO_VREG:[0-9]+]], s[[LO_SREG]]
> -; CHECK-DAG: v_mov_b32_e32 v[[HI_VREG:[0-9]+]], s[[HI_SREG]]
> +; CHECK: v_mov_b32_e32 v[[LO_VREG:[0-9]+]], 0
> +; CHECK: v_mov_b32_e32 v[[HI_VREG:[0-9]+]], 0
>   ; CHECK: buffer_store_dwordx2 v{{\[}}[[LO_VREG]]:[[HI_VREG]]{{\]}}
>   define void @store_inline_imm_0.0_f64(double addrspace(1)* %out) {
>     store double 0.0, double addrspace(1)* %out
> diff --git a/test/CodeGen/R600/local-atomics64.ll b/test/CodeGen/R600/local-atomics64.ll
> index ce0cf59..b39581e 100644
> --- a/test/CodeGen/R600/local-atomics64.ll
> +++ b/test/CodeGen/R600/local-atomics64.ll
> @@ -30,9 +30,8 @@ define void @lds_atomic_add_ret_i64(i64 addrspace(1)* %out, i64 addrspace(3)* %p
>   
>   ; FUNC-LABEL: {{^}}lds_atomic_add_ret_i64_offset:
>   ; SI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0xb
> -; SI: s_mov_b64 s{{\[}}[[LOSDATA:[0-9]+]]:[[HISDATA:[0-9]+]]{{\]}}, 9
> -; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], s[[LOSDATA]]
> -; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], s[[HISDATA]]
> +; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], 9
> +; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], 0
>   ; SI-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]
>   ; SI: ds_add_rtn_u64 [[RESULT:v\[[0-9]+:[0-9]+\]]], [[VPTR]], v{{\[}}[[LOVDATA]]:[[HIVDATA]]{{\]}} offset:32 [M0]
>   ; SI: buffer_store_dwordx2 [[RESULT]],
> @@ -45,9 +44,8 @@ define void @lds_atomic_add_ret_i64_offset(i64 addrspace(1)* %out, i64 addrspace
>   }
>   
>   ; FUNC-LABEL: {{^}}lds_atomic_inc_ret_i64:
> -; SI: s_mov_b64 s{{\[}}[[LOSDATA:[0-9]+]]:[[HISDATA:[0-9]+]]{{\]}}, -1
> -; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], s[[LOSDATA]]
> -; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], s[[HISDATA]]
> +; SI: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], -1
> +; SI: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], -1
>   ; SI: ds_inc_rtn_u64 [[RESULT:v\[[0-9]+:[0-9]+\]]], [[VPTR]], v{{\[}}[[LOVDATA]]:[[HIVDATA]]{{\]}}
>   ; SI: buffer_store_dwordx2 [[RESULT]],
>   ; SI: s_endpgm
> @@ -87,9 +85,8 @@ define void @lds_atomic_sub_ret_i64_offset(i64 addrspace(1)* %out, i64 addrspace
>   }
>   
>   ; FUNC-LABEL: {{^}}lds_atomic_dec_ret_i64:
> -; SI: s_mov_b64 s{{\[}}[[LOSDATA:[0-9]+]]:[[HISDATA:[0-9]+]]{{\]}}, -1
> -; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], s[[LOSDATA]]
> -; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], s[[HISDATA]]
> +; SI: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], -1
> +; SI: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], -1
>   ; SI: ds_dec_rtn_u64 [[RESULT:v\[[0-9]+:[0-9]+\]]], [[VPTR]], v{{\[}}[[LOVDATA]]:[[HIVDATA]]{{\]}}
>   ; SI: buffer_store_dwordx2 [[RESULT]],
>   ; SI: s_endpgm
> @@ -277,10 +274,9 @@ define void @lds_atomic_add_noret_i64(i64 addrspace(3)* %ptr) nounwind {
>   
>   ; FUNC-LABEL: {{^}}lds_atomic_add_noret_i64_offset:
>   ; SI: s_load_dword [[PTR:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, 0x9
> -; SI: s_mov_b64 s{{\[}}[[LOSDATA:[0-9]+]]:[[HISDATA:[0-9]+]]{{\]}}, 9
> -; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], s[[LOSDATA]]
> -; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], s[[HISDATA]]
> -; SI-DAG: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]
> +; SI: v_mov_b32_e32 [[VPTR:v[0-9]+]], [[PTR]]
> +; SI: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], 9
> +; SI: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], 0
>   ; SI: ds_add_u64 [[VPTR]], v{{\[}}[[LOVDATA]]:[[HIVDATA]]{{\]}} offset:32 [M0]
>   ; SI: s_endpgm
>   define void @lds_atomic_add_noret_i64_offset(i64 addrspace(3)* %ptr) nounwind {
> @@ -290,9 +286,8 @@ define void @lds_atomic_add_noret_i64_offset(i64 addrspace(3)* %ptr) nounwind {
>   }
>   
>   ; FUNC-LABEL: {{^}}lds_atomic_inc_noret_i64:
> -; SI: s_mov_b64 s{{\[}}[[LOSDATA:[0-9]+]]:[[HISDATA:[0-9]+]]{{\]}}, -1
> -; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], s[[LOSDATA]]
> -; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], s[[HISDATA]]
> +; SI: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], -1
> +; SI: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], -1
>   ; SI: ds_inc_u64 [[VPTR]], v{{\[}}[[LOVDATA]]:[[HIVDATA]]{{\]}}
>   ; SI: s_endpgm
>   define void @lds_atomic_inc_noret_i64(i64 addrspace(3)* %ptr) nounwind {
> @@ -327,9 +322,8 @@ define void @lds_atomic_sub_noret_i64_offset(i64 addrspace(3)* %ptr) nounwind {
>   }
>   
>   ; FUNC-LABEL: {{^}}lds_atomic_dec_noret_i64:
> -; SI: s_mov_b64 s{{\[}}[[LOSDATA:[0-9]+]]:[[HISDATA:[0-9]+]]{{\]}}, -1
> -; SI-DAG: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], s[[LOSDATA]]
> -; SI-DAG: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], s[[HISDATA]]
> +; SI: v_mov_b32_e32 v[[LOVDATA:[0-9]+]], -1
> +; SI: v_mov_b32_e32 v[[HIVDATA:[0-9]+]], -1
>   ; SI: ds_dec_u64 [[VPTR]], v{{\[}}[[LOVDATA]]:[[HIVDATA]]{{\]}}
>   ; SI: s_endpgm
>   define void @lds_atomic_dec_noret_i64(i64 addrspace(3)* %ptr) nounwind {
> -- 1.8.5.5
>
> 0004-R600-SI-Remove-VReg_32-register-class.patch
>
>
>  From 754af776b8bfaa365fdd28d733205e61bfd11e2a Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Mon, 5 Jan 2015 12:06:23 -0500
> Subject: [PATCH 4/8] R600/SI: Remove VReg_32 register class
>
> Use VGPR_32 register class instead.  These two register classes were
> identical and having separate classes was causing
> SIInstrInfo::isLegalOperands() to be overly conservative in some cases.
>
> This change is necessary to prevent future patches from missing a folding
> opportunity in fneg-fabs.ll.
LGTM. These have been confusing me.

> 0005-R600-SI-Only-fold-immediates-that-have-one-use.patch
>
>  From c9d83c6bfc01b9ffe5b575bf36bac5e5cbbf0ee1 Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Mon, 5 Jan 2015 15:15:42 -0500
> Subject: [PATCH 5/8] R600/SI: Only fold immediates that have one use
>
> Folding the same immediate into multiple instructions will increase
> program size, which can hurt performance.
> ---
>   lib/Target/R600/SIFoldOperands.cpp   |  9 ++++++++-
>   test/CodeGen/R600/operand-folding.ll | 35 +++++++++++++++++++++++++++++++++++
>   2 files changed, 43 insertions(+), 1 deletion(-)
>
> diff --git a/lib/Target/R600/SIFoldOperands.cpp b/lib/Target/R600/SIFoldOperands.cpp
> index ddb2852..545905b 100644
> --- a/lib/Target/R600/SIFoldOperands.cpp
> +++ b/lib/Target/R600/SIFoldOperands.cpp
> @@ -138,6 +138,14 @@ bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {
>           continue;
>   
>         MachineOperand &OpToFold = MI.getOperand(1);
> +      bool FoldingImm = OpToFold.isImm() || OpToFold.isFPImm();
> +
> +      // Folding immediates with more than one use will increase program size.
> +      // FIXME: This will also reduce register usage, which may be better
> +      // in some cases.  A better heuristic is needed.
> +      if (FoldingImm && !TII->isInlineConstant(OpToFold) &&
> +          !MRI.hasOneUse(MI.getOperand(0).getReg()))
> +        continue;
>   
>         // FIXME: Fold operands with subregs.
>         if (OpToFold.isReg() &&
> @@ -158,7 +166,6 @@ bool SIFoldOperands::runOnMachineFunction(MachineFunction &MF) {
>             continue;
>           }
>   
> -        bool FoldingImm = OpToFold.isImm() || OpToFold.isFPImm();
>           APInt Imm;
>   
>           if (FoldingImm) {
> diff --git a/test/CodeGen/R600/operand-folding.ll b/test/CodeGen/R600/operand-folding.ll
> index ebe7f1a..fb92045 100644
> --- a/test/CodeGen/R600/operand-folding.ll
> +++ b/test/CodeGen/R600/operand-folding.ll
> @@ -53,5 +53,40 @@ entry:
>     ret void
>   }
>   
> +; Inline constants should always be folded.
> +
> +; CHECK-LABEL: {{^}}vector_inline:
> +; CHECK: v_xor_b32_e32 v{{[0-9]+}}, 5, v{{[0-9]+}}
> +; CHECK: v_xor_b32_e32 v{{[0-9]+}}, 5, v{{[0-9]+}}
> +; CHECK: v_xor_b32_e32 v{{[0-9]+}}, 5, v{{[0-9]+}}
> +; CHECK: v_xor_b32_e32 v{{[0-9]+}}, 5, v{{[0-9]+}}
> +
> +define void @vector_inline(<4 x i32> addrspace(1)* %out) {
> +entry:
> +  %tmp0 = call i32 @llvm.r600.read.tidig.x()
> +  %tmp1 = add i32 %tmp0, 1
> +  %tmp2 = add i32 %tmp0, 2
> +  %tmp3 = add i32 %tmp0, 3
> +  %vec0 = insertelement <4 x i32> undef, i32 %tmp0, i32 0
> +  %vec1 = insertelement <4 x i32> %vec0, i32 %tmp1, i32 1
> +  %vec2 = insertelement <4 x i32> %vec1, i32 %tmp2, i32 2
> +  %vec3 = insertelement <4 x i32> %vec2, i32 %tmp3, i32 3
> +  %tmp4 = xor <4 x i32> <i32 5, i32 5, i32 5, i32 5>, %vec3
> +  store <4 x i32> %tmp4, <4 x i32> addrspace(1)* %out
> +  ret void
> +}
> +
> +; Immediates with one use should be folded
> +; CHECK-LABEL: {{^}}imm_one_use:
> +; CHECK: v_xor_b32_e32 v{{[0-9]+}}, 0x64, v{{[0-9]+}}
> +
> +define void @imm_one_use(i32 addrspace(1)* %out) {
> +entry:
> +  %tmp0 = call i32 @llvm.r600.read.tidig.x()
> +  %tmp1 = xor i32 %tmp0, 100
> +  store i32 %tmp1, i32 addrspace(1)* %out
> +  ret void
> +}
> +


This needs a case with more than one use of a non-inline immediate
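
Something roughly like this, maybe (untested sketch, name made up; the point
is to check that the 0x64 literal is materialized once and not duplicated
into both xors):

; CHECK-LABEL: {{^}}imm_multiple_uses:
; CHECK: 0x64
; CHECK-NOT: 0x64
define void @imm_multiple_uses(i32 addrspace(1)* %out0, i32 addrspace(1)* %out1) {
entry:
  %tmp0 = call i32 @llvm.r600.read.tidig.x()
  %tmp1 = mul i32 %tmp0, 3
  %xor0 = xor i32 %tmp0, 100
  %xor1 = xor i32 %tmp1, 100
  store i32 %xor0, i32 addrspace(1)* %out0
  store i32 %xor1, i32 addrspace(1)* %out1
  ret void
}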

>   declare i32 @llvm.r600.read.tidig.x() #0
>   attributes #0 = { readnone }
> -- 1.8.5.5
>
> 0006-R600-SI-Commute-instructions-to-enable-more-folding-.patch
>
>
>  From b1a22fd107bb3d808b12493ebf1c363ae85ded4e Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Wed, 17 Dec 2014 14:03:36 -0500
> Subject: [PATCH 6/8] R600/SI: Commute instructions to enable more folding
>   opportunities
>
> ---
>   lib/Target/R600/SIFoldOperands.cpp           | 65 ++++++++++++++++++++--------
>   lib/Target/R600/SIInstrInfo.cpp              |  5 ++-
>   test/CodeGen/R600/mulhu.ll                   |  2 +-
>   test/CodeGen/R600/sdiv.ll                    |  2 +-
>   test/CodeGen/R600/use-sgpr-multiple-times.ll |  4 +-
>   5 files changed, 55 insertions(+), 23 deletions(-)
>
LGTM

>
> 0007-R600-SI-Remove-SIISelLowering-legalizeOperands.patch
>
>
>  From 923b828ea1a81d6595438a840fb7c94cffefc5c4 Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Thu, 11 Dec 2014 19:05:21 -0500
> Subject: [PATCH 7/8] R600/SI: Remove SIISelLowering::legalizeOperands()
>
> Its functionality has been replaced by calling
> SIInstrInfo::legalizeOperands() from
> SIISelLowering::AdjustInstrPostInstrSelection() and running the
> SIFoldOperands and SIShrinkInstructions passes.
> ---
>   lib/Target/R600/SIISelLowering.cpp   | 174 +----------------------------------
>   lib/Target/R600/SIISelLowering.h     |   3 -
>   test/CodeGen/R600/fneg.ll            |   2 +-
>   test/CodeGen/R600/imm.ll             |   4 +-
>   test/CodeGen/R600/operand-folding.ll |  21 +++++
>   test/CodeGen/R600/seto.ll            |   2 +-
>   test/CodeGen/R600/setuo.ll           |   2 +-
>   test/CodeGen/R600/sint_to_fp.f64.ll  |   4 +-
>   test/CodeGen/R600/sint_to_fp.ll      |   2 +-
>   test/CodeGen/R600/uint_to_fp.f64.ll  |   4 +-
>   test/CodeGen/R600/uint_to_fp.ll      |   2 +-
>   test/CodeGen/R600/xor.ll             |   2 +-
>   12 files changed, 34 insertions(+), 188 deletions(-)
>

LGTM

>
> 0008-R600-SI-Add-pattern-for-bitcasting-fp-immediates-to-.patch
>
>
>  From 8d9fe7fe2860455344791bab039d273cf087b62e Mon Sep 17 00:00:00 2001
> From: Tom Stellard<thomas.stellard at amd.com>
> Date: Tue, 6 Jan 2015 10:46:00 -0500
> Subject: [PATCH 8/8] R600/SI: Add pattern for bitcasting fp immediates to
>   integers
>
> The backend now assumes that all immediates are integers.  This allows
> us to simplify immediate handling code, because we no longer need to
> handle fp and integer immediates differently.
> ---
>   lib/Target/R600/AMDGPUMCInstLower.cpp    | 12 ------------
>   lib/Target/R600/SIFoldOperands.cpp       |  9 ++-------
>   lib/Target/R600/SIISelLowering.cpp       | 10 +++++-----
>   lib/Target/R600/SIInstrInfo.cpp          | 32 ++++++++++++++------------------
>   lib/Target/R600/SIInstrInfo.td           | 12 ++++++++++++
>   lib/Target/R600/SIInstructions.td        |  6 +++---
>   lib/Target/R600/SILowerControlFlow.cpp   |  5 ++---
>   lib/Target/R600/SIShrinkInstructions.cpp |  9 +--------
>   8 files changed, 39 insertions(+), 56 deletions(-)
>
LGTM

> diff --git a/lib/Target/R600/SIInstrInfo.td b/lib/Target/R600/SIInstrInfo.td
> index fc60dd6..030abc7 100644
> --- a/lib/Target/R600/SIInstrInfo.td
> +++ b/lib/Target/R600/SIInstrInfo.td
> @@ -159,6 +159,18 @@ def as_i64imm: SDNodeXForm<imm, [{
>     return CurDAG->getTargetConstant(N->getSExtValue(), MVT::i64);
>   }]>;
>   
> +// Copied from the AArch64 backend:
> +def bitcast_fpimm_to_i32 : SDNodeXForm<fpimm, [{
> +return CurDAG->getTargetConstant(
> +  N->getValueAPF().bitcastToAPInt().getZExtValue(), MVT::i32);
> +}]>;
> +
> +// Copied from the AArch64 backend:
> +def bitcast_fpimm_to_i64 : SDNodeXForm<fpimm, [{
> +return CurDAG->getTargetConstant(
> +  N->getValueAPF().bitcastToAPInt().getZExtValue(), MVT::i64);
> +}]>;
> +
These should probably go in TargetSelectionDAG.td
