[llvm] r375446 - [AMDGPU] Select AGPR in PHI operand legalization
Stanislav Mekhanoshin via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 21 12:25:27 PDT 2019
Author: rampitec
Date: Mon Oct 21 12:25:27 2019
New Revision: 375446
URL: http://llvm.org/viewvc/llvm-project?rev=375446&view=rev
Log:
[AMDGPU] Select AGPR in PHI operand legalization
If a PHI defines an AGPR, legalize its operands to AGPR.
At the moment we can get an AGPR PHI with VGPR operands.
I am not aware of any problems as it seems to be handled
gracefully in RA, but this is not right anyway.
It also slightly decreases VGPR pressure in some cases
because we do not have to copy via a VGPR.
Differential Revision: https://reviews.llvm.org/D69206
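
The rule described in the log can be sketched outside of LLVM as a small, self-contained model. This is only an illustration under stated assumptions: `Bank`, `RegClass`, and `legalizedPhiOperandClass` are hypothetical names that do not exist in LLVM, and the model ignores everything in `SIInstrInfo::legalizeOperands` except the choice of register bank. It shows that an operand of a vector PHI keeps its width but takes its bank (AGPR or VGPR) from the PHI result, whether the operand started out as a scalar or as a vector register.

```cpp
// Hypothetical, simplified register banks; not LLVM's real types.
enum class Bank { SGPR, VGPR, AGPR };

// Hypothetical stand-in for a register class: a bank plus a width.
struct RegClass {
  Bank bank;
  unsigned bits; // register width in bits, e.g. 32 or 1024
};

// Returns the class an operand of a vector PHI would be legalized to
// in this model: the operand keeps its width, and the bank follows
// the PHI result (AGPR if the PHI defines an AGPR, VGPR otherwise).
RegClass legalizedPhiOperandClass(RegClass phiDef, RegClass operand) {
  Bank target = (phiDef.bank == Bank::AGPR) ? Bank::AGPR : Bank::VGPR;
  return RegClass{target, operand.bits};
}
```

In this model, a VGPR operand of an AGPR PHI maps to an AGPR class of the same width, which corresponds to the case the new `else` branch in the patch covers.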
Modified:
llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp
llvm/trunk/test/CodeGen/AMDGPU/mfma-loop.ll
Modified: llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp?rev=375446&r1=375445&r2=375446&view=diff
==============================================================================
--- llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp (original)
+++ llvm/trunk/lib/Target/AMDGPU/SIInstrInfo.cpp Mon Oct 21 12:25:27 2019
@@ -4576,6 +4576,10 @@ void SIInstrInfo::legalizeOperands(Machi
VRC = RI.hasAGPRs(getOpRegClass(MI, 0))
? RI.getEquivalentAGPRClass(SRC)
: RI.getEquivalentVGPRClass(SRC);
+ } else {
+ VRC = RI.hasAGPRs(getOpRegClass(MI, 0))
+ ? RI.getEquivalentAGPRClass(VRC)
+ : RI.getEquivalentVGPRClass(VRC);
}
RC = VRC;
} else {
Modified: llvm/trunk/test/CodeGen/AMDGPU/mfma-loop.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AMDGPU/mfma-loop.ll?rev=375446&r1=375445&r2=375446&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/AMDGPU/mfma-loop.ll (original)
+++ llvm/trunk/test/CodeGen/AMDGPU/mfma-loop.ll Mon Oct 21 12:25:27 2019
@@ -1,13 +1,64 @@
; RUN: llc -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
; GCN-LABEL: {{^}}test_mfma_loop_zeroinit:
-; GCN-COUNT32: v_accvgpr_write_b32
+
+; Check that we do not use 32 temp vgprs, but rotate 3 vgprs only.
+; 3 vgprs are needed to avoid wait states between writes.
+
+; FIXME: We should not be using and temporary registers at all.
+; At the moment we initialize an sgpr, then copy it via vgprs.
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2:v[0-9]+]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3:v[0-9]+]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1:v[0-9]+]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP1]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP2]]
+; GCN: v_accvgpr_write_b32 a{{[0-9]+}}, [[TMP3]]
+
+; Check that we do not copy agprs to vgprs and back inside the loop.
+
; GCN: [[LOOP:BB[0-9_]+]]:
; GCN-NOT: v_accvgpr
; GCN: v_mfma_f32_32x32x1f32
; GCN-NOT: v_accvgpr
; GCN: s_cbranch_scc1 [[LOOP]]
+
+; Final result should be read only once after the loop.
+
; GCN-COUNT32: v_accvgpr_read_b32
+
define amdgpu_kernel void @test_mfma_loop_zeroinit(<32 x float> addrspace(1)* %arg) {
entry:
br label %for.cond.preheader
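
The property that the updated checks in `mfma-loop.ll` look for (a few temporary VGPRs reused across the `v_accvgpr_write_b32` sequence instead of one per write) can also be stated as a small standalone helper. This is a sketch, not part of the commit or of FileCheck: `countDistinctWriteSources` is a hypothetical name, and it assumes each relevant assembly line has the form `v_accvgpr_write_b32 aN, vM` with the source register as the last operand.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: counts the distinct source registers used by
// v_accvgpr_write_b32 lines. Assumes the source is the text after the
// last comma, e.g. "v_accvgpr_write_b32 a0, v1" yields "v1".
std::size_t countDistinctWriteSources(const std::vector<std::string> &asmLines) {
  const std::string mnemonic = "v_accvgpr_write_b32";
  std::set<std::string> sources;
  for (const std::string &line : asmLines) {
    if (line.find(mnemonic) == std::string::npos)
      continue; // not an AGPR write
    std::size_t comma = line.rfind(',');
    if (comma == std::string::npos)
      continue; // malformed line, no operand separator
    std::size_t start = line.find_first_not_of(" \t", comma + 1);
    if (start == std::string::npos)
      continue; // nothing after the comma
    std::size_t end = line.find_first_of(" \t", start);
    std::size_t len = (end == std::string::npos) ? std::string::npos : end - start;
    sources.insert(line.substr(start, len));
  }
  return sources.size();
}
```

Applied to the initialization sequence this test expects, such a helper would report 3 distinct sources rather than 32.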