[llvm] r326570 - [ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WB
Florian Hahn via llvm-commits
llvm-commits at lists.llvm.org
Fri Mar 2 05:02:55 PST 2018
Author: fhahn
Date: Fri Mar 2 05:02:55 2018
New Revision: 326570
URL: http://llvm.org/viewvc/llvm-project?rev=326570&view=rev
Log:
[ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WB
Code generation of VLD3, VLD4, VST3 and VST4 with register writeback is
broken due to 2 separate bugs:
1) VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register are missing
rules to expand them to non pseudo MIR. These are selected for
ARMISD::VLD3_UPD/VLD4_UPD with v1i64 vectors in SelectVLD.
2) Selection of the right VLD/VST instruction is broken for load and
store of 3 and 4 v1i64 vectors. SelectVLD and SelectVST are called
with MIR opcode for fixed writeback (ie increment is access size)
and call getVLDSTRegisterUpdateOpcode() to select an opcode with
register writeback if base register update is of a different size.
Since getVLDSTRegisterUpdateOpcode() only knows about
VLD1/VLD2/VST1/VST2 the call is currently conditional on the number
of element in the vector.
However, VLD1/VST1 is selected by SelectVLD/SelectVST's caller for
load and stores of 3 or 4 v1i64 vectors. Therefore the opcode is not
updated which later lead to a fixed writeback instruction being
constructed with an extra operand for the register writeback.
This patch addresses the two issues as follows:
- it adds the necessary mapping from VLD1d64TPseudoWB_register and
VLD1d64QPseudoWB_register to VLD1d64Twb_register and
VLD1d64Qwb_register respectively. Like for the existing _fixed
variants, the cost of these is bumped for unaligned access.
- it changes the logic in SelectVLD and SelectVSD to call isVLDfixed
and isVSTfixed respectively to decide whether the opcode should be
updated. It also reworks the logic and comments for pushing the
writeback offset operand and r0 operand to clarify the logic:
writeback offset needs to be pushed if it's a register writeback,
r0 needs to be pushed if not and the instruction is a
VLD1/VLD2/VST1/VST2.
Reviewers: rengolin, t.p.northover, samparker
Reviewed By: samparker
Patch by Thomas Preud'homme <thomas.preudhomme at arm.com>
Differential Revision: https://reviews.llvm.org/D42970
Modified:
llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp
llvm/trunk/lib/Target/ARM/ARMExpandPseudoInsts.cpp
llvm/trunk/lib/Target/ARM/ARMISelDAGToDAG.cpp
llvm/trunk/test/CodeGen/ARM/vld3.ll
llvm/trunk/test/CodeGen/ARM/vld4.ll
llvm/trunk/test/CodeGen/ARM/vst3.ll
llvm/trunk/test/CodeGen/ARM/vst4.ll
Modified: llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp?rev=326570&r1=326569&r2=326570&view=diff
==============================================================================
--- llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp (original)
+++ llvm/trunk/lib/Target/ARM/ARMBaseInstrInfo.cpp Fri Mar 2 05:02:55 2018
@@ -4214,6 +4214,7 @@ ARMBaseInstrInfo::getOperandLatency(cons
case ARM::VLD3d32Pseudo:
case ARM::VLD1d64TPseudo:
case ARM::VLD1d64TPseudoWB_fixed:
+ case ARM::VLD1d64TPseudoWB_register:
case ARM::VLD3d8Pseudo_UPD:
case ARM::VLD3d16Pseudo_UPD:
case ARM::VLD3d32Pseudo_UPD:
@@ -4231,6 +4232,7 @@ ARMBaseInstrInfo::getOperandLatency(cons
case ARM::VLD4d32Pseudo:
case ARM::VLD1d64QPseudo:
case ARM::VLD1d64QPseudoWB_fixed:
+ case ARM::VLD1d64QPseudoWB_register:
case ARM::VLD4d8Pseudo_UPD:
case ARM::VLD4d16Pseudo_UPD:
case ARM::VLD4d32Pseudo_UPD:
Modified: llvm/trunk/lib/Target/ARM/ARMExpandPseudoInsts.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMExpandPseudoInsts.cpp?rev=326570&r1=326569&r2=326570&view=diff
==============================================================================
--- llvm/trunk/lib/Target/ARM/ARMExpandPseudoInsts.cpp (original)
+++ llvm/trunk/lib/Target/ARM/ARMExpandPseudoInsts.cpp Fri Mar 2 05:02:55 2018
@@ -156,8 +156,10 @@ static const NEONLdStTableEntry NEONLdSt
{ ARM::VLD1d64QPseudo, ARM::VLD1d64Q, true, false, false, SingleSpc, 4, 1 ,false},
{ ARM::VLD1d64QPseudoWB_fixed, ARM::VLD1d64Qwb_fixed, true, true, false, SingleSpc, 4, 1 ,false},
+{ ARM::VLD1d64QPseudoWB_register, ARM::VLD1d64Qwb_register, true, true, true, SingleSpc, 4, 1 ,false},
{ ARM::VLD1d64TPseudo, ARM::VLD1d64T, true, false, false, SingleSpc, 3, 1 ,false},
{ ARM::VLD1d64TPseudoWB_fixed, ARM::VLD1d64Twb_fixed, true, true, false, SingleSpc, 3, 1 ,false},
+{ ARM::VLD1d64TPseudoWB_register, ARM::VLD1d64Twb_register, true, true, true, SingleSpc, 3, 1 ,false},
{ ARM::VLD2LNd16Pseudo, ARM::VLD2LNd16, true, false, false, SingleSpc, 2, 4 ,true},
{ ARM::VLD2LNd16Pseudo_UPD, ARM::VLD2LNd16_UPD, true, true, true, SingleSpc, 2, 4 ,true},
@@ -1503,6 +1505,7 @@ bool ARMExpandPseudo::ExpandMI(MachineBa
case ARM::VLD3d32Pseudo:
case ARM::VLD1d64TPseudo:
case ARM::VLD1d64TPseudoWB_fixed:
+ case ARM::VLD1d64TPseudoWB_register:
case ARM::VLD3d8Pseudo_UPD:
case ARM::VLD3d16Pseudo_UPD:
case ARM::VLD3d32Pseudo_UPD:
@@ -1520,6 +1523,7 @@ bool ARMExpandPseudo::ExpandMI(MachineBa
case ARM::VLD4d32Pseudo:
case ARM::VLD1d64QPseudo:
case ARM::VLD1d64QPseudoWB_fixed:
+ case ARM::VLD1d64QPseudoWB_register:
case ARM::VLD4d8Pseudo_UPD:
case ARM::VLD4d16Pseudo_UPD:
case ARM::VLD4d32Pseudo_UPD:
Modified: llvm/trunk/lib/Target/ARM/ARMISelDAGToDAG.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMISelDAGToDAG.cpp?rev=326570&r1=326569&r2=326570&view=diff
==============================================================================
--- llvm/trunk/lib/Target/ARM/ARMISelDAGToDAG.cpp (original)
+++ llvm/trunk/lib/Target/ARM/ARMISelDAGToDAG.cpp Fri Mar 2 05:02:55 2018
@@ -1794,15 +1794,17 @@ void ARMDAGToDAGISel::SelectVLD(SDNode *
Ops.push_back(Align);
if (isUpdating) {
SDValue Inc = N->getOperand(AddrOpIdx + 1);
- // FIXME: VLD1/VLD2 fixed increment doesn't need Reg0. Remove the reg0
- // case entirely when the rest are updated to that form, too.
bool IsImmUpdate = isPerfectIncrement(Inc, VT, NumVecs);
- if ((NumVecs <= 2) && !IsImmUpdate)
- Opc = getVLDSTRegisterUpdateOpcode(Opc);
- // FIXME: We use a VLD1 for v1i64 even if the pseudo says vld2/3/4, so
- // check for that explicitly too. Horribly hacky, but temporary.
- if ((NumVecs > 2 && !isVLDfixed(Opc)) || !IsImmUpdate)
- Ops.push_back(IsImmUpdate ? Reg0 : Inc);
+ if (!IsImmUpdate) {
+ // We use a VLD1 for v1i64 even if the pseudo says vld2/3/4, so
+ // check for the opcode rather than the number of vector elements.
+ if (isVLDfixed(Opc))
+ Opc = getVLDSTRegisterUpdateOpcode(Opc);
+ Ops.push_back(Inc);
+ // VLD1/VLD2 fixed increment does not need Reg0 so only include it in
+ // the operands if not such an opcode.
+ } else if (!isVLDfixed(Opc))
+ Ops.push_back(Reg0);
}
Ops.push_back(Pred);
Ops.push_back(Reg0);
@@ -1948,16 +1950,17 @@ void ARMDAGToDAGISel::SelectVST(SDNode *
Ops.push_back(Align);
if (isUpdating) {
SDValue Inc = N->getOperand(AddrOpIdx + 1);
- // FIXME: VST1/VST2 fixed increment doesn't need Reg0. Remove the reg0
- // case entirely when the rest are updated to that form, too.
bool IsImmUpdate = isPerfectIncrement(Inc, VT, NumVecs);
- if (NumVecs <= 2 && !IsImmUpdate)
- Opc = getVLDSTRegisterUpdateOpcode(Opc);
- // FIXME: We use a VST1 for v1i64 even if the pseudo says vld2/3/4, so
- // check for that explicitly too. Horribly hacky, but temporary.
- if (!IsImmUpdate)
+ if (!IsImmUpdate) {
+ // We use a VST1 for v1i64 even if the pseudo says VST2/3/4, so
+ // check for the opcode rather than the number of vector elements.
+ if (isVSTfixed(Opc))
+ Opc = getVLDSTRegisterUpdateOpcode(Opc);
Ops.push_back(Inc);
- else if (NumVecs > 2 && !isVSTfixed(Opc))
+ }
+ // VST1/VST2 fixed increment does not need Reg0 so only include it in
+ // the operands if not such an opcode.
+ else if (!isVSTfixed(Opc))
Ops.push_back(Reg0);
}
Ops.push_back(SrcReg);
Modified: llvm/trunk/test/CodeGen/ARM/vld3.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/vld3.ll?rev=326570&r1=326569&r2=326570&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/ARM/vld3.ll (original)
+++ llvm/trunk/test/CodeGen/ARM/vld3.ll Fri Mar 2 05:02:55 2018
@@ -96,6 +96,19 @@ define <1 x i64> @vld3i64_update(i64** %
ret <1 x i64> %tmp4
}
+define <1 x i64> @vld3i64_reg_update(i64** %ptr, i64* %A) nounwind {
+;CHECK-LABEL: vld3i64_reg_update:
+;CHECK: vld1.64 {d16, d17, d18}, [{{r[0-9]+|lr}}:64], {{r[0-9]+|lr}}
+ %tmp0 = bitcast i64* %A to i8*
+ %tmp1 = call %struct.__neon_int64x1x3_t @llvm.arm.neon.vld3.v1i64.p0i8(i8* %tmp0, i32 16)
+ %tmp5 = getelementptr i64, i64* %A, i32 1
+ store i64* %tmp5, i64** %ptr
+ %tmp2 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 0
+ %tmp3 = extractvalue %struct.__neon_int64x1x3_t %tmp1, 2
+ %tmp4 = add <1 x i64> %tmp2, %tmp3
+ ret <1 x i64> %tmp4
+}
+
define <16 x i8> @vld3Qi8(i8* %A) nounwind {
;CHECK-LABEL: vld3Qi8:
;Check the alignment value. Max for this instruction is 64 bits:
Modified: llvm/trunk/test/CodeGen/ARM/vld4.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/vld4.ll?rev=326570&r1=326569&r2=326570&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/ARM/vld4.ll (original)
+++ llvm/trunk/test/CodeGen/ARM/vld4.ll Fri Mar 2 05:02:55 2018
@@ -96,6 +96,19 @@ define <1 x i64> @vld4i64_update(i64** %
ret <1 x i64> %tmp4
}
+define <1 x i64> @vld4i64_reg_update(i64** %ptr, i64* %A) nounwind {
+;CHECK-LABEL: vld4i64_reg_update:
+;CHECK: vld1.64 {d16, d17, d18, d19}, [{{r[0-9]+|lr}}:256], {{r[0-9]+|lr}}
+ %tmp0 = bitcast i64* %A to i8*
+ %tmp1 = call %struct.__neon_int64x1x4_t @llvm.arm.neon.vld4.v1i64.p0i8(i8* %tmp0, i32 64)
+ %tmp5 = getelementptr i64, i64* %A, i32 1
+ store i64* %tmp5, i64** %ptr
+ %tmp2 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 0
+ %tmp3 = extractvalue %struct.__neon_int64x1x4_t %tmp1, 2
+ %tmp4 = add <1 x i64> %tmp2, %tmp3
+ ret <1 x i64> %tmp4
+}
+
define <16 x i8> @vld4Qi8(i8* %A) nounwind {
;CHECK-LABEL: vld4Qi8:
;Check the alignment value. Max for this instruction is 256 bits:
Modified: llvm/trunk/test/CodeGen/ARM/vst3.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/vst3.ll?rev=326570&r1=326569&r2=326570&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/ARM/vst3.ll (original)
+++ llvm/trunk/test/CodeGen/ARM/vst3.ll Fri Mar 2 05:02:55 2018
@@ -73,6 +73,18 @@ define void @vst3i64_update(i64** %ptr,
ret void
}
+define void @vst3i64_reg_update(i64** %ptr, <1 x i64>* %B) nounwind {
+;CHECK-LABEL: vst3i64_reg_update
+;CHECK: vst1.64 {d{{.*}}, d{{.*}}, d{{.*}}}, [r{{.*}}], r{{.*}}
+ %A = load i64*, i64** %ptr
+ %tmp0 = bitcast i64* %A to i8*
+ %tmp1 = load <1 x i64>, <1 x i64>* %B
+ call void @llvm.arm.neon.vst3.p0i8.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 1)
+ %tmp2 = getelementptr i64, i64* %A, i32 1
+ store i64* %tmp2, i64** %ptr
+ ret void
+}
+
define void @vst3Qi8(i8* %A, <16 x i8>* %B) nounwind {
;CHECK-LABEL: vst3Qi8:
;Check the alignment value. Max for this instruction is 64 bits:
Modified: llvm/trunk/test/CodeGen/ARM/vst4.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/vst4.ll?rev=326570&r1=326569&r2=326570&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/ARM/vst4.ll (original)
+++ llvm/trunk/test/CodeGen/ARM/vst4.ll Fri Mar 2 05:02:55 2018
@@ -72,6 +72,18 @@ define void @vst4i64_update(i64** %ptr,
ret void
}
+define void @vst4i64_reg_update(i64** %ptr, <1 x i64>* %B) nounwind {
+;CHECK-LABEL: vst4i64_reg_update:
+;CHECK: vst1.64 {d16, d17, d18, d19}, [r{{[0-9]+}}], r{{[0-9]+}}
+ %A = load i64*, i64** %ptr
+ %tmp0 = bitcast i64* %A to i8*
+ %tmp1 = load <1 x i64>, <1 x i64>* %B
+ call void @llvm.arm.neon.vst4.p0i8.v1i64(i8* %tmp0, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, <1 x i64> %tmp1, i32 1)
+ %tmp2 = getelementptr i64, i64* %A, i32 1
+ store i64* %tmp2, i64** %ptr
+ ret void
+}
+
define void @vst4Qi8(i8* %A, <16 x i8>* %B) nounwind {
;CHECK-LABEL: vst4Qi8:
;Check the alignment value. Max for this instruction is 256 bits:
More information about the llvm-commits
mailing list