[llvm] r191410 - Fix PR 17368: disable vector mul distribution for square of add/sub for ARM
Weiming Zhao
weimingz at codeaurora.org
Wed Sep 25 16:12:06 PDT 2013
Author: weimingz
Date: Wed Sep 25 18:12:06 2013
New Revision: 191410
URL: http://llvm.org/viewvc/llvm-project?rev=191410&view=rev
Log:
Fix PR 17368: disable vector mul distribution for square of add/sub for ARM
Generally, it is desirable to distribute (a + b) * c to a*c + b*c for
ARM with VMLx forwarding, where a, b and c are vectors.
However, for (a + b)*(a + b), distribution will result in one extra
instruction.
With distribution:
x = a + b (add)
y = a * x (mul)
z = y + b * y (mla)
Without distribution:
x = a + b (add)
z = x * x (mul)
This patch checks if a mul is a square of add/sub. If yes, skip
distribution.
Modified:
llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
llvm/trunk/test/CodeGen/ARM/vmul.ll
Modified: llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp?rev=191410&r1=191409&r2=191410&view=diff
==============================================================================
--- llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp Wed Sep 25 18:12:06 2013
@@ -8342,6 +8342,13 @@ static SDValue PerformSUBCombine(SDNode
/// is faster than
/// vadd d3, d0, d1
/// vmul d3, d3, d2
+// However, for (A + B) * (A + B),
+// vadd d2, d0, d1
+// vmul d3, d0, d2
+// vmla d3, d1, d2
+// is slower than
+// vadd d2, d0, d1
+// vmul d3, d2, d2
static SDValue PerformVMULCombine(SDNode *N,
TargetLowering::DAGCombinerInfo &DCI,
const ARMSubtarget *Subtarget) {
@@ -8361,6 +8368,9 @@ static SDValue PerformVMULCombine(SDNode
std::swap(N0, N1);
}
+ if (N0 == N1)
+ return SDValue();
+
EVT VT = N->getValueType(0);
SDLoc DL(N);
SDValue N00 = N0->getOperand(0);
Modified: llvm/trunk/test/CodeGen/ARM/vmul.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/vmul.ll?rev=191410&r1=191409&r2=191410&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/ARM/vmul.ll (original)
+++ llvm/trunk/test/CodeGen/ARM/vmul.ll Wed Sep 25 18:12:06 2013
@@ -515,6 +515,17 @@ entry:
ret void
}
+define <8 x i8> @no_distribute(<8 x i8> %a, <8 x i8> %b) nounwind {
+entry:
+; CHECK: no_distribute
+; CHECK: vadd.i8
+; CHECK: vmul.i8
+; CHECK-NOT: vmla.i8
+ %0 = add <8 x i8> %a, %b
+ %1 = mul <8x i8> %0, %0
+ ret <8 x i8> %1
+}
+
; If one operand has a zero-extend and the other a sign-extend, vmull
; cannot be used.
define i16 @vmullWithInconsistentExtensions(<8 x i8> %vec) {
More information about the llvm-commits
mailing list