[PATCH] transform fadd chains to increase parallelism

Tue Apr 28 11:16:01 PDT 2015

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:7662
@@ +7661,3 @@
+    // and 1 dependent operation:
+    //   (fadd x, (fadd y, (fadd z, w))) -> (fadd (fadd x, y), (fadd z, w))
+    if (N0.getOpcode() == ISD::FADD &&  N0.hasOneUse() &&
----------------
I would prefer the comment to match the actual code, i.e., invert the order of the operands:
(fadd (fadd (fadd z, w), y), x) -> (fadd (fadd z, w), (fadd x, y))

You could even use named operands like this:
(fadd N0: (fadd N00: (fadd z, w), N01: y), N1: x) -> (fadd N00: (fadd z, w), (fadd N1: x, N01: y))
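
To make the named-operand form concrete, here is a minimal standalone sketch of the same rewrite on a toy expression tree. It is not the DAGCombiner code and does not use the SelectionDAG API: Node, leaf, fadd, depth, print and rebalance are hypothetical names made up for illustration, and the sketch ignores the hasOneUse() check and any fast-math gating that the real combine needs.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy expression node; this is NOT llvm::SDValue.
struct Node {
  std::string Name;                // leaf name; empty for an fadd node
  std::shared_ptr<Node> LHS, RHS;  // both null for a leaf
  bool isFAdd() const { return LHS && RHS; }
};
using NodeRef = std::shared_ptr<Node>;

NodeRef leaf(std::string Name) {
  return std::make_shared<Node>(Node{std::move(Name), nullptr, nullptr});
}

NodeRef fadd(NodeRef L, NodeRef R) {
  assert(L && R && "fadd needs two operands");
  return std::make_shared<Node>(Node{"", std::move(L), std::move(R)});
}

// Number of fadds on the longest dependency path (0 for a leaf).
int depth(const NodeRef &N) {
  if (!N->isFAdd())
    return 0;
  return 1 + std::max(depth(N->LHS), depth(N->RHS));
}

std::string print(const NodeRef &N) {
  if (!N->isFAdd())
    return N->Name;
  return "(fadd " + print(N->LHS) + ", " + print(N->RHS) + ")";
}

// (fadd N0: (fadd N00: (fadd z, w), N01: y), N1: x)
//   -> (fadd N00: (fadd z, w), (fadd N1: x, N01: y))
// Returns N unchanged when the pattern does not match.
NodeRef rebalance(const NodeRef &N) {
  if (!N->isFAdd())
    return N;
  const NodeRef &N0 = N->LHS;
  const NodeRef &N1 = N->RHS;
  if (!N0->isFAdd())
    return N;
  const NodeRef &N00 = N0->LHS;
  const NodeRef &N01 = N0->RHS;
  if (!N00->isFAdd())
    return N;
  return fadd(N00, fadd(N1, N01));
}
```

Building the chain as fadd(fadd(fadd(leaf("z"), leaf("w")), leaf("y")), leaf("x")) gives a tree with three dependent adds; after rebalance() the longest path has two, because (fadd z, w) and (fadd x, y) no longer depend on each other.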

================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:7666
@@ +7665,3 @@
+      SDValue N00 = N0.getOperand(0);
+      SDValue N01 = N0.getOperand(1);
+      if (N00.getOpcode() == ISD::FADD) {
----------------
You can move this assignment into the next if.

================
Comment at: test/CodeGen/X86/fp-fast.ll:124
@@ +123,3 @@
+; CHECK-NEXT:    vaddss {{%xmm[0-9], %xmm[0-9]}}, [[XMM1:%xmm[0-9]]]
+; CHECK-NEXT:    vaddss {{%xmm[0-9], %xmm[0-9]}}, [[XMM2:%xmm[0-9]]]
+; CHECK-NEXT:    vaddss [[XMM2]], [[XMM1]], 
----------------
Can’t you be more specific about the input registers?
With a pattern like this, I believe even the old, inefficient sequence would match, wouldn’t it?

http://reviews.llvm.org/D9232
