[PATCH] transform fadd chains to increase parallelism

Tue Apr 28 11:27:08 PDT 2015

================
Comment at: test/CodeGen/X86/fp-fast.ll:124
@@ +123,3 @@
+; CHECK-NEXT:    vaddss {{%xmm[0-9], %xmm[0-9]}}, [[XMM1:%xmm[0-9]]]
+; CHECK-NEXT:    vaddss {{%xmm[0-9], %xmm[0-9]}}, [[XMM2:%xmm[0-9]]]
+; CHECK-NEXT:    vaddss [[XMM2]], [[XMM1]], 
----------------
qcolombet wrote:
> Can’t you be more specific on the input registers?
> With a pattern like this, I believe even the old inefficient sequence would match, wouldn’t it?
Hi Quentin,

Thanks for reviewing this patch. I don't think we can be more specific about the inputs: we know that xmm0 through xmm3 are the input registers, but the order of the operands, as well as the order of the first two adds, may be commuted (not by this patch, but possibly by some future patch).

I made sure that the last check fails without this patch: it requires that the outputs of the first two adds be the inputs to the third add. If anything, this final check is too specific, because it fixes the order of the operands. I tried every regex combination I could think of to make it more flexible, but couldn't get anything to work with FileCheck.
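
To illustrate the point about the last check, here is a rough Python sketch that mimics the three CHECK-NEXT lines with named groups and backreferences. It is not FileCheck and only approximates its matching. The three instruction sequences are made-up examples rather than actual compiler output, and matching any xmm register as the destination of the final add is an assumption of the sketch.

```python
import re

# Python stand-in for the three CHECK-NEXT lines. The named groups x1 and x2
# play the role of the [[XMM1:...]] and [[XMM2:...]] captures.
REG = r"%xmm[0-9]"
PATTERN = re.compile(
    rf"vaddss {REG}, {REG}, (?P<x1>{REG})\n"
    rf"vaddss {REG}, {REG}, (?P<x2>{REG})\n"
    # Assumption: any xmm register is accepted as the final destination.
    rf"vaddss (?P=x2), (?P=x1), {REG}"
)


def matches(asm: str) -> bool:
    """Return True if the three-instruction pattern occurs in asm."""
    return PATTERN.search(asm) is not None


# Hypothetical sequences for illustration only (AT&T operand order).
PARALLEL = "\n".join([
    "vaddss %xmm1, %xmm0, %xmm0",  # first independent add
    "vaddss %xmm3, %xmm2, %xmm1",  # second independent add
    "vaddss %xmm1, %xmm0, %xmm0",  # combines both results
])

SERIAL = "\n".join([
    "vaddss %xmm1, %xmm0, %xmm0",  # each add depends on the previous one
    "vaddss %xmm2, %xmm0, %xmm0",
    "vaddss %xmm3, %xmm0, %xmm0",
])

COMMUTED = "\n".join([
    "vaddss %xmm1, %xmm0, %xmm0",
    "vaddss %xmm3, %xmm2, %xmm1",
    "vaddss %xmm0, %xmm1, %xmm0",  # same add, operands swapped
])

if __name__ == "__main__":
    for name, asm in [("parallel", PARALLEL), ("serial", SERIAL), ("commuted", COMMUTED)]:
        print(name, matches(asm))
```

In this sketch the serial chain is rejected because the third add does not take both earlier results as its sources, and the commuted variant is rejected too, which is the "too specific" problem with the final check.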

http://reviews.llvm.org/D9232
