[PATCH] D116039: [X86] Combine reduce (add (mul x, y)) to VNNI instruction.

Mon Dec 27 11:02:39 PST 2021

craig.topper added inline comments.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:42176
+
+  if (Root && (Root.getOpcode() == ISD::SIGN_EXTEND ||
+               Root.getOpcode() == ISD::ZERO_EXTEND ||
----------------
craig.topper wrote:
> craig.topper wrote:
> > Is this code valid for this transform? There's a large comment of justification for why it is ok for SAD. I think I only saw a test for the SIGN_EXTEND case?
> Oops I see the other test. I need to think about the math.
I don't think we can do this if the multiply result is zero extended. Each of the 4 multiplies done by vpdpbusd computes a signed 16-bit product that is sign extended before being added into the accumulator.
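
To make the concern concrete, here is a minimal scalar sketch of one 32-bit lane, assuming the non-saturating behaviour of vpdpbusd (unsigned byte times signed byte, signed 16-bit product, sign extended, then accumulated). The function names `vpdpbusdLane` and `zextMulReduceLane` are hypothetical and only model the arithmetic; they are not code from the patch.

```cpp
#include <cstdint>

// Hypothetical scalar model of one 32-bit lane of vpdpbusd (assumed
// non-saturating): four u8 * s8 products, each a signed 16-bit value that is
// sign extended to 32 bits before being added into the accumulator.
inline int32_t vpdpbusdLane(int32_t Acc, const uint8_t A[4], const int8_t B[4]) {
  uint32_t Sum = static_cast<uint32_t>(Acc); // unsigned math for wraparound
  for (int I = 0; I < 4; ++I) {
    // 255 * -128 .. 255 * 127 always fits in a signed 16-bit value.
    int16_t Prod = static_cast<int16_t>(static_cast<int32_t>(A[I]) * B[I]);
    Sum += static_cast<uint32_t>(static_cast<int32_t>(Prod)); // sign extend
  }
  return static_cast<int32_t>(Sum);
}

// Hypothetical model of the source pattern when the i16 multiply result is
// ZERO extended to i32 before the add reduction.
inline int32_t zextMulReduceLane(int32_t Acc, const uint8_t A[4], const int8_t B[4]) {
  uint32_t Sum = static_cast<uint32_t>(Acc);
  for (int I = 0; I < 4; ++I) {
    uint16_t Prod = static_cast<uint16_t>(static_cast<int32_t>(A[I]) * B[I]);
    Sum += static_cast<uint32_t>(Prod); // zero extend
  }
  return static_cast<int32_t>(Sum);
}
```

With A = {1, 0, 0, 0} and B = {-1, 0, 0, 0}, the first model yields -1 while the zero-extended pattern yields 65535, so the two only agree when every product is non-negative.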

I think we also need to verify that the multiply type has at least 2x the number of bits of its inputs. We shouldn't match (sign_extend (mul (vXi9 (zext (vXi8 X))), (vXi9 (zext (vXi8 Y))))). Does anything prevent that right now?
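
A small scalar sketch of why the narrow multiply is a problem, assuming the i9 multiply wraps modulo 2^9 as an ordinary fixed-width integer multiply would. The helpers `mulI9ThenSext` and `mulWide` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical model of (sign_extend (mul (i9 (zext X)), (i9 (zext Y)))):
// the product is truncated to 9 bits, then sign extended from bit 8.
inline int32_t mulI9ThenSext(uint8_t X, uint8_t Y) {
  uint32_t Wrapped = (static_cast<uint32_t>(X) * Y) & 0x1FF; // keep 9 bits
  // Flip bit 8 and subtract its weight to sign extend a 9-bit value.
  return static_cast<int32_t>(Wrapped ^ 0x100) - 0x100;
}

// The full-width product that a 16-bit (or wider) multiply would preserve.
inline int32_t mulWide(uint8_t X, uint8_t Y) {
  return static_cast<int32_t>(X) * static_cast<int32_t>(Y);
}
```

For X = 16 and Y = 16 the i9 form gives -256, whereas the full product is 256, so a match on the narrow multiply would change the result.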

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116039/new/

https://reviews.llvm.org/D116039