[PATCH] D116039: [X86] Combine reduce (add (mul x, y)) to VNNI instruction.
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 27 11:02:39 PST 2021
craig.topper added inline comments.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:42176
+
+ if (Root && (Root.getOpcode() == ISD::SIGN_EXTEND ||
+ Root.getOpcode() == ISD::ZERO_EXTEND ||
----------------
craig.topper wrote:
> craig.topper wrote:
> > Is this code valid for this transform? There's a large comment of justification for why it is ok for SAD. I think I only saw a test for the SIGN_EXTEND case?
> Oops I see the other test. I need to think about the math.
I don't think we can do this if the multiply result is zero extended. Each of the 4 multiplies done by vpdpbusd compute a signed 16-bit product that will be sign extended before adding into the accumulator.
I think we also need to verify that the multiply has at least 2x the number of bits of the input. We shouldn't match (sign_extend (mul (vXi9 (zext (vXi8 X))), (vXi9 (zext (vXi8 Y)))). Does anything prevent that right now?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D116039/new/
https://reviews.llvm.org/D116039
More information about the llvm-commits
mailing list