[PATCH] D116039: [X86] Combine reduce (add (mul x, y)) to VNNI instruction.

Sat Dec 25 21:57:03 PST 2021

pengfei added a comment.

Format the code as Lint suggested?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:41810
+    std::swap(Op0, Op1);
+
+  auto IsFreeTruncation =
----------------
Can we return false here if `Op0` is not ZERO_EXTEND?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:41813-41814
+      [](SDValue &Op) -> bool {
+        if ((Op.getOpcode() == ISD::ZERO_EXTEND ||
+             Op.getOpcode() == ISD::SIGN_EXTEND) &&
+            Op.getOperand(0).getValueType().getScalarSizeInBits() <= 8)
----------------
How about `ANY_EXTEND` ? The same below.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:41826
+  if ((IsFreeTruncation(Op0) && DAG.ComputeMinSignedBits(Op0) <= 9 &&
+       Op0.getOpcode() == ISD::ZERO_EXTEND) &&
+      (IsFreeTruncation(Op1) && DAG.ComputeMinSignedBits(Op1) <= 8))
----------------
It's better to hoist to line 41810. How about `ANY_EXTEND`?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:41870
+                               ZExt0.getValueType();
+  LogBias = 2;
+
----------------
It is always 2? Better to add a comment to explain.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:41887-41888
+
+  // Actually build the DotProduct, split as 128/256/512 bits for
+  // SSE/AVX2/AVX512BW.
+  auto DpBuilder = [&](SelectionDAG &DAG, const SDLoc &DL,
----------------
AVXVNNI implies AVX2. We won't need to split to 128 bits.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:42156-42157
+  EVT ExtractVT = Extract->getValueType(0);
+  // Verify the type we're extracting is i32, as the output element type of
+  // vpdpbusd and vpdpwssd is i32.
+  if (ExtractVT != MVT::i32)
----------------
Can we generate i32 first then do the truncation?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:42162-42163
+  EVT VT = Extract->getOperand(0).getValueType();
+  if (!isPowerOf2_32(VT.getVectorNumElements()))
+    return SDValue();
+
----------------
Can check `DCI.isAfterLegalizeDAG()` before calling the function instead?

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:42210
+  EVT ResVT = EVT::getVectorVT(*DAG.getContext(), ExtractVT,
+                               DpVT.getSizeInBits() / ExtractSizeInBits);
+  DP = DAG.getBitcast(ResVT, DP);
----------------
Can ues `ExtractVT.getSizeInBits()` derectly.

================
Comment at: llvm/test/CodeGen/X86/dpbusd.ll:13
+; AVXVNNI-NEXT:    vextracti128 $1, %ymm0, %xmm1
+; AVXVNNI-NEXT:    vpaddd %xmm1, %xmm0, %xmm0
+; AVXVNNI-NEXT:    vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
----------------
This is the only and a strange diff with the AVX512 code. Is there anything wrong in one of each?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116039/new/

https://reviews.llvm.org/D116039