[llvm] [X86] Disable the `vpdpwssd -> vpmaddwd + vpaddd` combiner pattern on AMD Zen4 (PR #84347)

Thu Mar 7 09:29:56 PST 2024

https://github.com/bjacob created https://github.com/llvm/llvm-project/pull/84347

This tweaks the existing `vpdpwssd -> vpmaddwd + vpaddd` machine combiner pattern to *not* kick in on AMD Zen4, fixing Issue #84182.

This pattern was introduced in https://github.com/llvm/llvm-project/commit/8f7f9d86a7555263ef08fded15a6b778d796ec3f , and I have expressed in https://github.com/llvm/llvm-project/issues/84182#issuecomment-1981685009 some generic concerns about it, which are not even AMD-specific: my actual first choice would be to simply revert that commit. This PR only exists on the assumption that we don't want to do that, that we want to keep the current behavior outside of the AMD Zen4 target where Issue #84182 shows it to be a clear regression. To be clear though, what I am saying in https://github.com/llvm/llvm-project/issues/84182#issuecomment-1981685009 is that it also is a regression on non-AMD CPUs, on different kinds of test cases, one being the simple one-intrinsic testcase add in this PR (having only one intrinsic it clearly falls out of the scope of the rationale for https://github.com/llvm/llvm-project/commit/8f7f9d86a7555263ef08fded15a6b778d796ec3f, so the fact that that commit did affect it seems unintentional) and another being real world optimized SIMD code that, unlike the code motivating this in the commit message, would use enough registers to not be gated on instruction latency.

This PR also fixes what seems like a bug(?) in  https://github.com/llvm/llvm-project/commit/8f7f9d86a7555263ef08fded15a6b778d796ec3f : in
https://github.com/llvm/llvm-project/blob/3714f937b835c06c8c32ca4f3f61ba2317db2296/llvm/lib/Target/X86/X86InstrInfo.cpp#L10584-L10586
the `return true;` was not part of the `if` branch, so the function would return there even if the `if` was not taken, preventing the `TargetInstrInfo::getMachineCombinerPatterns` from being reached. This seems unintentional and seems to be a departure from all the other `getMachineCombinerPatterns` functions that  I can find, where either the target-specific case or the fallback `TargetInstrInfo::getMachineCombinerPatterns` is taken.

>From 712174707d11ad3d152655f63bd0f471292ec1eb Mon Sep 17 00:00:00 2001
From: Benoit Jacob <jacob.benoit.1 at gmail.com>
Date: Thu, 7 Mar 2024 11:53:11 -0500
Subject: [PATCH] vpdpwssd

---
 llvm/lib/Target/X86/X86InstrInfo.cpp     | 19 +++++++++++++------
 llvm/test/CodeGen/X86/znver4-vpdpwssd.ll | 15 +++++++++++++++
 2 files changed, 28 insertions(+), 6 deletions(-)
 create mode 100644 llvm/test/CodeGen/X86/znver4-vpdpwssd.ll

diff --git a/llvm/lib/Target/X86/X86InstrInfo.cpp b/llvm/lib/Target/X86/X86InstrInfo.cpp
index 3f0557e651f89b..1834d72a62cf63 100644
--- a/llvm/lib/Target/X86/X86InstrInfo.cpp
+++ b/llvm/lib/Target/X86/X86InstrInfo.cpp
@@ -10563,17 +10563,20 @@ void X86InstrInfo::buildClearRegister(Register Reg, MachineBasicBlock &MBB,
 bool X86InstrInfo::getMachineCombinerPatterns(
     MachineInstr &Root, SmallVectorImpl<MachineCombinerPattern> &Patterns,
     bool DoRegPressureReduce) const {
+  bool EnableVPDPWSSTPatterns = !Subtarget.getCPU().starts_with("znver");
   unsigned Opc = Root.getOpcode();
   switch (Opc) {
   default:
-    return TargetInstrInfo::getMachineCombinerPatterns(Root, Patterns,
-                                                       DoRegPressureReduce);
+    break;
   case X86::VPDPWSSDrr:
   case X86::VPDPWSSDrm:
   case X86::VPDPWSSDYrr:
   case X86::VPDPWSSDYrm: {
-    Patterns.push_back(MachineCombinerPattern::DPWSSD);
-    return true;
+    if (EnableVPDPWSSTPatterns) {
+      Patterns.push_back(MachineCombinerPattern::DPWSSD);
+      return true;
+    }
+    break;
   }
   case X86::VPDPWSSDZ128r:
   case X86::VPDPWSSDZ128m:
@@ -10581,11 +10584,15 @@ bool X86InstrInfo::getMachineCombinerPatterns(
   case X86::VPDPWSSDZ256m:
   case X86::VPDPWSSDZr:
   case X86::VPDPWSSDZm: {
-    if (Subtarget.hasBWI())
+    if (EnableVPDPWSSTPatterns && Subtarget.hasBWI()) {
       Patterns.push_back(MachineCombinerPattern::DPWSSD);
-    return true;
+      return true;
+    }
+    break;
   }
   }
+  return TargetInstrInfo::getMachineCombinerPatterns(Root, Patterns,
+                                                     DoRegPressureReduce);
 }
 
 static void
diff --git a/llvm/test/CodeGen/X86/znver4-vpdpwssd.ll b/llvm/test/CodeGen/X86/znver4-vpdpwssd.ll
new file mode 100644
index 00000000000000..2958c73835e433
--- /dev/null
+++ b/llvm/test/CodeGen/X86/znver4-vpdpwssd.ll
@@ -0,0 +1,15 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mcpu=znver4 | FileCheck %s
+
+define <16 x i32> @vpdpwssd_test(<16 x i32> %0, <16 x i32> %1, <16 x i32> %2) {
+; CHECK-LABEL: vpdpwssd_test:
+; CHECK:       # %bb.0:
+; CHECK-NEXT:    vpdpwssd %zmm2, %zmm1, %zmm0
+; CHECK-NEXT:    retq
+  %4 = tail call <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32> %0, <16 x i32> %1, <16 x i32> %2)
+  ret <16 x i32> %4
+}
+
+declare <16 x i32> @llvm.x86.avx512.vpdpwssd.512(<16 x i32>, <16 x i32>, <16 x i32>) #1
+
+attributes #1 = { mustprogress nocallback nofree nosync nounwind willreturn memory(none) }