[llvm] [ValueTracking][X86] Compute KnownBits for phadd/phsub (PR #92429)

via llvm-commits llvm-commits at lists.llvm.org
Sat Jun 15 06:22:19 PDT 2024


================
@@ -37276,6 +37296,55 @@ void X86TargetLowering::computeKnownBitsForTargetNode(const SDValue Op,
       computeKnownBitsForPSADBW(LHS, RHS, Known, DemandedElts, DAG, Depth);
       break;
     }
+    case Intrinsic::x86_ssse3_phadd_d:
+    case Intrinsic::x86_ssse3_phadd_w:
+    case Intrinsic::x86_ssse3_phadd_d_128:
+    case Intrinsic::x86_ssse3_phadd_w_128:
+    case Intrinsic::x86_avx2_phadd_d:
+    case Intrinsic::x86_avx2_phadd_w: {
+      Known = computeKnownBitsForHorizontalOperation(
+          Op, DemandedElts, Depth, /*OpIndexStart=*/1, DAG,
+          [](const KnownBits &KnownLHS, const KnownBits &KnownRHS) {
+            return KnownBits::computeForAddSub(
+                /*Add=*/true, /*NSW=*/false, /*NUW=*/false, KnownLHS, KnownRHS);
+          });
+      break;
+    }
+    case Intrinsic::x86_ssse3_phadd_sw:
+    case Intrinsic::x86_ssse3_phadd_sw_128:
+    case Intrinsic::x86_avx2_phadd_sw: {
+      Known = computeKnownBitsForHorizontalOperation(
+          Op, DemandedElts, Depth, /*OpIndexStart=*/1, DAG,
+          [](const KnownBits &KnownLHS, const KnownBits &KnownRHS) {
+            return KnownBits::sadd_sat(KnownLHS, KnownRHS);
+          });
+      break;
+    }
+    case Intrinsic::x86_ssse3_phsub_d:
+    case Intrinsic::x86_ssse3_phsub_w:
+    case Intrinsic::x86_ssse3_phsub_d_128:
+    case Intrinsic::x86_ssse3_phsub_w_128:
+    case Intrinsic::x86_avx2_phsub_d:
+    case Intrinsic::x86_avx2_phsub_w: {
+      Known = computeKnownBitsForHorizontalOperation(
+          Op, DemandedElts, Depth, /*OpIndexStart=*/1, DAG,
+          [](const KnownBits &KnownLHS, const KnownBits &KnownRHS) {
+            return KnownBits::computeForAddSub(/*Add=*/false, /*NSW=*/false,
+                                               /*NUW=*/false, KnownLHS,
+                                               KnownRHS);
+          });
+      break;
+    }
+    case Intrinsic::x86_ssse3_phsub_sw:
+    case Intrinsic::x86_ssse3_phsub_sw_128:
+    case Intrinsic::x86_avx2_phsub_sw: {
+      Known = computeKnownBitsForHorizontalOperation(
+          Op, DemandedElts, Depth, /*OpIndexStart=*/1, DAG,
+          [](const KnownBits &KnownLHS, const KnownBits &KnownRHS) {
+            return KnownBits::ssub_sat(KnownLHS, KnownRHS);
+          });
+      break;
----------------
mskamp wrote:

The implementation handled the intrinsics because otherwise some test cases would not fold. For example, the test case that truncates `<4 x i32>` to `<4 x i16>` does not fold when handling only the `X86ISD::HADD`/`HSUB` nodes. In contrast, tests that truncate `<8 x i32>` to `<8 x i16>` work fine this way.

After looking at this problem again, I believe that the code that replaces the shuffle with a pack instruction might be too strict. This is probably also what happens in this example: https://godbolt.org/z/KW5b6r7xW

Anyway, I've removed the handling of the intrinsics and adapted the test cases such that they still fold with only the `X86ISD::HADD`/`HSUB` nodes.

https://github.com/llvm/llvm-project/pull/92429

