[llvm] [ValueTracking][X86] Compute KnownBits for phadd/phsub (PR #92429)
via llvm-commits
llvm-commits at lists.llvm.org
Sat Jun 15 06:22:19 PDT 2024
================
@@ -37276,6 +37296,55 @@ void X86TargetLowering::computeKnownBitsForTargetNode(const SDValue Op,
computeKnownBitsForPSADBW(LHS, RHS, Known, DemandedElts, DAG, Depth);
break;
}
+ case Intrinsic::x86_ssse3_phadd_d:
+ case Intrinsic::x86_ssse3_phadd_w:
+ case Intrinsic::x86_ssse3_phadd_d_128:
+ case Intrinsic::x86_ssse3_phadd_w_128:
+ case Intrinsic::x86_avx2_phadd_d:
+ case Intrinsic::x86_avx2_phadd_w: {
+ Known = computeKnownBitsForHorizontalOperation(
+ Op, DemandedElts, Depth, /*OpIndexStart=*/1, DAG,
+ [](const KnownBits &KnownLHS, const KnownBits &KnownRHS) {
+ return KnownBits::computeForAddSub(
+ /*Add=*/true, /*NSW=*/false, /*NUW=*/false, KnownLHS, KnownRHS);
+ });
+ break;
+ }
+ case Intrinsic::x86_ssse3_phadd_sw:
+ case Intrinsic::x86_ssse3_phadd_sw_128:
+ case Intrinsic::x86_avx2_phadd_sw: {
+ Known = computeKnownBitsForHorizontalOperation(
+ Op, DemandedElts, Depth, /*OpIndexStart=*/1, DAG,
+ [](const KnownBits &KnownLHS, const KnownBits &KnownRHS) {
+ return KnownBits::sadd_sat(KnownLHS, KnownRHS);
+ });
+ break;
+ }
+ case Intrinsic::x86_ssse3_phsub_d:
+ case Intrinsic::x86_ssse3_phsub_w:
+ case Intrinsic::x86_ssse3_phsub_d_128:
+ case Intrinsic::x86_ssse3_phsub_w_128:
+ case Intrinsic::x86_avx2_phsub_d:
+ case Intrinsic::x86_avx2_phsub_w: {
+ Known = computeKnownBitsForHorizontalOperation(
+ Op, DemandedElts, Depth, /*OpIndexStart=*/1, DAG,
+ [](const KnownBits &KnownLHS, const KnownBits &KnownRHS) {
+ return KnownBits::computeForAddSub(/*Add=*/false, /*NSW=*/false,
+ /*NUW=*/false, KnownLHS,
+ KnownRHS);
+ });
+ break;
+ }
+ case Intrinsic::x86_ssse3_phsub_sw:
+ case Intrinsic::x86_ssse3_phsub_sw_128:
+ case Intrinsic::x86_avx2_phsub_sw: {
+ Known = computeKnownBitsForHorizontalOperation(
+ Op, DemandedElts, Depth, /*OpIndexStart=*/1, DAG,
+ [](const KnownBits &KnownLHS, const KnownBits &KnownRHS) {
+ return KnownBits::ssub_sat(KnownLHS, KnownRHS);
+ });
+ break;
----------------
mskamp wrote:
The implementation handled the intrinsics because some test cases would not fold otherwise. For example, the test case that truncates `<4 x i32>` to `<4 x i16>` does not fold when only the `X86ISD::HADD`/`HSUB` nodes are handled. In contrast, the tests that truncate `<8 x i32>` to `<8 x i16>` fold fine this way.
After looking at this problem again, I believe that the code that replaces the shuffle with a pack instruction might be too strict. This is probably also the case in this example: https://godbolt.org/z/KW5b6r7xW
Anyway, I've removed the handling of the intrinsics and adapted the test cases so that they still fold with only the `X86ISD::HADD`/`HSUB` nodes.
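
For readers following the truncation tests mentioned above, here is a minimal standalone sketch of why known bits on a horizontal add can make a `<4 x i32>` to `<4 x i16>` truncation foldable. It is only a model under stated assumptions: `KnownBits32`, `knownAdd`, `knownHAddAllLanes`, and `upperHalfKnownZero` are hypothetical names that do not exist in LLVM, the vectors model a single 128-bit lane, and the code is not the `computeKnownBitsForHorizontalOperation` helper from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llvm::KnownBits on one 32-bit element.
// A bit set in Zero is known to be 0; a bit set in One is known to be 1.
struct KnownBits32 {
  uint32_t Zero;
  uint32_t One;
};

// Known bits of the wrapping sum L + R (no carry-in).
KnownBits32 knownAdd(KnownBits32 L, KnownBits32 R) {
  uint32_t SumMax = ~L.Zero + ~R.Zero; // every unknown bit assumed 1
  uint32_t SumMin = L.One + R.One;     // every unknown bit assumed 0
  // A carry into a bit is known if the extreme sums agree with the operands.
  uint32_t CarryKnownZero = ~(SumMax ^ L.Zero ^ R.Zero);
  uint32_t CarryKnownOne = SumMin ^ L.One ^ R.One;
  // A result bit is known only if both operand bits and the carry are known.
  uint32_t Known = (CarryKnownZero | CarryKnownOne) & (L.Zero | L.One) &
                   (R.Zero | R.One);
  return KnownBits32{~SumMax & Known, SumMin & Known};
}

// Keep only the facts that hold for both inputs.
KnownBits32 intersect(KnownBits32 A, KnownBits32 B) {
  return KnownBits32{A.Zero & B.Zero, A.One & B.One};
}

// Fold the known bits of every adjacent pair sum of Src into Common.
void accumulatePairs(const std::vector<KnownBits32> &Src, KnownBits32 &Common) {
  for (std::size_t I = 0; I + 1 < Src.size(); I += 2)
    Common = intersect(Common, knownAdd(Src[I], Src[I + 1]));
}

// Bits common to all result elements of a horizontal add: the low half of
// the result holds pair sums of LHS, the high half holds pair sums of RHS.
KnownBits32 knownHAddAllLanes(const std::vector<KnownBits32> &LHS,
                              const std::vector<KnownBits32> &RHS) {
  assert(LHS.size() == RHS.size() && LHS.size() >= 2 && LHS.size() % 2 == 0);
  KnownBits32 Common{UINT32_MAX, UINT32_MAX}; // identity for intersection
  accumulatePairs(LHS, Common);
  accumulatePairs(RHS, Common);
  return Common;
}

// If the upper 16 bits are known zero, truncating to i16 loses nothing.
bool upperHalfKnownZero(KnownBits32 K) { return (K.Zero >> 16) == 0xFFFFu; }
```

In this model, inputs whose elements are known to fit in 15 bits give a sum with the upper 16 bits known zero, so the truncation is lossless. Inputs known only to fit in 16 bits leave bit 16 unknown, and the check fails.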
https://github.com/llvm/llvm-project/pull/92429