[llvm] [X86] Generate `vpmuludq` instead of `vpmullq` (PR #121456)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 1 23:37:40 PST 2025
https://github.com/abhishek-kaushik22 created https://github.com/llvm/llvm-project/pull/121456
When lowering the `_mm512_mul_epu32` intrinsic, if the generated value is later used in a vector shuffle, we generate `vpmullq` instead of `vpmuludq` (https://godbolt.org/z/WbaGMqs8e). This happens because `SimplifyDemandedVectorElts` simplifies the arguments first, and the combine to `PMULDQ` then fails.
This patch adds an override of `shouldSimplifyDemandedVectorElts` in `X86TargetLowering` that first checks whether the `MUL` can be combined to `PMULDQ`.
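To illustrate why `vpmuludq` is a valid replacement here, the following is a minimal scalar sketch, not code from the patch. It models one 512-bit vector as eight 64-bit lanes and assumes the usual instruction semantics: `vpmuludq` multiplies the zero-extended low 32 bits of each lane, while `vpmullq` keeps the low 64 bits of the full 64x64 product. The names `Vec8x64`, `model_pmuludq`, `model_pmullq`, and `upper_halves_zero` are hypothetical helpers introduced only for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of one 512-bit vector: 8 lanes of 64 bits.
using Vec8x64 = std::array<std::uint64_t, 8>;

// Models vpmuludq: per lane, multiply the zero-extended low 32 bits of
// each operand, producing a full 64-bit product.
inline Vec8x64 model_pmuludq(const Vec8x64 &a, const Vec8x64 &b) {
  Vec8x64 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (a[i] & 0xFFFFFFFFull) * (b[i] & 0xFFFFFFFFull);
  return r;
}

// Models vpmullq: per lane, the low 64 bits of the 64x64 product.
inline Vec8x64 model_pmullq(const Vec8x64 &a, const Vec8x64 &b) {
  Vec8x64 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = a[i] * b[i];
  return r;
}

// The precondition assumed in this sketch for the two models to agree:
// the upper 32 bits of every lane are zero.
inline bool upper_halves_zero(const Vec8x64 &v) {
  for (std::uint64_t x : v)
    if ((x >> 32) != 0)
      return false;
  return true;
}
```

When `upper_halves_zero` holds for both operands, the two models return identical lanes, which is the situation the `MUL` to `PMULDQ` combine relies on. If an earlier simplification rewrites the operands so that this property can no longer be proven, the cheaper instruction is not selected, even though the result would have been the same.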
From a0551f887bf63971ecb3bb16155b48972bb631b8 Mon Sep 17 00:00:00 2001
From: abhishek-kaushik22 <abhishek.kaushik at intel.com>
Date: Thu, 2 Jan 2025 13:05:07 +0530
Subject: [PATCH] [X86] Generate `vpmuludq` instead of `vpmullq`
When lowering the `_mm512_mul_epu32` intrinsic, if the generated value is later used in a vector shuffle, we generate `vpmullq` instead of `vpmuludq` (https://godbolt.org/z/WbaGMqs8e). This happens because `SimplifyDemandedVectorElts` simplifies the arguments first, and the combine to `PMULDQ` then fails.
Add an override of `shouldSimplifyDemandedVectorElts` in `X86TargetLowering` that first checks whether the `MUL` can be combined to `PMULDQ`.
---
llvm/lib/Target/X86/X86ISelLowering.cpp | 21 +++++++++++++++++++++
llvm/lib/Target/X86/X86ISelLowering.h | 3 +++
2 files changed, 24 insertions(+)
diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index a0514e93d6598b..e104264bcbf918 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -60832,3 +60832,24 @@ Align X86TargetLowering::getPrefLoopAlignment(MachineLoop *ML) const {
     return Align(1ULL << ExperimentalPrefInnermostLoopAlignment);
   return TargetLowering::getPrefLoopAlignment();
 }
+
+bool X86TargetLowering::shouldSimplifyDemandedVectorElts(
+    SDValue Op, const TargetLoweringOpt &TLO) const {
+  if (Op.getOpcode() == ISD::VECTOR_SHUFFLE) {
+    SDValue V0 = peekThroughBitcasts(Op.getOperand(0));
+    SDValue V1 = peekThroughBitcasts(Op.getOperand(1));
+
+    if (V0.getOpcode() == ISD::MUL || V1.getOpcode() == ISD::MUL) {
+      SDNode *Mul = V0.getOpcode() == ISD::MUL ? V0.getNode() : V1.getNode();
+      SelectionDAG &DAG = TLO.DAG;
+      const X86Subtarget &Subtarget = DAG.getSubtarget<X86Subtarget>();
+      const SDLoc DL(Mul);
+
+      if (SDValue V = combineMulToPMULDQ(Mul, DL, DAG, Subtarget)) {
+        DAG.ReplaceAllUsesWith(Mul, V.getNode());
+        return false;
+      }
+    }
+  }
+  return true;
+}
diff --git a/llvm/lib/Target/X86/X86ISelLowering.h b/llvm/lib/Target/X86/X86ISelLowering.h
index 2b7a8eaf249d83..0a6cd53f557bb2 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.h
+++ b/llvm/lib/Target/X86/X86ISelLowering.h
@@ -1207,6 +1207,9 @@ namespace llvm {
     bool hasBitTest(SDValue X, SDValue Y) const override;
+    bool shouldSimplifyDemandedVectorElts(
+        SDValue Op, const TargetLoweringOpt &TLO) const override;
+
     bool shouldProduceAndByConstByHoistingConstFromShiftsLHSOfAnd(
         SDValue X, ConstantSDNode *XC, ConstantSDNode *CC, SDValue Y,
         unsigned OldShiftOpcode, unsigned NewShiftOpcode,