[llvm] [RISCV] Use vmv.v.x for any rv32 e64 splat with equal halves (PR #130530)
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 10 08:21:47 PDT 2025
================
@@ -4361,27 +4361,19 @@ static SDValue splatPartsI64WithVL(const SDLoc &DL, MVT VT, SDValue Passthru,
if ((LoC >> 31) == HiC)
return DAG.getNode(RISCVISD::VMV_V_X_VL, DL, VT, Passthru, Lo, VL);
- // If vl is equal to VLMAX or fits in 4 bits and Hi constant is equal to Lo,
- // we could use vmv.v.x whose EEW = 32 to lower it. This allows us to use
- // vlmax vsetvli or vsetivli to change the VL.
- // FIXME: Support larger constants?
- // FIXME: Support non-constant VLs by saturating?
+ // Use vmv.v.x with EEW=32. Use either a vsetivli or vsetvli to change
+ // VL. This can temporarily increase VL if VL less than VLMAX.
if (LoC == HiC) {
SDValue NewVL;
- if (isAllOnesConstant(VL) ||
- (isa<RegisterSDNode>(VL) &&
- cast<RegisterSDNode>(VL)->getReg() == RISCV::X0))
- NewVL = DAG.getRegister(RISCV::X0, MVT::i32);
- else if (isa<ConstantSDNode>(VL) && isUInt<4>(VL->getAsZExtVal()))
+ if (isa<ConstantSDNode>(VL) && isUInt<4>(VL->getAsZExtVal()))
NewVL = DAG.getNode(ISD::ADD, DL, VL.getValueType(), VL, VL);
-
- if (NewVL) {
- MVT InterVT =
- MVT::getVectorVT(MVT::i32, VT.getVectorElementCount() * 2);
- auto InterVec = DAG.getNode(RISCVISD::VMV_V_X_VL, DL, InterVT,
- DAG.getUNDEF(InterVT), Lo, NewVL);
- return DAG.getNode(ISD::BITCAST, DL, VT, InterVec);
- }
+ else
+ NewVL = DAG.getRegister(RISCV::X0, MVT::i32);
+ MVT InterVT =
+ MVT::getVectorVT(MVT::i32, VT.getVectorElementCount() * 2);
+ auto InterVec = DAG.getNode(RISCVISD::VMV_V_X_VL, DL, InterVT,
+ DAG.getUNDEF(InterVT), Lo, NewVL);
+ return DAG.getNode(ISD::BITCAST, DL, VT, InterVec);
}
}
----------------
preames wrote:
Somewhat of an aside: if we wanted to avoid the vlse in general, we could use a vmv.v.x followed by a masked vmerge.vxm, where the mask is 0101..., to handle any hi/lo values. This isn't "better" per se, except that it avoids some memory traffic. For cores which don't implement the optimized broadcast load case, this might be a win. I don't have an example where this would be worthwhile, so I don't plan to pursue this now.
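For readers who want to see the lane arithmetic, here is a minimal scalar model of that idea in plain C++. It is not LLVM code, and every function name in it is hypothetical. It assumes the usual little-endian lane layout (e32 lane 2i is the low half of e64 lane i) and assumes the "0101..." mask means the mask bit is set on odd e32 lanes, so those lanes take the high half. The doubled lane count mirrors the VL + VL computation in the hunk above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of vmv.v.x at EEW=32: every one of the n lanes receives x.
std::vector<uint32_t> vmv_v_x_e32(uint32_t x, std::size_t n) {
  return std::vector<uint32_t>(n, x);
}

// Scalar model of vmerge.vxm: lanes whose mask bit is set take x,
// all other lanes keep the value from vs2.
std::vector<uint32_t> vmerge_vxm_e32(const std::vector<uint32_t> &vs2,
                                     uint32_t x,
                                     const std::vector<bool> &mask) {
  std::vector<uint32_t> out(vs2);
  for (std::size_t i = 0; i < out.size(); ++i)
    if (mask[i])
      out[i] = x;
  return out;
}

// Reinterpret pairs of e32 lanes as e64 lanes (assumed layout:
// lane 2i is the low half, lane 2i+1 is the high half).
std::vector<uint64_t> bitcast_e32_to_e64(const std::vector<uint32_t> &v) {
  std::vector<uint64_t> out(v.size() / 2);
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = (static_cast<uint64_t>(v[2 * i + 1]) << 32) | v[2 * i];
  return out;
}

// Splat a 64-bit value given as (lo, hi) across n64 lanes using only
// 32-bit scalar moves. When lo == hi the merge step is unnecessary,
// which is the case the patch handles with a single vmv.v.x.
std::vector<uint64_t> splat_i64_via_e32(uint32_t lo, uint32_t hi,
                                        std::size_t n64) {
  const std::size_t n32 = 2 * n64; // twice as many e32 lanes
  std::vector<uint32_t> v = vmv_v_x_e32(lo, n32);
  if (lo != hi) {
    std::vector<bool> mask(n32);
    for (std::size_t i = 0; i < n32; ++i)
      mask[i] = (i & 1) != 0; // assumption: odd lanes hold the high half
    v = vmerge_vxm_e32(v, hi, mask);
  }
  return bitcast_e32_to_e64(v);
}
```

In this model the equal-halves case needs one instruction, while the general case needs two plus whatever it costs to materialize the mask, which is the trade-off against the strided load's memory traffic.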
https://github.com/llvm/llvm-project/pull/130530