[llvm] [X86] LowervXi8MulWithUNPCK - remove special case constant folding handling (PR #163567)
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 16 08:16:01 PDT 2025
================
@@ -665,14 +665,12 @@ define <16 x i8> @combine_vec_udiv_nonuniform4(<16 x i8> %x) {
;
; XOP-LABEL: combine_vec_udiv_nonuniform4:
; XOP: # %bb.0:
-; XOP-NEXT: movl $171, %eax
+; XOP-NEXT: movl $249, %eax
; XOP-NEXT: vmovd %eax, %xmm1
; XOP-NEXT: vpmovzxbw {{.*#+}} xmm2 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
-; XOP-NEXT: vpmullw %xmm1, %xmm2, %xmm1
-; XOP-NEXT: vpsrlw $8, %xmm1, %xmm1
-; XOP-NEXT: movl $249, %eax
-; XOP-NEXT: vmovd %eax, %xmm2
-; XOP-NEXT: vpshlb %xmm2, %xmm1, %xmm1
+; XOP-NEXT: vpmullw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm2
----------------
RKSimon wrote:
Because we keep the bitcasts, we bump the instruction recursion depth just enough that SimplifyDemandedVectorElts can no longer remove the VZEXT_MOVL node for us, and combineTargetShuffle then performs this fold for all VZEXT_MOVL(SCALAR_TO_VECTOR(CONSTANT)) cases:
```
// Load a scalar integer constant directly to XMM instead of transferring an
// immediate value from GPR.
// vzext_movl (scalar_to_vector C) --> load [C,0...]
if (N0.getOpcode() == ISD::SCALAR_TO_VECTOR) {
  if (auto *C = dyn_cast<ConstantSDNode>(N0.getOperand(0))) {
    // Create a vector constant - scalar constant followed by zeros.
    EVT ScalarVT = N0.getOperand(0).getValueType();
    Type *ScalarTy = ScalarVT.getTypeForEVT(*DAG.getContext());
    Constant *Zero = ConstantInt::getNullValue(ScalarTy);
    SmallVector<Constant *, 32> ConstantVec(NumElts, Zero);
    ConstantVec[0] = const_cast<ConstantInt *>(C->getConstantIntValue());
    // Load the vector constant from constant pool.
    MVT PVT = TLI.getPointerTy(DAG.getDataLayout());
    SDValue CP = DAG.getConstantPool(ConstantVector::get(ConstantVec), PVT);
    MachinePointerInfo MPI =
        MachinePointerInfo::getConstantPool(DAG.getMachineFunction());
    Align Alignment = cast<ConstantPoolSDNode>(CP)->getAlign();
    return DAG.getLoad(VT, DL, DAG.getEntryNode(), CP, MPI, Alignment,
                       MachineMemOperand::MOLoad);
  }
}
```
It's odd that if we just have SCALAR_TO_VECTOR(CONSTANT) we keep the GPR->XMM transfer, especially as the VZEXT_MOVL will most likely disappear anyway due to the implicit zeroing of the upper elements by MOVD/MOVQ. IIRC we've run into this many times: sometimes we've tried to avoid a load, and other times we've wanted to fold the load to reduce register pressure. We've never come up with a solution that works in all cases, although I suspect that trying to solve this in the DAG is where we're going wrong.
https://github.com/llvm/llvm-project/pull/163567