[llvm] [X86] LowervXi8MulWithUNPCK - remove special case constant folding handling (PR #163567)

Thu Oct 16 08:16:01 PDT 2025

================
@@ -665,14 +665,12 @@ define <16 x i8> @combine_vec_udiv_nonuniform4(<16 x i8> %x) {
 ;
 ; XOP-LABEL: combine_vec_udiv_nonuniform4:
 ; XOP:       # %bb.0:
-; XOP-NEXT:    movl $171, %eax
+; XOP-NEXT:    movl $249, %eax
 ; XOP-NEXT:    vmovd %eax, %xmm1
 ; XOP-NEXT:    vpmovzxbw {{.*#+}} xmm2 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
-; XOP-NEXT:    vpmullw %xmm1, %xmm2, %xmm1
-; XOP-NEXT:    vpsrlw $8, %xmm1, %xmm1
-; XOP-NEXT:    movl $249, %eax
-; XOP-NEXT:    vmovd %eax, %xmm2
-; XOP-NEXT:    vpshlb %xmm2, %xmm1, %xmm1
+; XOP-NEXT:    vpmullw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm2, %xmm2
----------------
RKSimon wrote:

Because we keep the bitcasts, we bump the instruction recursion depth just enough that SimplifyDemandedVectorElts can't remove the VZEXT_MOVL node for us, and combineTargetShuffle then performs this fold for all VZEXT_MOVL(SCALAR_TO_VECTOR(CONSTANT)) cases:
```
    // Load a scalar integer constant directly to XMM instead of transferring an
    // immediate value from GPR.
    // vzext_movl (scalar_to_vector C) --> load [C,0...]
    if (N0.getOpcode() == ISD::SCALAR_TO_VECTOR) {
      if (auto *C = dyn_cast<ConstantSDNode>(N0.getOperand(0))) {
        // Create a vector constant - scalar constant followed by zeros.
        EVT ScalarVT = N0.getOperand(0).getValueType();
        Type *ScalarTy = ScalarVT.getTypeForEVT(*DAG.getContext());
        Constant *Zero = ConstantInt::getNullValue(ScalarTy);
        SmallVector<Constant *, 32> ConstantVec(NumElts, Zero);
        ConstantVec[0] = const_cast<ConstantInt *>(C->getConstantIntValue());

        // Load the vector constant from constant pool.
        MVT PVT = TLI.getPointerTy(DAG.getDataLayout());
        SDValue CP = DAG.getConstantPool(ConstantVector::get(ConstantVec), PVT);
        MachinePointerInfo MPI =
            MachinePointerInfo::getConstantPool(DAG.getMachineFunction());
        Align Alignment = cast<ConstantPoolSDNode>(CP)->getAlign();
        return DAG.getLoad(VT, DL, DAG.getEntryNode(), CP, MPI, Alignment,
                           MachineMemOperand::MOLoad);
      }
    }
```
It's odd that if we just have SCALAR_TO_VECTOR(CONSTANT) we keep the gpr->xmm transfer, especially as VZEXT_MOVL will most likely disappear due to the implicit zeroing of upper elements by MOVD/Q. IIRC we've encountered this many times: sometimes we've tried to avoid a load, and other times we've wanted to fold the load to reduce register pressure. We've never come up with a solution that works in all cases, although I suspect trying to solve this in DAG is where we're going wrong.
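
To make the recursion depth point concrete, here is a minimal toy model of a depth-limited operand walk. It is a sketch only and does not use the LLVM API: `Node`, `Kind`, `canReachVZextMovl`, `simplifiesWith` and the limit `kMaxDepth = 6` are all hypothetical names and values chosen for illustration. It only shows how one extra bitcast in the chain can push a node just past the point where a depth-limited simplification gives up.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy node kinds, loosely named after the DAG nodes discussed above.
// This is NOT the LLVM SelectionDAG API.
enum class Kind { Constant, ScalarToVector, VZextMovl, Bitcast, Other };

struct Node {
  Kind K;
  const Node *Operand; // a single-operand chain is enough for this sketch
  Node(Kind K, const Node *Operand) : K(K), Operand(Operand) {}
};

// Assumed depth limit for the walk (illustrative value).
constexpr unsigned kMaxDepth = 6;

// Walk down the operand chain looking for a VZextMovl node.
// Gives up (returns false) once the depth limit is reached.
bool canReachVZextMovl(const Node *N, unsigned Depth = 0) {
  assert(N && "expected a non-null node");
  if (Depth >= kMaxDepth)
    return false;
  if (N->K == Kind::VZextMovl)
    return true;
  if (!N->Operand)
    return false;
  return canReachVZextMovl(N->Operand, Depth + 1);
}

// Build: Wrappers x Other -> Bitcasts x Bitcast -> VZextMovl
//        -> ScalarToVector -> Constant, then walk from the outermost node.
bool simplifiesWith(unsigned Wrappers, unsigned Bitcasts) {
  std::vector<std::unique_ptr<Node>> Storage;
  auto Make = [&Storage](Kind K, const Node *Op) {
    Storage.push_back(std::make_unique<Node>(K, Op));
    return Storage.back().get();
  };
  const Node *N = Make(Kind::Constant, nullptr);
  N = Make(Kind::ScalarToVector, N);
  N = Make(Kind::VZextMovl, N);
  for (unsigned I = 0; I < Bitcasts; ++I)
    N = Make(Kind::Bitcast, N);
  for (unsigned I = 0; I < Wrappers; ++I)
    N = Make(Kind::Other, N);
  return canReachVZextMovl(N);
}
```

In this model `simplifiesWith(4, 1)` finds the VZextMovl node at depth 5, while `simplifiesWith(4, 2)` places it at depth 6 and the walk gives up, which mirrors how keeping an extra bitcast can leave the node in place for a later combine to pick up.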

https://github.com/llvm/llvm-project/pull/163567