[llvm] [X86][AVX] Match v4f64 blend from shuffle of scalar values. (PR #135753)

Tue Apr 15 06:40:27 PDT 2025

================
@@ -9040,6 +9040,39 @@ X86TargetLowering::LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const {
   MVT OpEltVT = Op.getOperand(0).getSimpleValueType();
   unsigned NumElems = Op.getNumOperands();
 
+  // Match BUILD_VECTOR of scalars that we can lower to X86ISD::BLENDI via
+  // shuffles.
+  //
+  //   v4f64 = BUILD_VECTOR X,Y,Y,X
+  //   >>>
+  //       t1: v4f64 = BUILD_VECTOR X,u,u,u
+  //     t3: v4f64 = vector_shuffle<0,u,u,0> t1, u
+  //       t2: v4f64 = BUILD_VECTOR Y,u,u,u
+  //     t4: v4f64 = vector_shuffle<u,0,0,u> t2, u
+  //   v4f64 = vector_shuffle<0,5,6,3> t3, t4
+  //
+  if (Subtarget.hasAVX() && VT == MVT::v4f64 && Op->getNumOperands() == 4u) {
----------------
RKSimon wrote:

We can generalize this to handle any "blend of a pair of splats" pattern: with SSE41 or later, with 16-bit or larger elements, as long as the BUILD_VECTOR uses just 2 distinct scalars.
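
For illustration only, here is a minimal, LLVM-free sketch of what such a generalized check might look like. It is not code from the PR: `SplatBlend`, `matchBlendOfTwoSplats`, the integer scalar ids, and the `HasSSE41` / `EltBits` parameters are all hypothetical stand-ins for the real `SDValue` operands and `Subtarget` queries.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type: the two splatted scalars plus a per-lane
// blend mask (bit I set => lane I is taken from the Second splat).
struct SplatBlend {
  int First;
  int Second;
  std::uint64_t Mask;
};

// Ops holds one id per BUILD_VECTOR operand; equal ids mean "same scalar".
// Returns a match only if exactly two distinct scalars are used.
std::optional<SplatBlend> matchBlendOfTwoSplats(const std::vector<int> &Ops,
                                                unsigned EltBits,
                                                bool HasSSE41) {
  // Assumed preconditions, taken from the review comment above.
  if (!HasSSE41 || EltBits < 16 || Ops.size() < 2 || Ops.size() > 64)
    return std::nullopt;

  SplatBlend R{Ops[0], Ops[0], 0};
  bool SeenSecond = false;
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    if (Ops[I] == R.First)
      continue; // lane comes from the first splat
    if (!SeenSecond) {
      R.Second = Ops[I];
      SeenSecond = true;
    }
    if (Ops[I] != R.Second)
      return std::nullopt; // a third distinct scalar: not a two-splat blend
    R.Mask |= std::uint64_t{1} << I;
  }
  if (!SeenSecond)
    return std::nullopt; // a plain splat, nothing to blend
  return R;
}
```

With the `X,Y,Y,X` example from the patch comment (ids `{1, 2, 2, 1}`), this sketch selects lanes 1 and 2 from the second splat, which corresponds to the `<0,5,6,3>` shuffle mask in the patch.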

https://github.com/llvm/llvm-project/pull/135753