[llvm] [X86] Shrink width of masked loads/stores (PR #105451)

Wed Aug 21 11:32:35 PDT 2024

================
@@ -51536,20 +51536,111 @@ combineMaskedLoadConstantMask(MaskedLoadSDNode *ML, SelectionDAG &DAG,
   return DCI.CombineTo(ML, Blend, NewML.getValue(1), true);
 }
 
+static bool tryShrinkMaskedOperation(SelectionDAG &DAG, const SDLoc &DL,
+                                     SDValue Mask, EVT OrigVT,
+                                     SDValue *ValInOut, EVT *NewVTOut,
+                                     SDValue *NewMaskOut) {
+  // Ensure we have a reasonable input type.
+  // Also ensure the input is wider than xmm; otherwise it's not profitable to
+  // try to shrink.
+  if (!OrigVT.isSimple() || !OrigVT.isVector() ||
+      OrigVT.getSizeInBits() <= 128 || !isPowerOf2_64(OrigVT.getSizeInBits()) ||
+      !isPowerOf2_64(OrigVT.getScalarSizeInBits()))
+    return false;
+
+  SmallVector<SDValue> OrigMask;
+  APInt DemandedElts = getDemandedEltsForMaskedOp(
+      Mask, OrigVT.getVectorNumElements(), &OrigMask);
+  if (DemandedElts.isAllOnes() || DemandedElts.isZero())
+    return false;
+
+  unsigned OrigNumElts = OrigVT.getVectorNumElements();
+  unsigned ReqElts =
+      DemandedElts.getBitWidth() - DemandedElts.countLeadingZeros();
----------------
goldsteinn wrote:

We could, but the codegen is only "free" for the lower one. I'll add it as a potential TODO.
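
For reference, a standalone sketch of the `ReqElts` computation from the quoted hunk, using a plain 64-bit integer in place of `APInt`. The helper names `requiredElts` and `shrunkWidthInBits` are hypothetical, and the rounding to a power-of-two width with a 128-bit floor is an assumption made for illustration, not something taken from the patch.

```cpp
#include <cstdint>

// Number of low elements to keep so that every demanded element survives:
// the index of the highest set bit plus one. For a mask held in a 64-bit
// integer this matches `getBitWidth() - countLeadingZeros()`.
constexpr unsigned requiredElts(std::uint64_t DemandedElts) {
  unsigned N = 0;
  while (DemandedElts != 0) {
    ++N;
    DemandedElts >>= 1;
  }
  return N;
}

// Hypothetical helper (assumption, not from the patch): the smallest
// power-of-two width in bits, at least 128, that still covers the required
// low elements. Returns OrigBits when no narrower width fits.
constexpr unsigned shrunkWidthInBits(std::uint64_t DemandedElts,
                                     unsigned EltBits, unsigned OrigBits) {
  unsigned Needed = requiredElts(DemandedElts) * EltBits;
  unsigned Width = 128;
  while (Width < Needed)
    Width *= 2;
  return Width < OrigBits ? Width : OrigBits;
}
```

Under these assumptions, a v8i32 mask that demands only elements 0 and 1 (`0x3`) fits in 128 bits, while a mask that demands only elements 6 and 7 (`0xC0`) has `ReqElts == 8` and stays at 256 bits, since only the low part of the vector is considered.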

https://github.com/llvm/llvm-project/pull/105451