[llvm] [AMDGPU][SDAG] Legalise v2i32 or/xor/and instructions to make use of 64-bit wide instructions (PR #140694)

Wed Jun 18 13:54:38 PDT 2025

================
@@ -4056,6 +4056,58 @@ SDValue AMDGPUTargetLowering::performShlCombine(SDNode *N,
   SDLoc SL(N);
   SelectionDAG &DAG = DCI.DAG;
 
+  // When the shl64_reduce optimisation code is passed through vector
+  // legalization //some scalarising occurs. After ISD::AND was legalised, this
+  // resulted in the AND instructions no longer being elided, as mentioned
+  // below. The following code should make sure this takes place.
+  // ConstantSDNode *CVANDRHS = dyn_cast<ConstantSDNode>(RHS->getOperand(1));
+  if (RHS->getOpcode() == ISD::EXTRACT_VECTOR_ELT) {
+    SDValue VAND = RHS.getOperand(0);
+    ConstantSDNode *CRRHS = dyn_cast<ConstantSDNode>(RHS->getOperand(1));
+    uint64_t AndIndex = RHS->getConstantOperandVal(1);
+    if (VAND->getOpcode() == ISD::AND && CRRHS) {
+      SDValue LHSAND = VAND.getOperand(0);
+      SDValue RHSAND = VAND.getOperand(1);
+      if (RHSAND->getOpcode() == ISD::BUILD_VECTOR) {
+        // Part of shlcombine is to optimise for the case where its possible
+        // to reduce shl64 to shl32 if shift range is [63-32]. This
+        // transforms: DST = shl i64 X, Y to [0, shl i32 X, (Y & 32) ]. The
----------------
LU-JOHN wrote:

Should be [0, shl i32 X, (Y & 31) ]

https://github.com/llvm/llvm-project/pull/140694