[llvm] [WIP][SelectionDAG] Add support for the 3-way comparison intrinsics [US]CMP (PR #91871)

Mon May 20 15:21:04 PDT 2024

================
@@ -6273,6 +6382,32 @@ SDValue DAGTypeLegalizer::WidenVecOp_EXTEND(SDNode *N) {
   }
 }
 
+SDValue DAGTypeLegalizer::WidenVecOp_CMP(SDNode *N) {
+  SDLoc dl(N);
+
+  EVT OpVT = N->getOperand(0).getValueType();
+  EVT ResVT = N->getValueType(0);
+  SDValue LHS = GetWidenedVector(N->getOperand(0));
+  SDValue RHS = GetWidenedVector(N->getOperand(1));
+
+  // 1. EXTRACT_SUBVECTOR
+  // 2. SIGN_EXTEND/ZERO_EXTEND
+  // 3. CMP
+  LHS = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, OpVT, LHS,
+                    DAG.getVectorIdxConstant(0, dl));
+  RHS = DAG.getNode(ISD::EXTRACT_SUBVECTOR, dl, OpVT, RHS,
+                    DAG.getVectorIdxConstant(0, dl));
+
+  // At this point the result type is guaranteed to be valid, so we can use it
+  // as the operand type by extending it appropriately
+  ISD::NodeType ExtendOpcode =
+      N->getOpcode() == ISD::SCMP ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;
+  LHS = DAG.getNode(ExtendOpcode, dl, ResVT, LHS);
+  RHS = DAG.getNode(ExtendOpcode, dl, ResVT, RHS);
----------------
Poseydon42 wrote:

I may be very mistaken here, but as far as I understand, the other option would have been to use the widened operands directly (e.g. `v4i8` widened to `v16i8`). That would in turn force the result type to be widened as well (in the same scenario, `v4i32` might have to be widened to `v16i32`). The result type might then become too wide and need to be split, which would split the operands too, creating a legalization cycle and/or an assertion failure down the line. When I tried removing the sign extension just now, I triggered an assertion because the mask and the operands of a `VSELECT` had different numbers of elements, but I believe I had also run into the legalization cycle described above before.

If we just extract the subvector from a widened operand without extending it, we wouldn't actually widen the operand at all. The sequence of type changes would look like `v4i8` (original type) -> `v16i8` (widened type) -> `v4i8` (type extracted from the widened vector), so the instruction would be unchanged and we would just run into another cycle.
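To make the round trip concrete, here is a toy model in plain C++ (not LLVM code). `VecTy`, `widen`, `extractSubvector`, and `extendElts` are hypothetical names invented for this sketch, and the 128-bit register width is an assumption chosen so that `v4i8` widens to `v16i8` as in the example above.

```cpp
// Toy description of a vector type: element count and element width in bits.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
  bool operator==(const VecTy &O) const {
    return NumElts == O.NumElts && EltBits == O.EltBits;
  }
};

// Assumption: the target widens vectors to fill a 128-bit register.
VecTy widen(VecTy T) { return {128 / T.EltBits, T.EltBits}; }

// Taking the low NumElts lanes keeps the element width unchanged.
VecTy extractSubvector(VecTy Wide, unsigned NumElts) {
  return {NumElts, Wide.EltBits};
}

// Extending every lane changes the element width, not the lane count.
VecTy extendElts(VecTy T, unsigned NewBits) { return {T.NumElts, NewBits}; }
```

In this model, `extractSubvector(widen({4, 8}), 4)` is `{4, 8}` again, so the operand type has not changed, whereas `extendElts` on that value with 32 bits yields `{4, 32}`, a different type.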

What I decided to do instead: after getting the widened operand and extracting the necessary number of elements from it, we sign- or zero-extend the extracted vector (for `SCMP` and `UCMP` respectively) to the result type of the instruction. We know that this type is legal at this stage and that the result and the operands must have the same number of elements, so the extension is technically valid, and it shifts the job of legalizing a vector operand with a small element size onto `SIGN_EXTEND`/`ZERO_EXTEND`. I am not sure whether this is a good idea and would be happy to hear your opinion on it.
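As a standalone sanity check of why the extension kind has to follow the opcode, the sketch below (plain C++, not LLVM code) models one lane of a three-way comparison. It checks that sign-extending `i8` to `i32` preserves the signed result and that zero-extending preserves the unsigned one. The helper names are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Three-way comparison of a single lane: -1 if A < B, 0 if equal, 1 if A > B.
template <typename T>
constexpr int threeWayCmp(T A, T B) {
  return (A > B) - (A < B);
}

// Exhaustively checks all pairs of 8-bit values: the comparison result must
// be the same before and after extending both operands to 32 bits.
bool extensionPreservesCmp() {
  for (int A = 0; A < 256; ++A) {
    for (int B = 0; B < 256; ++B) {
      uint8_t UA = static_cast<uint8_t>(A), UB = static_cast<uint8_t>(B);
      // Reinterpret the same bits as signed (two's complement assumed).
      int8_t SA = static_cast<int8_t>(UA), SB = static_cast<int8_t>(UB);
      int32_t SExtA = SA, SExtB = SB;   // models SIGN_EXTEND
      uint32_t ZExtA = UA, ZExtB = UB;  // models ZERO_EXTEND
      if (threeWayCmp<int8_t>(SA, SB) != threeWayCmp<int32_t>(SExtA, SExtB))
        return false;
      if (threeWayCmp<uint8_t>(UA, UB) != threeWayCmp<uint32_t>(ZExtA, ZExtB))
        return false;
    }
  }
  return true;
}

// Zero-extending operands of a signed comparison changes the answer:
// -1 < 1 as i8, but the zero-extended values are 255 and 1.
bool mismatchedExtensionBreaksScmp() {
  int8_t A = -1, B = 1;
  int32_t ZA = static_cast<uint8_t>(A);
  int32_t ZB = static_cast<uint8_t>(B);
  return threeWayCmp<int8_t>(A, B) != threeWayCmp<int32_t>(ZA, ZB);
}
```

Both functions return `true`, which matches the choice in the patch of `SIGN_EXTEND` for `SCMP` and `ZERO_EXTEND` otherwise.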

https://github.com/llvm/llvm-project/pull/91871