[llvm] [X86] Remove X86ISD::VSHLDV/VSHRDV and use ISD::FSHL/FSHR opcodes directly (PR #157616)

Tue Sep 23 04:23:31 PDT 2025

================
@@ -12298,72 +12298,76 @@ defm : vpclmulqdq_aliases<"VPCLMULQDQZ256", VR256X, i256mem>;
 // VBMI2
 //===----------------------------------------------------------------------===//
 
-multiclass VBMI2_shift_var_rm<bits<8> Op, string OpStr, SDNode OpNode,
+multiclass VBMI2_shift_var_rm<bits<8> Op, string OpStr, SDNode OpNode, bit SwapLR,
                               X86FoldableSchedWrite sched, X86VectorVTInfo VTI> {
   let Constraints = "$src1 = $dst",
----------------
TianYe717 wrote:

Hi @RKSimon,

I've double-checked the TableGen multiclass and the generated backend code. The `$src1 = $dst` constraint is correct for these AVX512 3-src instructions: the hardware always expects the first source and the destination to be tied together, regardless of operand swapping at the IR level (e.g., when `SwapLR` is set).

When we swap operands logically (with SwapLR), we just make sure that the correct IR operand is mapped to `$src1` in the pattern, so the constraint always applies to the right pair. The constraint itself doesn't need to change.
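As a rough illustration of why a swap is needed at all, here is a scalar sketch of one dword lane. It assumes the funnel-shift semantics described in the LLVM LangRef (`fshl` keeps the high half of the shifted concatenation, `fshr` keeps the low half) and the Intel SDM pseudocode for VPSHLDVD/VPSHRDVD. The helper names (`fshl32`, `fshr32`, `vpshldvd_lane`, `vpshrdvd_lane`) are hypothetical and not part of the patch, and which variant actually sets `SwapLR` is my reading rather than something stated above.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of a 32-bit funnel shift left: concatenate hi:lo,
// shift left by (amt mod 32), keep the upper 32 bits.
constexpr uint32_t fshl32(uint32_t hi, uint32_t lo, uint32_t amt) {
  return static_cast<uint32_t>(
      (((static_cast<uint64_t>(hi) << 32) | lo) << (amt & 31)) >> 32);
}

// Scalar model of a 32-bit funnel shift right: concatenate hi:lo,
// shift right by (amt mod 32), keep the lower 32 bits.
constexpr uint32_t fshr32(uint32_t hi, uint32_t lo, uint32_t amt) {
  return static_cast<uint32_t>(
      ((static_cast<uint64_t>(hi) << 32) | lo) >> (amt & 31));
}

// One lane of VPSHLDVD (assumed semantics): the tied register (dst/src1)
// supplies the HIGH half, so it is the first funnel-shift operand.
constexpr uint32_t vpshldvd_lane(uint32_t dst, uint32_t src2, uint32_t src3) {
  return fshl32(dst, src2, src3);
}

// One lane of VPSHRDVD (assumed semantics): the tied register supplies the
// LOW half, so the first two funnel-shift operands are swapped.
constexpr uint32_t vpshrdvd_lane(uint32_t dst, uint32_t src2, uint32_t src3) {
  return fshr32(src2, dst, src3);
}

static_assert(vpshldvd_lane(0x12345678u, 0x9ABCDEF0u, 8) == 0x3456789Au,
              "left funnel shift keeps the high half");
static_assert(vpshrdvd_lane(0x12345678u, 0x9ABCDEF0u, 8) == 0xF0123456u,
              "right funnel shift keeps the low half");
```

In both lanes the register that is overwritten is `dst`. Only its position among the funnel-shift operands differs, which is why the tie itself does not have to change.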

I've also looked at the file generated by tablegen.

VPSHLDVDZ128r and VPSHRDVDZ128r each have 4 operands and share the same MCOperandInfo entry (index 3583):
```
{ 18786,	4,	1,	0,	2372,	0,	0,	3583,	X86ImpOpBase + 0,	0, 0xa0b8f8004829ULL },  // VPSHLDVDZ128r
{ 18942,	4,	1,	0,	2372,	0,	0,	3583,	X86ImpOpBase + 0,	0, 0xa0b9f8004829ULL },  // VPSHRDVDZ128r
```

The MCOperandInfo at index 3583 looks like this. The `MCOI_TIED_TO(0)` constraint on the second entry means that operand 1 is tied to operand 0 (i.e., the first input operand is the same register as the destination):
```
/* 3583 */ { X86::VR128XRegClassID, 0, MCOI::OPERAND_REGISTER, 0 }, 
              { X86::VR128XRegClassID, 0, MCOI::OPERAND_REGISTER, MCOI_TIED_TO(0) },
              { X86::VR128XRegClassID, 0, MCOI::OPERAND_REGISTER, 0 }, 
              { X86::VR128XRegClassID, 0, MCOI::OPERAND_REGISTER, 0 },
```

So the operand constraint in the .inc file is correct and matches the instruction definition:
```
VPSHLDVD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst : Concatenate xmm1 and xmm2, extract result shifted to the left by value in xmm3/m128 into xmm1.
VPSHRDVD xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst : Concatenate xmm1 and xmm2, extract result shifted to the right by value in xmm3/m128 into xmm1.
```
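For anyone who wants to check the tie mechanically, the following is a simplified standalone model of how the `MCOI_TIED_TO(0)` entry in table 3583 decodes. The bit layout mirrors my understanding of `MCOI_TIED_TO` and `MCInstrDesc::getOperandConstraint` in `MCInstrDesc.h` and may differ between LLVM versions. The names `OperandInfoModel`, `tiedTo`, `getOperandConstraintModel`, and `kEntry3583` are hypothetical stand-ins, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Assumed constraint kinds (modelled on MCOI::OperandConstraint).
enum OperandConstraintModel { TIED_TO = 0, EARLY_CLOBBER = 1 };

// Assumed encoding: one flag bit per constraint kind in the low nibble,
// and the tied operand index stored in a 4-bit field above it.
constexpr uint16_t tiedTo(unsigned op) {
  return static_cast<uint16_t>((1u << TIED_TO) | (op << (4 + TIED_TO * 4)));
}

// Only the constraint word matters for this check.
struct OperandInfoModel {
  uint16_t Constraints;
};

// Returns the operand index that opNum is tied to, or -1 if the constraint
// is not present on that operand.
constexpr int getOperandConstraintModel(const OperandInfoModel *ops,
                                        unsigned numOps, unsigned opNum,
                                        OperandConstraintModel c) {
  return (opNum < numOps && (ops[opNum].Constraints & (1u << c)))
             ? static_cast<int>((ops[opNum].Constraints >> (4 + c * 4)) & 0xF)
             : -1;
}

// Constraint column of entry 3583: dst, src1 (tied to 0), src2, src3.
constexpr OperandInfoModel kEntry3583[] = {{0}, {tiedTo(0)}, {0}, {0}};

static_assert(getOperandConstraintModel(kEntry3583, 4, 1, TIED_TO) == 0,
              "operand 1 is tied to operand 0");
static_assert(getOperandConstraintModel(kEntry3583, 4, 2, TIED_TO) == -1,
              "operand 2 carries no tie");
```

Against a real build, the equivalent query would go through `MCInstrDesc::getOperandConstraint(1, MCOI::TIED_TO)` on the instruction's descriptor.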

For the more complex VPSHLDVDZ128rkz and VPSHRDVDZ128rkz instructions, it's similar: the first input operand is still the destination, and the constraint is preserved.
```
{ 18788,	5,	1,	0,	2370,	0,	0,	1924,	X86ImpOpBase + 0,	0, 0xa6b8f8004829ULL },  // VPSHLDVDZ128rkz
{ 18944,	5,	1,	0,	2370,	0,	0,	1924,	X86ImpOpBase + 0,	0, 0xa6b9f8004829ULL },  // VPSHRDVDZ128rkz
```

MCOperandInfo at index 1924:
```
/* 1924 */ { X86::VR128XRegClassID, 0, MCOI::OPERAND_REGISTER, 0 }, 
             { X86::VR128XRegClassID, 0, MCOI::OPERAND_REGISTER, MCOI_TIED_TO(0) }, 
             { X86::VK4WMRegClassID,   0, MCOI::OPERAND_REGISTER, 0 }, 
             { X86::VR128XRegClassID, 0, MCOI::OPERAND_REGISTER, 0 }, 
             { X86::VR128XRegClassID, 0, MCOI::OPERAND_REGISTER, 0 },
```

So this also guarantees that the first input operand is the destination and the tied constraint is correctly maintained.

https://github.com/llvm/llvm-project/pull/157616