<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/123727>123727</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [X86][AVX512] Generate `valignq` instead of `vpermi2pd`
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            question,
            backend:X86
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
            abhishek-kaushik22
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          abhishek-kaushik22
      </td>
    </tr>
</table>

<pre>
    Consider this IR:

```llvm
define <8 x double> @bar(i64 %x, ptr %arr, <8 x double> %val1, <8 x double> %val2) {
entry:
    %gep1 = getelementptr double, ptr %arr, i64 %x
    %gepload1 = load <8 x double>, ptr %gep1, align 16
    %z = add i64 %x, 8
    %gep3 = getelementptr double, ptr %arr, i64 %z
    %gepload3 = load <8 x double>, ptr %gep3, align 16
    %shuffle1 = shufflevector <8 x double> %gepload1, <8 x double> %gepload3, <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
    store <8 x double> %shuffle1, ptr %gep1, align 16
    ret <8 x double> %gepload3
}
```

Currently, we generate:
```asm
.LCPI0_1:
        .byte   1                               # 0x1
        .byte   2                               # 0x2
        .byte   3                               # 0x3
        .byte   4                               # 0x4
        .byte   5                               # 0x5
        .byte   6                               # 0x6
        .byte   7                               # 0x7
        .byte   8                               # 0x8
bar:                                    # @bar
        vmovupd zmm1, zmmword ptr [rsi + 8*rdi]
        vmovupd zmm0, zmmword ptr [rsi + 8*rdi + 64]
        vpmovsxbq       zmm2, qword ptr [rip + .LCPI0_1] # zmm2 = [1,2,3,4,5,6,7,8]
        vpermi2pd       zmm2, zmm1, zmm0
        vmovupd zmmword ptr [rsi + 8*rdi], zmm2
        ret
```

But couldn't we generate a `valignq` instead of the full two-source permute?
```asm
bar:                                    # @bar
        vmovupd zmm1, zmmword ptr [rsi + 8*rdi]
        vmovupd zmm0, zmmword ptr [rsi + 8*rdi + 64]
        valignq zmm2, zmm0, zmm1, 1     # zmm2 = zmm1[1,2,3,4,5,6,7],zmm0[0]
        vmovupd zmmword ptr [rsi + 8*rdi], zmm2
        ret
```
This avoids storing the index constant `.LCPI0_1` in the binary, as well as the `vpmovsxbq` load that materializes it.
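
As a sanity check on the equivalence, here is a small scalar model in Python. The helper names (`vpermi2pd_model`, `valignq_model`, `rotation_amount`) are made up for illustration and are not LLVM or intrinsic APIs. The operand conventions follow my reading of the Intel SDM (for `valignq`, the first source forms the high half of the concatenation), so treat this as a sketch under that assumption rather than a reference.

```python
# Scalar models of the two instructions for 8 x 64-bit lanes.
# All function names are hypothetical helpers, not real APIs.

LANES = 8

def vpermi2pd_model(idx, src1, src2):
    """Model of `vpermi2pd idx, src1, src2` (Intel operand order).

    Each index selects from the 16-entry table src1 ++ src2:
    bits [2:0] pick the lane, bit 3 picks src2 over src1.
    """
    table = list(src1) + list(src2)
    return [table[i & 0xF] for i in idx]

def valignq_model(src1, src2, imm):
    """Model of `valignq dst, src1, src2, imm` (Intel operand order).

    Assumption: src1 is the high half and src2 the low half of a
    16-lane intermediate, which is shifted right by imm[2:0] lanes;
    the low 8 lanes are the result.
    """
    concat = list(src2) + list(src1)
    shift = imm & (LANES - 1)
    return concat[shift:shift + LANES]

def rotation_amount(mask):
    """Return s if mask is [s, s+1, ..., s+7] with 1 <= s <= 7, else None."""
    s = mask[0]
    if s in range(1, LANES) and list(mask) == list(range(s, s + LANES)):
        return s
    return None

lo = [float(i) for i in range(8)]       # stands in for %gepload1 (zmm1)
hi = [float(i) for i in range(8, 16)]   # stands in for %gepload3 (zmm0)
mask = [1, 2, 3, 4, 5, 6, 7, 8]         # the shufflevector mask from the IR

permuted = vpermi2pd_model(mask, lo, hi)
shift = rotation_amount(mask)
aligned = valignq_model(hi, lo, shift)  # high half = zmm0, low half = zmm1
print(permuted == aligned)
```

The model only tracks which lane ends up where; it ignores write-masking and the actual floating-point payloads. Swapping the two sources of `valignq_model` gives a different result, so the operand order matters for the proposed lowering.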

</pre>