<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/62365">62365</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Improvements to buildvector codegen
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          preames
      </td>
    </tr>
</table>

<pre>
    Looking at the examples below, we've got a couple of possibilities for ways to improve generic buildvector codegen. Please take the following as a list of ideas; not all of these may work out. Note that I'm also talking about the generic case with no repeated elements, etc.

For vectors with power-of-two lengths less than or equal to 64 bits, we can do shift/or on the scalar side plus a single scalar-to-vector move.  This may require a VTYPE toggle, but that's likely cheaper than a series of inserts.
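As a rough sketch (not tested; register assignments are whatever the calling convention happens to give us), the <2 x i32> case below could become something like:

```
# Hypothetical buildvec_2xi32: pack the two i32 values into one
# 64-bit GPR with shift/or, then do a single scalar-to-vector move.
slli    a1, a1, 32                    # move %b into bits 63:32
slli    a0, a0, 32                    # zero high bits of %a
srli    a0, a0, 32
or      a0, a0, a1                    # a0 = %b:%a as one 64-bit value
vsetivli zero, 1, e64, m1, ta, ma     # the VTYPE toggle to e64
vmv.s.x v8, a0                        # one I-to-V move instead of two
ret
```

This trades a second vsetvli + vector insert for a few cheap scalar ALU ops.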

For vectors with power-of-two lengths greater than 64 bits, we can group into 64-bit chunks.  This reduces the number of vector instructions and I-to-V moves, at the cost of extra scalar work.

We should be able to use either vslide1up or vslide1down.  If we can exploit the undefined tail property, we should be able to do this without individual VL toggles between inserts.  Note that this requires undefined tail, *not* simply tail agnostic.  Combined with the above, we should have one vsetvli + VLEN/64 inserts.  
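Putting the pieces together for the <4 x i32> case below, one possible sequence (again a sketch, not generated by any current patch) packs each 64-bit chunk on the scalar side and then inserts the chunks with vslide1down under a single vsetvli, relying on the undefined contents of the initial destination rather than any VL toggling:

```
# Hypothetical buildvec_4xi32: two 64-bit chunks, one vsetvli,
# and VLEN/64-style vslide1down inserts.
slli    a1, a1, 32
slli    a0, a0, 32
srli    a0, a0, 32
or      a0, a0, a1                    # a0 = %b:%a
slli    a3, a3, 32
slli    a2, a2, 32
srli    a2, a2, 32
or      a2, a2, a3                    # a2 = %d:%c
vsetivli zero, 2, e64, m1, ta, ma
vslide1down.vx  v8, v8, a0            # v8 starts undef; no VL toggle needed
vslide1down.vx  v8, v8, a2            # v8 = { %b:%a, %d:%c } = <4 x i32>
ret
```

Note this only works because the source operand of the first vslide1down may be undefined; with a merely tail-agnostic guarantee we'd have to materialize the tail first.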

Note that the case where VLEN=128 is particularly important - as it is the minimum guaranteed by V, and thus what SLP is able to target by default.  

```
$ cat buildvector.ll 

define <2 x i32> @buildvec_2xi32(i32 %a, i32 %b) {
  %v1 = insertelement <2 x i32> poison, i32 %a, i32 0
  %v2 = insertelement <2 x i32> %v1, i32 %b, i32 1
  ret <2 x i32> %v2
}

define <4 x i32> @buildvec_4xi32(i32 %a, i32 %b, i32 %c, i32 %d) {
  %v1 = insertelement <4 x i32> poison, i32 %a, i32 0
  %v2 = insertelement <4 x i32> %v1, i32 %b, i32 1
  %v3 = insertelement <4 x i32> %v2, i32 %c, i32 2
  %v4 = insertelement <4 x i32> %v3, i32 %d, i32 3
  ret <4 x i32> %v4
}

```

```
$ ./opt -S buildvector.ll -O3 | ./llc -mtriple=riscv64 -mattr=+v
        .text
        .attribute      4, 16
        .attribute      5, "rv64i2p1_f2p2_d2p2_v1p0_zicsr2p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0"
        .file   "buildvector.ll"
        .globl  buildvec_2xi32 # -- Begin function buildvec_2xi32
        .p2align        2
        .type   buildvec_2xi32,@function
        .variant_cc     buildvec_2xi32
buildvec_2xi32: # @buildvec_2xi32
# %bb.0:
        vsetivli        zero, 2, e32, mf2, ta, ma
        vmv.v.x v8, a1
        vsetvli zero, zero, e32, mf2, tu, ma
        vmv.s.x v8, a0
        ret
.Lfunc_end0:
        .size   buildvec_2xi32, .Lfunc_end0-buildvec_2xi32
                                        # -- End function
        .globl  buildvec_4xi32                  # -- Begin function buildvec_4xi32
        .p2align        2
        .type   buildvec_4xi32,@function
        .variant_cc     buildvec_4xi32
buildvec_4xi32: # @buildvec_4xi32
# %bb.0:
        addi    sp, sp, -16
        sw      a3, 12(sp)
        sw      a2, 8(sp)
        sw      a1, 4(sp)
        sw      a0, 0(sp)
        mv      a0, sp
        vsetivli        zero, 4, e32, m1, ta, ma
        vle32.v v8, (a0)
        addi    sp, sp, 16
        ret
.Lfunc_end1:
        .size   buildvec_4xi32, .Lfunc_end1-buildvec_4xi32
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits
```
</pre>