<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/67080">67080</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [X86] `i8` -> `half` vector conversion is scalarized
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            bug,
            backend:X86,
            performance,
            llvm:SelectionDAG
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          dcaballe
      </td>
    </tr>
</table>

<pre>
    The following vector conversion seems to be scalarized when compiled for AVX-512: 

```
define <64 x half> @test(<64 x i8> %int8) {
    %fp16 = uitofp <64 x i8> %int8 to <64 x half>
    ret <64 x half> %fp16
}
```
`llc test.ll -mcpu=cascadelake`

```
        vpmovzxbw       %ymm0, %zmm1
        vextracti32x4   $3, %zmm1, %xmm4
        vpextrw $7, %xmm4, %eax
        vcvtsi2ss       %eax, %xmm2, %xmm2
        vcvtps2ph       $4, %xmm2, %xmm2
        vmovd   %xmm2, %eax
        vpinsrw $0, %eax, %xmm0, %xmm5
        vpextrw $6, %xmm4, %eax
        vcvtsi2ss       %eax, %xmm3, %xmm2
        vcvtps2ph       $4, %xmm2, %xmm2
        vmovd   %xmm2, %eax
        vpinsrw $0, %eax, %xmm0, %xmm6
        vextracti32x4   $2, %zmm1, %xmm3
        vpextrw $7, %xmm3, %eax
        vcvtsi2ss       %eax, %xmm7, %xmm2
        vcvtps2ph       $4, %xmm2, %xmm2
        vmovd   %xmm2, %eax
        vpinsrw $0, %eax, %xmm0, %xmm7
 ....
```

Interestingly enough, the `float` -> `half` vector conversion is vectorized as expected:
```
define <64 x half> @test(<64 x float> %int8) {
    %fp16 = fptrunc <64 x float> %int8 to <64 x half>
    ret <64 x half> %fp16
}
```
```
        vcvtps2ph       $4, %zmm0, %ymm0
        vcvtps2ph       $4, %zmm1, %ymm1
        vinsertf64x4    $1, %ymm1, %zmm0, %zmm0
        vcvtps2ph       $4, %zmm2, %ymm1
        vcvtps2ph       $4, %zmm3, %ymm2
        vinsertf64x4    $1, %ymm2, %zmm1, %zmm1
        retq
```

So the implementation gap is probably in the intermediate `i8` -> `float` step of the conversion.
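
One way to probe this hypothesis is to write the two steps out by hand and compare the generated code. The following is only a sketch: the function name `@test_split` and the value names are made up for illustration, and the output of `llc` for it has not been checked here. Every `i8` value in 0..255 is exactly representable in both `float` and `half`, so the result should match the single `uitofp` above.

```llvm
; Hypothetical variant of the reproducer: split i8 -> half into
; i8 -> float (uitofp) followed by float -> half (fptrunc).
define <64 x half> @test_split(<64 x i8> %int8) {
    ; Step 1: unsigned integer to single precision.
    %fp32 = uitofp <64 x i8> %int8 to <64 x float>
    ; Step 2: the narrowing that is already vectorized (vcvtps2ph).
    %fp16 = fptrunc <64 x float> %fp32 to <64 x half>
    ret <64 x half> %fp16
}
```

If this variant is vectorized with `llc test.ll -mcpu=cascadelake` while the direct `uitofp` to `half` is not, that would suggest the missing piece is in how the direct conversion is lowered rather than in either individual step.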

</pre>