<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/78897>78897</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[x86] Bad optimization of a multiply by a scalar register that came from a vector
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Validark
</td>
</tr>
</table>
<pre>
[Godbolt link](https://zig.godbolt.org/z/r8TxGx3vY)
I have the following code:
```zig
const std = @import("std");
export fn produceShuffleVectorForByte(x: u8) @Vector(16, u8) {
    const unique_bytes: @Vector(8, u8) = @bitCast(@as(u64, 0x8040_2010_0804_0201));
    const splatted = @as(@Vector(8, u8), @splat(x));
    const selector = (splatted & unique_bytes) != unique_bytes;
    const vec: u64 = @bitCast(@select(u8, selector, @as(@Vector(8, u8), @splat(0b00010001)), @as(@Vector(8, u8), @splat(0))));
    const prefix_sums1: @Vector(8, u8) = @bitCast(((vec) *% 0x1111111111111111) << 4);
    const prefix_sums2: @Vector(8, u8) = @bitCast(((vec ^ 0x1111111111111111) *% 0x1111111111111111) << 4);
    const interleaved_shuffle_vector = @select(u8, selector, prefix_sums1, prefix_sums2);
    return @shuffle(
        u8,
        interleaved_shuffle_vector << @splat(4) >> @splat(4),
        interleaved_shuffle_vector >> @splat(4),
        std.simd.interlace(.{ std.simd.iota(i32, 8), ~std.simd.iota(i32, 8) }),
    );
}
```
LLVM currently tries to optimize `prefix_sums1` by performing the 64-bit multiply directly on the vector register that `vec` was bitcast from, instead of on the scalar `vec`.
```diff
.LCPI0_0:
.byte 1
.byte 2
.byte 4
.byte 8
.byte 16
.byte 32
.byte 64
.byte 128
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.LCPI0_1:
.byte 17
.byte 17
.byte 17
.byte 17
.byte 17
.byte 17
.byte 17
.byte 17
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.quad 1229782938247303440
.quad 1229782938247303440
.LCPI0_3:
.quad 286331153
.quad 286331153
.LCPI0_4:
.byte 15
.byte 15
.byte 15
.byte 15
.byte 15
.byte 15
.byte 15
.byte 15
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.LCPI0_5:
.zero 16,15
.LCPI0_6:
.quad 1229782938247303440
produceShuffleVectorForByte:
vmovd xmm0, edi
vpxor xmm1, xmm1, xmm1
- vpbroadcastq xmm3, qword ptr [rip + .LCPI0_6]
movabs rcx, 76861433640456465
vpbroadcastb xmm0, xmm0
vpand xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
vpcmpeqb xmm0, xmm0, xmm1
vpand xmm1, xmm0, xmmword ptr [rip + .LCPI0_1]
- vpmuludq xmm4, xmm1, xmmword ptr [rip + .LCPI0_3]
vmovq rax, xmm1
- vpsrlq xmm2, xmm1, 32
- vpmuludq xmm1, xmm1, xmm3
- vpmuludq xmm2, xmm2, xmm3
xor rcx, rax
movabs rax, 1229782938247303440
imul rax, rcx
- vpaddq xmm2, xmm4, xmm2
- vpsllq xmm2, xmm2, 32
vmovq xmm3, rax
- vpaddq xmm1, xmm1, xmm2
vpblendvb xmm0, xmm3, xmm1, xmm0
vpand xmm1, xmm0, xmmword ptr [rip + .LCPI0_4]
vpsrlw xmm0, xmm0, 4
vpand xmm0, xmm0, xmmword ptr [rip + .LCPI0_5]
vpunpcklbw xmm0, xmm1, xmm0
ret
```
The bad optimization is highlighted in red (the lines prefixed with `-`).
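As a rough sketch, the highlighted instructions appear to compute the 64-bit multiply by `0x1111111111111110` (the multiplier with the `<< 4` folded in) as three 32x32 -> 64 partial products. `mulScalar` and `mulSplit` are hypothetical helper names used only for illustration, and the instruction mapping in the comments is an interpretation of the listing above, not something LLVM documents.

```zig
const std = @import("std");

// Hypothetical helper: the scalar form of the `prefix_sums1` arithmetic.
fn mulScalar(vec: u64) u64 {
    return (vec *% 0x1111111111111111) << 4;
}

// Hypothetical helper: the same product built from 32-bit halves,
// mirroring (as far as I can tell) the vpmuludq/vpsrlq/vpaddq/vpsllq sequence.
fn mulSplit(vec: u64) u64 {
    const k: u64 = 0x1111_1111_1111_1110; // 0x1111111111111111 << 4
    const mask: u64 = 0xFFFF_FFFF;
    const v_lo = vec & mask;
    const v_hi = vec >> 32; // vpsrlq xmm2, xmm1, 32
    const k_lo = k & mask;
    const k_hi = k >> 32; // 0x11111111, i.e. .LCPI0_3

    // Each factor is below 2^32, so these products cannot overflow a u64.
    const lo_lo = v_lo * k_lo; // vpmuludq xmm1, xmm1, xmm3
    const lo_hi = v_lo * k_hi; // vpmuludq xmm4, xmm1, [.LCPI0_3]
    const hi_lo = v_hi * k_lo; // vpmuludq xmm2, xmm2, xmm3

    const cross = lo_hi +% hi_lo; // vpaddq xmm2, xmm4, xmm2
    return lo_lo +% (cross << 32); // vpsllq + vpaddq
}

test "split multiply matches the scalar multiply" {
    const inputs = [_]u64{ 0, 0x11, 0x0011_1100_1100_0011, 0xFFFF_FFFF_FFFF_FFFF };
    for (inputs) |v| {
        try std.testing.expectEqual(mulScalar(v), mulSplit(v));
    }
}
```

Both helpers return the same value for every input, but the scalar form needs only a single `imul`, and `vec` has to be moved into a general-purpose register anyway for the `prefix_sums2` calculation.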
I tried sticking an xor with 1 into the `prefix_sums1` calculation.
```diff
- const prefix_sums1: @Vector(8, u8) = @bitCast(((vec) *% 0x1111111111111111) << 4);
+ const prefix_sums1: @Vector(8, u8) = @bitCast(((vec ^ 1) *% 0x1111111111111111) << 4);
```
With that change, I got a much better emit. Here is that better assembly with the xor with 1 removed:
```asm
.LCPI0_0:
.byte 1
.byte 2
.byte 4
.byte 8
.byte 16
.byte 32
.byte 64
.byte 128
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.LCPI0_1:
.byte 17
.byte 17
.byte 17
.byte 17
.byte 17
.byte 17
.byte 17
.byte 17
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.LCPI0_2:
.byte 15
.byte 15
.byte 15
.byte 15
.byte 15
.byte 15
.byte 15
.byte 15
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.zero 1
.LCPI0_3:
.zero 16,15
produceShuffleVectorForByte:
vmovd xmm0, edi
vpxor xmm1, xmm1, xmm1
movabs rcx, 76861433640456465
movabs rdx, 1229782938247303440
vpbroadcastb xmm0, xmm0
vpand xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
vpcmpeqb xmm0, xmm0, xmm1
vpand xmm1, xmm0, xmmword ptr [rip + .LCPI0_1]
vmovq rax, xmm1
xor rcx, rax
imul rax, rdx
imul rcx, rdx
vmovq xmm1, rax
vmovq xmm2, rcx
vpblendvb xmm0, xmm2, xmm1, xmm0
vpand xmm1, xmm0, xmmword ptr [rip + .LCPI0_2]
vpsrlw xmm0, xmm0, 4
vpand xmm0, xmm0, xmmword ptr [rip + .LCPI0_3]
vpunpcklbw xmm0, xmm1, xmm0
ret
```
</pre>