<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/103564">103564</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Interleaving two vectors of 32 x u8 prefers `vpermi2b` over `vpermb`??
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Validark
</td>
</tr>
</table>
<pre>
This code:
```zig
const std = @import("std");
const V = @Vector(32, u8);
export fn interlace(a: V, b: V) @Vector(@sizeOf(V) * 2, u8) {
return std.simd.interlace(.{ a, b });
}
```
Equivalent LLVM IR:
```llvm
define dso_local <64 x i8> @interlace(<32 x i8> %0, <32 x i8> %1) local_unnamed_addr {
Entry:
%2 = shufflevector <32 x i8> %0, <32 x i8> %1, <64 x i32> <i32 0, i32 32, i32 1, i32 33, i32 2, i32 34, i32 3, i32 35, i32 4, i32 36, i32 5, i32 37, i32 6, i32 38, i32 7, i32 39, i32 8, i32 40, i32 9, i32 41, i32 10, i32 42, i32 11, i32 43, i32 12, i32 44, i32 13, i32 45, i32 14, i32 46, i32 15, i32 47, i32 16, i32 48, i32 17, i32 49, i32 18, i32 50, i32 19, i32 51, i32 20, i32 52, i32 21, i32 53, i32 22, i32 54, i32 23, i32 55, i32 24, i32 56, i32 25, i32 57, i32 26, i32 58, i32 27, i32 59, i32 28, i32 60, i32 29, i32 61, i32 30, i32 62, i32 31, i32 63>
ret <64 x i8> %2
}
```
For some reason, this results in the following code for Zen 4:
```asm
.LCPI0_0:
.byte 0
.byte 64
.byte 1
.byte 65
.byte 2
.byte 66
.byte 3
.byte 67
.byte 4
.byte 68
.byte 5
.byte 69
.byte 6
.byte 70
.byte 7
.byte 71
.byte 8
.byte 72
.byte 9
.byte 73
.byte 10
.byte 74
.byte 11
.byte 75
.byte 12
.byte 76
.byte 13
.byte 77
.byte 14
.byte 78
.byte 15
.byte 79
.byte 48
.byte 112
.byte 49
.byte 113
.byte 50
.byte 114
.byte 51
.byte 115
.byte 52
.byte 116
.byte 53
.byte 117
.byte 54
.byte 118
.byte 55
.byte 119
.byte 56
.byte 120
.byte 57
.byte 121
.byte 58
.byte 122
.byte 59
.byte 123
.byte 60
.byte 124
.byte 61
.byte 125
.byte 62
.byte 126
.byte 63
.byte 127
interlace:
vinserti64x4 zmm2, zmm0, ymm0, 1
vmovdqa64 zmm0, zmmword ptr [rip + .LCPI0_0]
vinserti64x4 zmm1, zmm1, ymm1, 1
vpermi2b zmm0, zmm2, zmm1
ret
```
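Why does this duplicate each input into both halves of a zmm register (`vinserti64x4`) and then use a two-source `vpermi2b`?
The interleave can be broken into two steps, a join followed by a single-source shuffle: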
```zig
export fn join(a: V, b: V) @Vector(@sizeOf(V) * 2, u8) {
return std.simd.join(a, b);
}
export fn interlaceJoined(v: @Vector(@sizeOf(V) * 2, u8)) @Vector(@sizeOf(V) * 2, u8) {
const iota1 = std.simd.iota(u8, 32);
const iota2 = iota1 + @as(@Vector(32, u8), @splat(32));
return @shuffle(u8, v, undefined, std.simd.interlace(.{ iota1, iota2 }));
}
```
For these two functions I get:
```asm
join:
vinsertf64x4 zmm0, zmm0, ymm1, 1
ret
.LCPI2_0:
.byte 0
.byte 32
.byte 1
.byte 33
.byte 2
.byte 34
.byte 3
.byte 35
.byte 4
.byte 36
.byte 5
.byte 37
.byte 6
.byte 38
.byte 7
.byte 39
.byte 8
.byte 40
.byte 9
.byte 41
.byte 10
.byte 42
.byte 11
.byte 43
.byte 12
.byte 44
.byte 13
.byte 45
.byte 14
.byte 46
.byte 15
.byte 47
.byte 16
.byte 48
.byte 17
.byte 49
.byte 18
.byte 50
.byte 19
.byte 51
.byte 20
.byte 52
.byte 21
.byte 53
.byte 22
.byte 54
.byte 23
.byte 55
.byte 24
.byte 56
.byte 25
.byte 57
.byte 26
.byte 58
.byte 27
.byte 59
.byte 28
.byte 60
.byte 29
.byte 61
.byte 30
.byte 62
.byte 31
.byte 63
interlaceJoined:
vmovdqa64 zmm1, zmmword ptr [rip + .LCPI2_0]
vpermb zmm0, zmm1, zmm0
ret
```
Yet, when I put them together:
```zig
export fn joinThenInterlace(a: V, b: V) @Vector(@sizeOf(V) * 2, u8) {
return interlaceJoined(join(a, b));
}
```
LLVM decides to "optimize" it to the version with `vinserti64x4+vmovdqa64+vinserti64x4+vpermi2b`. What gives? [Godbolt link](https://zig.godbolt.org/z/338nd7WcT)
Shouldn't it just be `vinsertf64x4+vmovdqa64+vpermb`?
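As a sanity check that the two sequences compute the same result, here is a rough scalar model in Python. This is not LLVM or Zig code: the helper names (`vpermb`, `vpermi2b`, `interlace_emitted`, `interlace_expected`) are made up for illustration, and the instruction semantics are my reading of the byte-permute definitions (index bits 5:0 pick the byte, and for `vpermi2b` bit 6 picks the table), so treat it as a sketch under those assumptions.

```python
# Scalar model: a 512-bit register is a list of 64 bytes, a 256-bit one is 32.

def vpermb(idx, src):
    # Single-source byte permute: only the low 6 bits of each index are used.
    return [src[i % 64] for i in idx]

def vpermi2b(idx, t0, t1):
    # Two-source byte permute: bit 6 of the index selects the table,
    # the low 6 bits select the byte within that table.
    out = []
    for i in idx:
        sel = i % 128
        table = t1 if sel >= 64 else t0
        out.append(table[sel % 64])
    return out

# Index table from .LCPI0_0: 0,64,...,15,79 then 48,112,...,63,127.
IDX_2SRC = [v for k in range(16) for v in (k, 64 + k)] + \
           [v for k in range(16) for v in (48 + k, 112 + k)]

# Index table from .LCPI2_0: 0,32,1,33,...,31,63.
IDX_1SRC = [v for k in range(32) for v in (k, 32 + k)]

def interlace_emitted(a, b):
    zmm2 = a + a  # vinserti64x4 zmm2, zmm0, ymm0, 1
    zmm1 = b + b  # vinserti64x4 zmm1, zmm1, ymm1, 1
    return vpermi2b(IDX_2SRC, zmm2, zmm1)

def interlace_expected(a, b):
    zmm0 = a + b  # vinsertf64x4 zmm0, zmm0, ymm1, 1
    return vpermb(IDX_1SRC, zmm0)

a = list(range(32))        # arbitrary test bytes
b = list(range(100, 132))  # arbitrary test bytes, distinct from a
reference = [x for pair in zip(a, b) for x in pair]
assert interlace_emitted(a, b) == reference
assert interlace_expected(a, b) == reference
```

Under this model both sequences produce the same interleave, so the shorter one should be a valid replacement; the model says nothing about latency or throughput on Zen 4.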
</pre>