<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/59356>59356</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Multiplication by 22, 23, 26, 28 and 29 for address generation optimized worse than native IMUL on x86
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
jarikirma
</td>
</tr>
</table>
<pre>
Seen in a microbenchmark on a Zen 2 system, but judging by Godbolt output, the overcomplicated instruction lowering appears to affect modern x86-64 targets in general, on Clang since version 5. Below are the -O3 benchmark results for the constants where this performance regression occurs:
MULTIPLIER=22
Optimized: 0m2.926s
IMUL: 0m2.281s
MULTIPLIER=23
Optimized: 0m2.646s
IMUL: 0m2.281s
MULTIPLIER=26
Optimized: 0m2.937s
IMUL: 0m2.276s
MULTIPLIER=28
Optimized: 0m2.920s
IMUL: 0m2.274s
MULTIPLIER=29
Optimized: 0m3.247s
IMUL: 0m2.269s
This instruction lowering appears to happen at all optimization levels, including -Os, but not at -O0.
The benchmark is a simple list walker with a per-step constant multiplication:
```
#include <stdint.h>
static const uint8_t table[] =
{ 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 } ;
int main(int argc, char **argv)
{
uint8_t val = 0;
for (uint32_t i = 0; i < 1024 * 1024 * 1024; i++)
{
val = table[MULTIPLIER * val];
}
return val;
}
```
Example output from Clang 13 for MULTIPLIER=29:
```
.text
.file "test.c"
.globl main # -- Begin function main
.p2align 4, 0x90
.type main,@function
main: # @main
.cfi_startproc
# %bb.0:
movl $1073741824, %ecx # imm = 0x40000000
xorl %eax, %eax
.p2align 4, 0x90
.LBB0_1: # =>This Inner Loop Header: Depth=1
leaq (%rax,%rax,8), %rdx
leaq (%rdx,%rdx,2), %rdx
addq %rax, %rdx
decl %ecx
movzbl table(%rax,%rdx), %eax
jne .LBB0_1
# %bb.2:
# kill: def $eax killed $eax killed $rax
retq
.Lfunc_end0:
.size main, .Lfunc_end0-main
.cfi_endproc
# -- End function
.type table,@object # @table
.section .rodata,"a",@progbits
.p2align 4
table:
.byte 1 # 0x1
.zero 288
.size table, 289
.ident "Debian clang version 13.0.1-6~deb11u1"
.section ".note.GNU-stack","",@progbits
.addrsig
```
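For reference, the inner loop above builds 29 * val in four dependent steps: two LEAs, an ADD, and the base-plus-index addressing of the load. The sketch below restates that decomposition in plain C and checks it against ordinary multiplication. The names mul29_lea and verify_mul29 are made up for illustration and do not come from LLVM; this only demonstrates the arithmetic and says nothing about timing.
```
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mirrors the address arithmetic of the Clang 13
   loop for MULTIPLIER=29, one statement per instruction. */
static uint64_t mul29_lea(uint64_t x)
{
    uint64_t t = x + x * 8;  /* leaq (%rax,%rax,8), %rdx   -> t = 9x  */
    t = t + t * 2;           /* leaq (%rdx,%rdx,2), %rdx   -> t = 27x */
    t = t + x;               /* addq %rax, %rdx            -> t = 28x */
    return x + t;            /* table(%rax,%rdx) addressing -> 29x    */
}

/* Checks the decomposition for every value a uint8_t can hold. */
static void verify_mul29(void)
{
    for (uint64_t x = 0; x < 256; x++)
        assert(mul29_lea(x) == 29 * x);
}
```
Each step consumes the result of the previous one, so the sequence forms a serial chain on the loop-carried value, which is presumably why it loses to a single IMUL in the timings above.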
</pre>