<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - vectorize widening instructions"
href="https://bugs.llvm.org/show_bug.cgi?id=50256">50256</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>vectorize widening instructions
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>sjoerd.meijer@arm.com
</td>
</tr>
<tr>
<th>CC</th>
<td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>GCC11 learned a new trick[1] and is now able to vectorise widening instruction
much better. Copying for completeness the example[2] here:
void wide1(char * __restrict a, short * __restrict b, int n) {
  for (int x = 0; x < 16; x++)
    b[x] = a[x] << 8;
}
GCC11 generates:
ldr q0, [x0]
shll v1.8h, v0.8b, 8
shll2 v0.8h, v0.16b, 8
stp q1, q0, [x1]
ret
whereas with trunk we generate:
ldrb w8, [x0]
ldrb w9, [x0, #1]
ldrb w10, [x0, #15]
lsl w8, w8, #8
strh w8, [x1]
ldrb w8, [x0, #2]
lsl w9, w9, #8
strh w9, [x1, #2]
ldrb w9, [x0, #3]
lsl w8, w8, #8
strh w8, [x1, #4]
ldrb w8, [x0, #4]
lsl w9, w9, #8
strh w9, [x1, #6]
ldrb w9, [x0, #5]
lsl w8, w8, #8
strh w8, [x1, #8]
ldrb w8, [x0, #6]
lsl w9, w9, #8
strh w9, [x1, #10]
ldrb w9, [x0, #7]
lsl w8, w8, #8
strh w8, [x1, #12]
ldrb w8, [x0, #8]
lsl w9, w9, #8
strh w9, [x1, #14]
ldrb w9, [x0, #9]
lsl w8, w8, #8
strh w8, [x1, #16]
ldrb w8, [x0, #10]
lsl w9, w9, #8
strh w9, [x1, #18]
ldrb w9, [x0, #11]
lsl w8, w8, #8
strh w8, [x1, #20]
ldrb w8, [x0, #12]
lsl w9, w9, #8
strh w9, [x1, #22]
ldrb w9, [x0, #13]
lsl w8, w8, #8
strh w8, [x1, #24]
ldrb w8, [x0, #14]
lsl w9, w9, #8
strh w9, [x1, #26]
lsl w9, w10, #8
lsl w8, w8, #8
strh w8, [x1, #28]
strh w9, [x1, #30]
ret
We completely unroll this loop very early, and then fail to vectorise it with
either the loop vectoriser or the SLP vectoriser (I haven't looked into this
yet, so I don't know yet which one it is).
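
As a rough illustration of what the GCC11 sequence above computes, here is a
minimal scalar sketch in C. It assumes that each shll/shll2 widens eight bytes
to 16-bit lanes and shifts every lane left by 8. The names widen_shift_half and
wide1_model and the u8/u16 typedefs are made up for this note, and the length
is fixed at 16 as in the example.

```c
/* Hypothetical scalar model of the two-instruction sequence GCC11 emits.
   All names here are illustrative and not part of any real API. */
typedef unsigned char u8;
typedef unsigned short u16;

/* Models one shll/shll2: widen 8 bytes to 16-bit lanes, shifting left by 8. */
static void widen_shift_half(const u8 *src, u16 *dst) {
    for (int i = 0; i < 8; i++)
        dst[i] = (u16)((unsigned)src[i] << 8);
}

/* Models the body of wide1: low half (shll), then high half (shll2). */
void wide1_model(const u8 *a, u16 *b) {
    widen_shift_half(a, b);         /* lanes 0..7  */
    widen_shift_half(a + 8, b + 8); /* lanes 8..15 */
}
```

Because the shift amount equals the source element width, the extension bits
are shifted out of each 16-bit lane, so this model should give the same stored
values whether char is signed or unsigned.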
[1] <a href="https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/performance-improvements-in-gcc-11">https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/performance-improvements-in-gcc-11</a>
[2] <a href="https://godbolt.org/z/KPe6xjfed">https://godbolt.org/z/KPe6xjfed</a></pre>
</div>
</p>
</body>
</html>