<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Missed simplifications for PTEST patterns"
href="https://bugs.llvm.org/show_bug.cgi?id=42035">42035</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Missed simplifications for PTEST patterns
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>david.bolvansky@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>craig.topper@gmail.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
</td>
</tr></table>
<p>
<div>
<pre>As mentioned in
<a href="https://mattkretz.github.io/2019/05/27/vectorized-conversion-from-utf8-using-stdx-simd.html">https://mattkretz.github.io/2019/05/27/vectorized-conversion-from-utf8-using-stdx-simd.html</a>
and the related issue filed in GCC's bugzilla, there are opportunities to
simplify PTEST patterns that neither GCC nor Clang currently optimizes.
Test cases from GCC's bugzilla:
using __v16qu [[gnu::vector_size(16)]] = unsigned char;
// test sign bit [ptest 0x808080..., x]
bool bad(__v16qu x) {
  return __builtin_ia32_ptestz128(~__v16qu(), x > 0x7f);
}
bad(unsigned char __vector(16)): // test for the sign bit; this can be optimized
// to [ptest 0x808080..., x] (with 0x808080... at LC0)
vpxor xmm1, xmm1, xmm1
vpcmpgtb xmm0, xmm1, xmm0
vpcmpeqd xmm1, xmm1, xmm1
vptest xmm1, xmm0
sete al
ret
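
The sign-bit case can be sanity-checked with a small scalar model. This is
only a sketch, not backend code: V16, ptestz, splat, gt_0x7f and the two
sign_test_* functions are hypothetical helper names, and the model assumes
that the value returned by __builtin_ia32_ptestz128(a, b) is the ZF result
of PTEST, i.e. (a AND b) == 0 over all 128 bits.

```cpp
// Scalar model of a 128-bit vector as 16 unsigned bytes.
// All names in this block are hypothetical helpers for illustration.
struct V16 {
  unsigned char b[16];
};

// ZF result of PTEST a, b: true when (a AND b) has no bit set.
bool ptestz(V16 a, V16 b) {
  unsigned char any = 0;
  for (int i = 0; i != 16; ++i)
    any |= a.b[i] & b.b[i];
  return any == 0;
}

// Every byte set to the same value.
V16 splat(unsigned char v) {
  V16 r;
  for (int i = 0; i != 16; ++i)
    r.b[i] = v;
  return r;
}

// Byte-wise unsigned compare x > 0x7f: 0xff where true, 0x00 where false.
V16 gt_0x7f(V16 x) {
  V16 r;
  for (int i = 0; i != 16; ++i)
    r.b[i] = x.b[i] > 0x7f ? 0xff : 0x00;
  return r;
}

// What bad() computes today: compare, then PTEST against all-ones.
bool sign_test_current(V16 x) { return ptestz(splat(0xff), gt_0x7f(x)); }

// Folded form: one PTEST against the 0x80 splat constant.
bool sign_test_folded(V16 x) { return ptestz(splat(0x80), x); }
```

Both functions return true exactly when no byte of x has its top bit set,
since a byte is greater than 0x7f (unsigned) only if bit 7 is set. The model
covers only this first test case.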
// test for zero [ptest x, x]
bool bad2(__v16qu x) {
  return __builtin_ia32_ptestz128(~__v16qu(), x == 0);
}
bad2(unsigned char __vector(16)): // equivalent to testing scalars for 0
vpxor %xmm1, %xmm1, %xmm1
vpcmpeqb %xmm1, %xmm0, %xmm0
vpcmpeqd %xmm1, %xmm1, %xmm1
vptest %xmm0, %xmm1
sete %al
ret
// test for certain bits [ptest x, k]
bool bad3(__v16qu x, __v16qu k) {
  return __builtin_ia32_ptestz128(~__v16qu(), (x & k) == 0);
}
bad3(unsigned char __vector(16), unsigned char __vector(16)):
vpand xmm0, xmm1, xmm0
vpxor xmm1, xmm1, xmm1
vpcmpeqb xmm0, xmm0, xmm1
vpcmpeqd xmm1, xmm1, xmm1
vptest xmm1, xmm0
sete al
ret
With the above transformation we already get PTEST(x&k, x&k) which can
consequently be reduced to PTEST(x, k):</pre>
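<pre>The step from PTEST(x&k, x&k) to PTEST(x, k) can be illustrated with a scalar
model. This is a sketch under assumptions, not the patterns a backend would
match: V128, ptestz, vand, masked_self_test and direct_test are hypothetical
names, and ptestz models only the ZF result of PTEST, (a AND b) == 0.

```cpp
// 128-bit value modelled as two 64-bit halves (hypothetical helper type).
struct V128 {
  unsigned long long lo, hi;
};

// ZF of PTEST a, b: true when (a AND b) is all zero.
bool ptestz(V128 a, V128 b) {
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

// Bitwise AND, standing in for vpand.
V128 vand(V128 x, V128 k) {
  V128 r = {x.lo & k.lo, x.hi & k.hi};
  return r;
}

// PTEST(x&k, x&k): both operands are the already-masked value.
bool masked_self_test(V128 x, V128 k) {
  V128 y = vand(x, k);
  return ptestz(y, y);
}

// PTEST(x, k): PTEST ANDs its operands itself, so the vpand is redundant.
bool direct_test(V128 x, V128 k) { return ptestz(x, k); }
```

Because y AND y equals y, both functions test whether x AND k is zero, so
they agree for every input and the separate vpand can be dropped.</pre>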
</div>
</p>
</body>
</html>