<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Recognize and optimize movemask equivalents for 16/32/64 bits"
href="https://bugs.llvm.org/show_bug.cgi?id=39984">39984</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Recognize and optimize movemask equivalents for 16/32/64 bits
</td>
</tr>
<tr>
<th>Product</th>
<td>clang
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>-New Bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedclangbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>bugzilla@poradnik-webmastera.com
</td>
</tr>
<tr>
<th>CC</th>
<td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org, neeilans@live.com, richard-llvm@metafoo.co.uk
</td>
</tr></table>
<p>
<div>
<pre>I found that a recent clang (clang version 8.0.0 (trunk 348905)) is able to
optimize expressions which use both vectors and masks. However, there is one
case which is not optimized: movemask equivalents for 16/32/64-bit data. As
"true" in vectors is stored as -1 (all bits set), it is enough to extract any
one bit from every vector element to get the same result that a movemask for
that element width would give. Two possible solutions for 16-bit data are shown
below in the test() function. The first one uses packs+movemask instructions,
the second uses pext. In the latter case either 0xaaaa or 0x5555 can be used as
the 2nd arg, as both give the same result.
Code for 32- and 64-bit data would look similar: you need to use more packs to
reduce the mask vector to 8-bit data, or a wider mask for pext.
For reference I also added the function test2(), which operates on 8-bit data,
and the optimized assembler code generated from it. The resulting code for
test() after fixing this issue should be similar.
[code]
#include <immintrin.h>
void test(void* data1, void* data2)
{
    __m128i v1 = _mm_load_si128((__m128i const*)data1);
    __m128i v2 = _mm_load_si128((__m128i const*)data2);
    v1 = _mm_cmpeq_epi16(v1, _mm_setzero_si128());
    v2 = _mm_cmpeq_epi16(v2, _mm_setzero_si128());
#if 1
    v1 = _mm_packs_epi16(v1, _mm_setzero_si128());
    v2 = _mm_packs_epi16(v2, _mm_setzero_si128());
    int m1 = _mm_movemask_epi8(v1);
    int m2 = _mm_movemask_epi8(v2);
#else
    int m1 = _mm_movemask_epi8(v1);
    int m2 = _mm_movemask_epi8(v2);
#define PEXT_MASK 0xaaaa
//#define PEXT_MASK 0x5555
    m1 = _pext_u32(m1, PEXT_MASK);
    m2 = _pext_u32(m2, PEXT_MASK);
#endif
    int m = (m1 | 3) & (m2 | 3);
    v1 = _mm_maskz_add_epi16(m, v1, v2);
    _mm_store_si128((__m128i*)data2, v1);
}

void test2(void* data1, void* data2)
{
    __m128i v1 = _mm_load_si128((__m128i const*)data1);
    __m128i v2 = _mm_load_si128((__m128i const*)data2);
    __m128i vc1 = _mm_cmpeq_epi8(v1, _mm_setzero_si128());
    __m128i vc2 = _mm_cmpeq_epi8(v2, _mm_setzero_si128());
    int m1 = _mm_movemask_epi8(vc1);
    int m2 = _mm_movemask_epi8(vc2);
    int m = (m1 | 3) & (m2 | 3);
    v1 = _mm_maskz_add_epi8(m, v1, v2);
    _mm_store_si128((__m128i*)data2, v1);
}
[/code]
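To illustrate why both PEXT_MASK values in test() give the same result, here is
a small portable sketch that models pext and the byte mask in plain C. It does
not use the real intrinsics; pext_model(), byte_mask_from_lanes() and
pext_masks_agree() are hypothetical helper names introduced only for this
illustration.

```c
/* Portable model of pext: collects the bits of src selected by mask into the
   low bits of the result, keeping their relative order. */
unsigned pext_model(unsigned src, unsigned mask)
{
    unsigned result = 0;
    unsigned out_bit = 1;
    while (mask != 0) {
        unsigned lowest = mask & (0u - mask); /* lowest set bit of mask */
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1;                     /* clear that bit */
    }
    return result;
}

/* Model of a byte-wise movemask applied to a 16-bit compare result.
   lane_mask has one bit per 16-bit lane (8 lanes). A true lane is 0xFFFF,
   so it contributes two adjacent set bits to the 16-bit byte mask. */
unsigned byte_mask_from_lanes(unsigned lane_mask)
{
    unsigned bytes = 0;
    for (int lane = 0; lane < 8; ++lane)
        if (lane_mask & (1u << lane))
            bytes |= 3u << (2 * lane);
    return bytes;
}

/* Returns 1 if 0xaaaa and 0x5555 both recover the lane mask for all 256
   possible compare results, 0 otherwise. */
int pext_masks_agree(void)
{
    for (unsigned lanes = 0; lanes < 256; ++lanes) {
        unsigned bytes = byte_mask_from_lanes(lanes);
        if (pext_model(bytes, 0xaaaau) != lanes ||
            pext_model(bytes, 0x5555u) != lanes)
            return 0;
    }
    return 1;
}
```

Because every lane is either all zeros or all ones, picking the odd or the even
byte of each lane selects the same information, which is why either mask works.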
[asm]
[/asm]</pre>
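<pre>A minimal sketch of how the packs+movemask variant might look for 32-bit data,
assuming only SSE2 is available. The function names are illustrative and are not
part of the test case above. The second function uses _mm_movemask_ps on the
same compare result purely as a cross-check of the expected mask.

```c
#include <immintrin.h>

/* Returns one bit per 32-bit lane of v that is equal to zero (4 bits). */
int zero_lanes_mask_epi32(__m128i v)
{
    __m128i zero = _mm_setzero_si128();
    __m128i cmp  = _mm_cmpeq_epi32(v, zero);   /* each lane is 0 or -1 */
    __m128i w16  = _mm_packs_epi32(cmp, zero); /* 32 -> 16 bit; saturation keeps 0 / -1 */
    __m128i w8   = _mm_packs_epi16(w16, zero); /* 16 -> 8 bit; upper bytes are zero */
    return _mm_movemask_epi8(w8);              /* sign bit of each byte */
}

/* Cross-check: take the sign bit of each 32-bit lane directly. */
int zero_lanes_mask_epi32_ref(__m128i v)
{
    __m128i cmp = _mm_cmpeq_epi32(v, _mm_setzero_si128());
    return _mm_movemask_ps(_mm_castsi128_ps(cmp));
}
```

For example, with lanes (0, 5, 0, 7) built by _mm_setr_epi32, lanes 0 and 2 are
zero, so both functions should return 5.
</pre>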
</div>
</p>
</body>
</html>