<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/67803>67803</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[VectorCombine][X86] Poor handling of compare-select patterns with AVX2 spoofing on AVX1 targets
</td>
</tr>
<tr>
<th>Labels</th>
<td>
missed-optimization
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
RKSimon
</td>
</tr>
</table>
<pre>
https://godbolt.org/z/Waonx44Mj
For AVX1-only targets we often encounter 'fake-AVX2' code for integer math, such as:
```c
#include <immintrin.h>
#if !defined(__AVX2__)
/* Emulate the AVX2 256-bit integer ops by splitting into two 128-bit halves. */
#define _mm256_cmpgt_epi32( a, b ) \
  _mm256_setr_m128i( \
    _mm_cmpgt_epi32( _mm256_extractf128_si256( (a), 0 ), _mm256_extractf128_si256( (b), 0 ) ), \
    _mm_cmpgt_epi32( _mm256_extractf128_si256( (a), 1 ), _mm256_extractf128_si256( (b), 1 ) ) )
#define _mm256_blendv_epi8( a, b, c ) \
  _mm256_setr_m128i( \
    _mm_blendv_epi8( _mm256_extractf128_si256( (a), 0 ), _mm256_extractf128_si256( (b), 0 ), _mm256_extractf128_si256( (c), 0 ) ), \
    _mm_blendv_epi8( _mm256_extractf128_si256( (a), 1 ), _mm256_extractf128_si256( (b), 1 ), _mm256_extractf128_si256( (c), 1 ) ) )
#endif
__m256i cmpsel_epi8(__m256i x, __m256i y, __m256i a, __m256i b) {
  __m256i cmp = _mm256_cmpgt_epi32(x,y);
  return _mm256_blendv_epi8(a,b,cmp);
}
```
This is really poorly optimized, mainly due to all the bitcasts to/from the __m128i (<2 x i64>) types.
In particular we see this pattern a lot:
```ll
%3 = bitcast <4 x i32> %sext.i to <2 x i64>
%4 = bitcast <4 x i32> %sext.i21 to <2 x i64>
%shuffle.i.i = shufflevector <2 x i64> %3, <2 x i64> %4, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%7 = bitcast <4 x i64> %shuffle.i.i to <8 x i32>
```
We should be able to get VectorCombine to fold this to a single <8 x i32> shufflevector instead. In fact, VectorCombine::foldBitcastShuf might handle this if we extend it to binary shuffles, with improved cost handling.
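As a rough illustration of why this fold should be legal, here is a minimal scalar sketch that models each bitcast as a byte-wise copy and compares the two forms. All function names are hypothetical helpers for this example, not LLVM APIs, and this says nothing about how foldBitcastShuf would actually implement or cost the transform.
```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model, not LLVM code: a bitcast is treated as a
   byte-wise reinterpretation, modelled here with memcpy. */

/* Pattern from the issue: bitcast both halves to <2 x i64>, concatenate with
   the <0,1,2,3> mask, then bitcast the <4 x i64> result to <8 x i32>. */
static void concat_via_i64(const int32_t lo[4], const int32_t hi[4], int32_t out[8]) {
  int64_t lo64[2], hi64[2], cat64[4];
  memcpy(lo64, lo, sizeof lo64);     /* bitcast <4 x i32> to <2 x i64> */
  memcpy(hi64, hi, sizeof hi64);     /* bitcast <4 x i32> to <2 x i64> */
  cat64[0] = lo64[0];                /* shufflevector, mask <0,1,2,3>  */
  cat64[1] = lo64[1];
  cat64[2] = hi64[0];
  cat64[3] = hi64[1];
  memcpy(out, cat64, sizeof cat64);  /* bitcast <4 x i64> to <8 x i32> */
}

/* Folded form: one <8 x i32> shuffle of the original operands, mask <0..7>. */
static void concat_direct(const int32_t lo[4], const int32_t hi[4], int32_t out[8]) {
  for (int i = 0; i < 4; ++i) {
    out[i] = lo[i];
    out[i + 4] = hi[i];
  }
}

/* Returns 1 when both forms produce identical lanes, 0 otherwise. */
int concat_forms_agree(const int32_t lo[4], const int32_t hi[4]) {
  int32_t a[8], b[8];
  concat_via_i64(lo, hi, a);
  concat_direct(lo, hi, b);
  return memcmp(a, b, sizeof a) == 0;
}
```
Because each i64 lane is moved whole, the two i32 lanes it covers always travel together, so the shuffle mask can simply be scaled from <0,1,2,3> on i64 to <0..7> on i32.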
We also see:
```ll
%2 = icmp sgt <8 x i32> %0, %1
%cmp.i = shufflevector <8 x i1> %2, <8 x i1> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%sext.i = sext <4 x i1> %cmp.i to <4 x i32>
%3 = bitcast <4 x i32> %sext.i to <2 x i64>
%cmp.i20 = shufflevector <8 x i1> %2, <8 x i1> poison, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%sext.i21 = sext <4 x i1> %cmp.i20 to <4 x i32>
%4 = bitcast <4 x i32> %sext.i21 to <2 x i64>
```
We've managed to combine to a single <8 x i32> icmp, but failed to rejoin the sign extensions of the compare result. We should be able to handle this in VectorCombine if we handle concatenation of casts (based on what we do in VectorCombine::foldShuffleOfBinops).
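The equivalence behind concatenating the casts can be sketched with a small scalar model. The helper names below are hypothetical and exist only for this example. It checks that sign-extending each <4 x i1> half and then concatenating gives the same lanes as one sext of the full <8 x i1> compare result.
```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model, not LLVM code.
   icmp sgt <8 x i32>: each i1 lane is stored as 0 or 1. */
static void icmp_sgt8(const int32_t x[8], const int32_t y[8], uint8_t m[8]) {
  for (int i = 0; i < 8; ++i)
    m[i] = (uint8_t)(x[i] > y[i]);
}

/* Pattern from the issue: extract each <4 x i1> half, sext it to <4 x i32>,
   then concatenate the two halves. */
static void sext_halves_then_concat(const uint8_t m[8], int32_t out[8]) {
  int32_t lo[4], hi[4];
  for (int i = 0; i < 4; ++i) {
    lo[i] = m[i] ? -1 : 0;      /* sext of lanes 0..3 */
    hi[i] = m[i + 4] ? -1 : 0;  /* sext of lanes 4..7 */
  }
  memcpy(out, lo, sizeof lo);
  memcpy(out + 4, hi, sizeof hi);
}

/* Rejoined form: a single sext <8 x i1> to <8 x i32>. */
static void sext_whole(const uint8_t m[8], int32_t out[8]) {
  for (int i = 0; i < 8; ++i)
    out[i] = m[i] ? -1 : 0;
}

/* Returns 1 when both forms produce identical lanes, 0 otherwise. */
int sext_forms_agree(const int32_t x[8], const int32_t y[8]) {
  uint8_t m[8];
  int32_t a[8], b[8];
  icmp_sgt8(x, y, m);
  sext_halves_then_concat(m, a);
  sext_whole(m, b);
  return memcmp(a, b, sizeof a) == 0;
}
```
A sext acts on each lane independently, so it commutes with any shuffle that only moves whole lanes. This is the same property that makes a shuffle of two matching binops foldable.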
</pre>