<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/82875>82875</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
RISCV optimization on checking certain bits set ((x & mask) == val)
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Explorer09
</td>
</tr>
</table>
<pre>
It might be common in the C family of languages to check if certain bits are set in an integer with a code pattern like this:
```c
unsigned int x;
if ((x & 0x3000) == 0x1000) {
// Do something...
}
```
And I am surprised when compilers like GCC and Clang didn't realize they can use some bit shifts and inversions of bit masks to save some instructions and emit smaller code.
Here I present 3 possible optimizations that could be implemented in a compiler. Two of them can apply not only to RISC-V, but other RISC architectures as well (except ARM, perhaps). The last one is specific to RISC-V due to the 20-bit immediate operand of its "lui" (load upper immediate) instruction.
The bit masks should be compile time constants, and the "-Os" flag (optimize for size) is assumed.
### Test code
The example code and constants are crafted specifically for RISC-V.
Each group of `pred*` functions should function identically (if not, please let me know; it might be a typo).
* The "a" variants are what I commonly write for checking the set bits.
* The "b" variants are what I believe the compiler should ideally transform the code to. I wrote them to let compiler developers know how the optimization can be done. (But in practice the "b" code might transform to "a", meaning the "optimization" direction reversed.)
* The "c" variants are hacks to make things work. They contain `__asm__ volatile` directives to force GCC or Clang to optimize in the direction I want. The generated assembly should present what I considered the ideal result.
```c
#include <stdbool.h>
#include <stdint.h>
#define POWER_OF_TWO_FACTOR(x) ((x) & -(x))
// -------------------------------------------------------------------
// Example 1: The bitwise AND mask contains lower bits in all ones.
// By converting the bitwise AND into a bitwise OR, an "addi"
// instruction can be saved.
// (This might conflict with optimizations utilizing RISC-V "bclri"
// instruction; use one or the other.)
// (In ARM there are "bic" instructions already, making this
// optimization useless.)
static uint32_t mask1 = 0x55555FFF;
static uint32_t val1 = 0x14501DEF;
// static_assert((mask1 & val1) == val1);
// static_assert((mask1 & 0xFFF) == 0xFFF);
bool pred1a(uint32_t x) {
return ((x & mask1) == val1);
}
bool pred1b(uint32_t x) {
return ((x | ~mask1) == (val1 | ~mask1));
}
bool pred1c(uint32_t x) {
register uint32_t temp = x | ~mask1;
__asm__ volatile ("" : "+r" (temp));
return (temp == (val1 | ~mask1));
}
// -------------------------------------------------------------------
// Example 2: The bitwise AND mask could fit an 11-bit immediate
// operand of RISC-V "andi" instruction with a help of right
// shifting. (Keep the sign bit of the immediate operand zero.)
// (This kind of optimization could also work with other RISC
// architectures, except ARM.)
static uint32_t mask2 = 0x55500000;
static uint32_t val2 = 0x14500000;
// static_assert(mask2 != 0);
// static_assert((mask2 & val2) == val2);
// static_assert(mask2 / POWER_OF_TWO_FACTOR(mask2) <= 0x7FF);
bool pred2a(uint32_t x) {
return ((x & mask2) == val2);
}
bool pred2b(uint32_t x) {
uint32_t factor = POWER_OF_TWO_FACTOR(mask2);
return ((x >> __builtin_ctz(factor)) & (mask2 / factor))
== (val2 / factor);
}
bool pred2c(uint32_t x) {
uint32_t factor = POWER_OF_TWO_FACTOR(mask2);
register uint32_t temp = x >> 20;
__asm__ volatile ("" : "+r" (temp));
return (temp & 0x555) == 0x145;
}
// -------------------------------------------------------------------
// Example 3: The bitwise AND mask could fit a 20-bit immediate
// operand of RISC-V "lui" instruction.
// Only RISC-V has this 20-bit immediate "U-type" format, AFAIK.
static uint32_t mask3 = 0x00055555;
static uint32_t val3 = 0x00045014;
// static_assert(mask3 / POWER_OF_TWO_FACTOR(mask3) <= 0xFFFFF);
bool pred3a(uint32_t x) {
return ((x & mask3) == val3);
}
bool pred3b(uint32_t x) {
uint32_t factor = POWER_OF_TWO_FACTOR(mask3);
return (((x / factor) << 12) & ((mask3 / factor) << 12))
== ((val3 / factor) << 12);
}
bool pred3c(uint32_t x) {
uint32_t factor = POWER_OF_TWO_FACTOR(mask3);
register uint32_t temp = x << 12;
__asm__ volatile ("" : "+r" (temp));
return (temp & ((mask3 / factor) << 12))
== ((val3 / factor) << 12);
}
```
I tested the code in the Compiler Explorer (godbolt.org).
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJzEWE9z6jgS_zTKpSuUkSHAgUNCwm5qajdbb7MzR0rYbayNLLkkGUIO-9m3WrIBk4SXN_OmhkolMZb6z6-7f92ScE5uNOKcje_Y-P5KNL40dv7wWitj0Sazq7XJ9_NHD5XclB7WCJmpKqNBavAlwgIKUUm1B1OAEnrTiA068AayErMXkAVkaL2QGtbSOxAWwaGn3YJkeNyghZ30JQjITI5QC-_RalDyBcGX0rH0liX3LOl-3yTxJ4vPjQ4e5CQMXll6F7-WBTA-ZXz6CozfQPKaJknC-AxYes_Se0heh90Xk3YLAONLxpdwb8CZCn0p9WYwGLR6J_dnBpxadatzeARRgWtsbaXDHHYlagKrlgqti_78bbEAoXNYEFSQy1wzPvFgUSj5Ru7iHjKhoXEYTCDQwJWyIOg0-bhF66TRjvCml5VwLwFvJ7btHqmdt03mwzLahRVJqYRSaAPIg1PT_44W4RFqiw61hxRq45xcKwRTe1nJNxEl-VJ4yEyjcsoCWdUKK9Q-IB-CFz0dwPPOkHW-xCo4I-pa7UEbD0arPdn67fHfi-tfGV_AuvFgfIk2fAfCZqX0mPnGogPhYIdKUSDxNcPaw-23f9CuGm0pasf4bADPJYISjoQjSAeuxkwWMjvqgbxBeqJ05ck1oSarCnMpPPmIljAyBVB-Ms5VIxnnpFQZkUNT12iPGyhjTgDuIUmmHGPiyg6rFhrwsqIH7bzQ3pEjpJnMYpxfPzlSWyixId0t9giFseDkW1RMmLimwrynl_E0_sAzOh8ifG4WvgoKWCwx0nowI5RkZkVBkezAE0rtg-aIYE_bg8hK2FjT1AQau0lqiznjVBdQNLrNu9b77guQOWrfCmZ8KgtKiBBLhcIhKPRQIbxos2PpHcgTwhHg97WhYPedvg2xZ5wLAm4rrDz4s6NcfWypSu1hZ6WPUAZaknoTYCcmIloavBO5_kzkGpXEbSjVQ8p3zsocg3_eCu0KY6t2VU7pN4BH2FnjMRaGN8Hlg4gct6goGV3AAEqzC7tPazBU0xohNxoHhOJdE5i0tiLzMsMulYLxQW3E8MQe0-FF0FcodAcF4_xUFUnIpcUYPItEO5R1fPYOquwdVKXIIidVInK43jjYGfsSqnVPuRc6ArtJVivhqtUKtkYJLxVSErV6t7GPFMZmkTiNbXnTGziUR9uGjrY-wk5oH3lhgxqtoMQWzmG1VvsuVB3bHRJFO5mjxViOIZBg0TXKDy41H8ZTqTPV5AgsXTifr41Rg5KlD5-8l9r3X-dYSI3wr6ffHr6tnpar59-eVsvbxfPTN-pdoT_FNhb_vYHr9uEYilZUaF3Xf_zTk_fQ8saQpTHga-l30iHc_vM-8FwXTAfK7NDGLk8NQSli5GNpBXF3IfhbtL7Lu1N5UnsD4vAVQUAUGVI2z4mVe8JOaLirDGqCeV8l49PnUrq2FDKjCyUzH0eOfoNrvFTyjQxr-waVUqbsJcXEVdSsqfkYGyuW2tlppXRmPGrqX7TGYqgTki9D-fRbtrIo8n2oUNFylXQ9aT1WaBwqdG5wlhHOCy8zaKT2KV_FrjSEOPuM6bNcLg_j0vnirVDd2uFonAzvH45rWxvilhVVlvUxSVsV_CbsP5m22scfEpG8koGnE1t8PgoJv6niqJzzoWB8enDgtT_aWfSN1b2RMGi6ZONh4DvTs_4BPZMF_O9cEePTCO_pyy9pzi5oBosb6TzaYxA9VnUIYs-Q9LDjnHyj2TxMP-kthH_vbDsMkbAzO-HE3U7X73HwT-YufoG7wowiPbHMcNgfDc8K7jAmHslB6EBKPSJqjzIlqjAeWWKdfs7TNE8HC4LpF8Q6TiJyE05I7eD8wYD6htZ8xCqB3F5ktK0_LQTvhHImNN-W8o7Ddk9Sb_Im6jlO3F8hFn4kloQ-l4iFwwmz9BZfJIdWDR-GzV9nE94REu8XO_-CiG7_8pMGHd5HsYvo0-QiR_FLHPUZSV20-xOq4JdI6hiNQmTe2BCOyw5-WPPRyvSBpQ-wWq0bqbzUq8y_MT6NkmPdB096aJ6-7eRCnz7OF37X44vk-Ac9vsSs0X-e_ExiPefV0A7H43H_AmM0_osYNf0Ko747a3-fUNuj9wfn67jpic5y7fJSuDAXvT_SM87_c-33NYbztLGVCAfN2-Xt4y-D7xFZ2nJTkiRhSLpEZCkcF9OQNPo6kaXfIZW0RyrL5eXRJ_09tJL2aSX9SpGlP5FW0k8y_mjnSflHLBYw5Cd80sPy46Wf8EukmIsbvwvFT-Sb9If4pjPysOHnz3F_Cb4f3as-gkfn22N5uNLo7p27i5PuqprUbky-NsoPjN0wPhtc5fM0n6UzcYXz4SSZTiaT8Wh8Vc5HPB1m2bqY3QxRTG-wEGKWDvNJMVmPhzwpruScJ3yUcD4acj4ejQfJLM9HOEmKfITFmiMbJVgJqQZKbSvSdyWda3A-5dPJ-EqJNSoX7tQ517iD8JLiML6_snPac71uNo6NEiWdd0cpXnqFc6K4X_vDHM1z3dVV70bdoX9X2v3KZnx21Vg1L72vw216oKWN9GWzHmSmYnxJ-ts_17U1_8XMM74MVjvGl8Gr_wcAAP__0aYaIA">