<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/86813>86813</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Optimization on "test if bit N is set" pattern ((C >> x) & 1)
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Explorer09
</td>
</tr>
</table>
<pre>
"Checking whether bit N of a value is set" is a common code pattern
(see, for example, these web pages:
https://stackoverflow.com/questions/523724
https://www.geeksforgeeks.org/check-whether-k-th-bit-set-not/ ).
There are at least three ways to write it:
```c
/* 1. */ ((value >> count) & 1)
/* 2. */ !!(value & (1U << count))
/* 3. */
/* 'bit_reversed_value' is 'value' with its bit order reversed,
   which is practical only when 'value' is a constant */
(SignedType)(bit_reversed_value << count) < 0
```
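As a sketch of how the three forms relate, they can be wrapped in functions and checked against each other for every shift count. This assumes a 32-bit unsigned `value`, a `count` between 0 and 31, and `int32_t` standing in for `SignedType`; `reverse32`, `bit_set_1`, `bit_set_2`, `bit_set_3` and `forms_agree` are hypothetical names, not part of any existing API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reverse the order of the 32 bits of v,
   so that bit i of v becomes bit (31 - i) of the result. */
static uint32_t reverse32(uint32_t v) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Form 1: shift bit `count` down to bit 0, then mask it. */
static bool bit_set_1(uint32_t value, unsigned count) {
    return (value >> count) & 1u;
}

/* Form 2: build a one-bit mask and test it. An unsigned 1 keeps
   the shift well defined when count == 31. */
static bool bit_set_2(uint32_t value, unsigned count) {
    return !!(value & (UINT32_C(1) << count));
}

/* Form 3: bit `count` of the original value is bit (31 - count) of
   the reversed value; shifting left by `count` moves it into the
   sign bit. Converting a value above INT32_MAX to int32_t is
   implementation-defined in C; GCC and Clang wrap around. */
static bool bit_set_3(uint32_t bit_reversed_value, unsigned count) {
    return (int32_t)(bit_reversed_value << count) < 0;
}

/* Returns true if all three forms agree for every count in 0..31. */
static bool forms_agree(uint32_t value) {
    uint32_t rev = reverse32(value);
    for (unsigned n = 0; n < 32; n++) {
        bool b1 = bit_set_1(value, n);
        if (b1 != bit_set_2(value, n) || b1 != bit_set_3(rev, n))
            return false;
    }
    return true;
}
```

For example, `forms_agree(0x03FF007E)` returns true: form 3 reads the same bit as forms 1 and 2, only from the opposite end of the reversed constant.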
Ideally, the compiler would canonicalize these patterns so that all of them generate the same optimized code for each target architecture.
## Sample code
The following example is adapted from real code in htop:
https://github.com/htop-dev/htop/blob/484f029d8b0e0dbb5df37e3aff0dbf059fd857a1/linux/LinuxProcessTable.c#L96
```c
#include <stdbool.h>
#include <stdint.h>
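/* Form 1: shift the mask right, then test bit 0. */
bool my_isxdigit_1(unsigned char ch) {
    /* Bit (ch & 0x1F) is set for '0'-'9', 'A'-'F' and 'a'-'f'. */
    uint32_t mask1 = 0x03FF007E;
    if (!((mask1 >> (ch & 0x1F)) & 1))
        return false;
    /* Bit (ch >> 4) is set for the high nibbles 3, 4 and 6. */
    uint32_t mask2 = 0x58;
    if (!((mask2 >> (ch >> 4)) & 1))
        return false;
    return true;
}
/* Form 2: shift a single bit left, then test it against the mask.
   An unsigned 1UL keeps the shift by 31 well defined even where
   long is only 32 bits wide. */
bool my_isxdigit_2(unsigned char ch) {
    uint32_t mask1 = 0x03FF007E;
    if (!(mask1 & (1UL << (ch & 0x1F))))
        return false;
    uint32_t mask2 = 0x58;
    if (!(mask2 & (1UL << (ch >> 4))))
        return false;
    return true;
}
/* Form 3: use bit-reversed masks, shift left, and take bit 31
   (the sign bit) with ">> 31". */
bool my_isxdigit_3(unsigned char ch) {
    uint32_t mask1 = 0x7E00FFC0; /* 0x03FF007E with its bits reversed */
    if (!((mask1 << (ch & 0x1F)) >> 31))
        return false;
    uint32_t mask2 = 0x1A << 24; /* 0x58 with its bits reversed */
    if (!((mask2 << (ch >> 4)) >> 31))
        return false;
    return true;
}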
```
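The constants in `my_isxdigit_1` can be derived rather than hard-coded: `mask1` has bit `ch & 0x1F` set and `mask2` has bit `ch >> 4` set for every character in `0-9`, `A-F` and `a-f`. Neither test is exact on its own, but only the 22 hex digit characters pass both. The sketch below (helper names are hypothetical, and `isxdigit` is assumed to run in the default "C" locale) rebuilds the two masks and checks the combined test against `isxdigit` for all 256 byte values:

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: set bit (ch & 0x1F) in *mask1 and bit (ch >> 4)
   in *mask2 for every character of the hex digit set. */
static void build_masks(uint32_t *mask1, uint32_t *mask2) {
    const char *digits = "0123456789ABCDEFabcdef";
    *mask1 = 0;
    *mask2 = 0;
    for (const char *p = digits; *p != '\0'; p++) {
        unsigned char ch = (unsigned char)*p;
        *mask1 |= UINT32_C(1) << (ch & 0x1F);
        *mask2 |= UINT32_C(1) << (ch >> 4);
    }
}

/* The two-mask test in the style of my_isxdigit_1, taking the masks
   as parameters instead of hard-coding them. */
static bool two_mask_test(unsigned char ch, uint32_t mask1, uint32_t mask2) {
    return ((mask1 >> (ch & 0x1F)) & 1) && ((mask2 >> (ch >> 4)) & 1);
}

/* Returns true if the two-mask test matches isxdigit() for every
   unsigned char value (default "C" locale assumed). */
static bool two_mask_test_matches_isxdigit(void) {
    uint32_t mask1, mask2;
    build_masks(&mask1, &mask2);
    for (unsigned c = 0; c <= UCHAR_MAX; c++) {
        bool expected = isxdigit((int)c) != 0;
        if (two_mask_test((unsigned char)c, mask1, mask2) != expected)
            return false;
    }
    return true;
}
```

With this digit set, `build_masks` produces `0x03FF007E` and `0x58`, the same constants used in the sample.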
All three functions compute the same result. On some target architectures, `my_isxdigit_1` and `my_isxdigit_2` compile to the same code; on others they do not (in my testing, `my_isxdigit_2` is typically larger).
The `my_isxdigit_3` form requires transforming the constants (reversing their bit order), but on some targets (ARM and RISC-V), when the function is inlined into a conditional such as `if (my_isxdigit_3(ch)) { putchar(ch); }`, it might generate smaller code than `my_isxdigit_1` or `my_isxdigit_2`.
(`my_isxdigit_3` uses the sign bit after the shift to save a bitwise AND on some architectures. I found that this can pay off on ARM, but it did not work as well on x86 as I had initially expected.)
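The constant transformation behind `my_isxdigit_3` is a bit reversal: bit `n` of a 32-bit mask `C` equals bit 31 of `reverse(C) << n`, so after the left shift the tested bit already sits in the sign bit and `>> 31` (or a signed `< 0` test) replaces the `& 1`. A minimal sketch, assuming 32-bit masks and using the hypothetical helpers `reverse32`, `is_hex_signbit` and `print_if_hex`, that ties the constants of `my_isxdigit_3` back to those of `my_isxdigit_1` and uses the sign-bit form inside a conditional:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reverse the 32 bits of v (bit i -> bit 31 - i). */
static uint32_t reverse32(uint32_t v) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Sign-bit form: after the left shift, the tested bit is bit 31,
   so ">> 31" yields 0 or 1 directly, without a separate "& 1". */
static bool is_hex_signbit(unsigned char ch) {
    const uint32_t mask1 = 0x7E00FFC0u;  /* reverse32(0x03FF007E) */
    const uint32_t mask2 = 0x1Au << 24;  /* reverse32(0x58)       */
    return ((mask1 << (ch & 0x1F)) >> 31) && ((mask2 << (ch >> 4)) >> 31);
}

/* Hypothetical usage: the inlined test feeding a conditional,
   the situation where the sign-bit form may produce smaller code. */
static void print_if_hex(unsigned char ch) {
    if (is_hex_signbit(ch))
        putchar(ch);
}
```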
Once the three patterns are canonicalized, the compiler would be free to choose whichever form gives the best code for the target.
</pre>