<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/76265">76265</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [X86] Matching bit test and bit set patterns appear not to work
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:X86,
            llvm:codegen,
            missed-optimization,
            llvm
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Bryce-MW
      </td>
    </tr>
</table>

<pre>
I've been profiling some code that uses a large array of bits to test whether a 32-bit value has been seen before. I noticed that neither Clang nor GCC compiles my bit testing and setting code the way I expect. Clang produces better code overall (yay!), including here.

For the test, Clang emits a mov from memory followed by a BT, even though the load should be foldable into the BT. I didn't look too far into why. Based on a quick look at uops.info, it could be that BT with a memory operand is slower. Here's a [quick example](https://godbolt.org/z/e8vqq1545). What I find more interesting is the set.

The set doesn't get converted to a BTS at all. [Here's that example](https://godbolt.org/z/Wvo9nWMGv). As far as I can tell, this conversion is supposed to happen. I've never looked at backend code before, but it seems like [this section](https://github.com/llvm/llvm-project/blob/52b7045fbb70571e09c0ad3be7bd3f0c1acccffa/llvm/lib/Target/X86/X86InstrCompiler.td#L1906-L1930) should be detecting this pattern. There also seems to be [some code](https://github.com/llvm/llvm-project/blob/04c473bea3e0f135432698fcaafab52e1fe1b5ec/llvm/lib/Target/X86/X86ISelDAGToDAG.cpp#L719-L746) to make sure that a load isn't folded in too early. In this case, it's an RMW, so I would expect the load/store to be folded at some point even if the replacement is made.

Both of these have the same result if you AND the bit index with 63, which is expected, and the replacement should be able to see through that as well.
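
For readers who don't want to open the links, here is a minimal sketch in C of the kind of test and set patterns described above. The helper names `bit_test` and `bit_set`, the `u64` typedef, and the word-array layout are assumptions for illustration; this is not the exact code from the Godbolt examples.

```c
/* Hypothetical helpers: names and layout are illustrative assumptions,
   not the code from the linked Godbolt examples. */
typedef unsigned long long u64;

/* Test: load the 64-bit word holding bit `idx`, shift, and mask.
   The index is ANDed with 63, matching the masked variant above. */
int bit_test(const u64 *bits, unsigned idx) {
    return (int)((bits[idx >> 6] >> (idx & 63)) & 1);
}

/* Set: read-modify-write OR of a single shifted bit into the word. */
void bit_set(u64 *bits, unsigned idx) {
    bits[idx >> 6] |= 1ULL << (idx & 63);
}
```

The `idx & 63` mask keeps the shift amount within the 64-bit word, so the shift is well defined in C for any index.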

Interestingly enough, Intel defines intrinsics for these, `_bittest64` and `_bittestandset64`. Clang accepts these with `-fms-extensions` for some reason and produces inline assembly in the IR output. There are some _interesting_ consequences of this and of trying to do it with inline assembly yourself, but that's for another issue.
</pre>