<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/62703>62703</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Clang WebAssembly - Wrong optimization with rotates
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          HailToDodongo
      </td>
    </tr>
</table>

<pre>
I'm currently writing a byte-swap function in C++ targeting WebAssembly, since WASM doesn't have one built in.
I noticed that clang (with -O3) detects a byte swap (in multiple variants) and emits the following function:

```wat
  (func (;240;) (type 6) (param i32) (result i32)
    local.get 0
    i32.const 24
    i32.shl
    local.get 0
    i32.const 65280
    i32.and
    i32.const 8
    i32.shl
    i32.or
    local.get 0
    i32.const 8
    i32.shr_u
    i32.const 65280
    i32.and
    local.get 0
    i32.const 24
    i32.shr_u
    i32.or
    i32.or)
```

I wanted to try out a different, shorter solution I found using rotates:
```wat
  (func (export "bswap") (param i32) (result i32)
    (i32.or
    (local.get 0)
    (i32.const 0x00FF00FF)
    (i32.and)
    (i32.rotr (i32.const 8))

    (local.get 0)
    (i32.const 0xFF00FF00)
    (i32.and)
    (i32.rotl (i32.const 8))))
```
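
As a sanity check on the rotate idea, here is a minimal portable sketch. The helper names `rotl32`, `rotr32`, `bswap_rot` and `bswap_shift` are made up for illustration, the two rotate helpers are plain stand-ins for real rotate instructions, and `u32` is assumed to be a 32-bit `unsigned int`. It compares the rotate formulation against the shift-and-mask sequence shown above:

```cpp
using u32 = unsigned int;  // assumption: unsigned int is 32 bits wide (true on wasm32)
static_assert(sizeof(u32) == 4, "this sketch assumes a 32-bit unsigned int");

// Hypothetical stand-ins for rotate instructions; n must be in 1..31.
constexpr u32 rotl32(u32 v, unsigned n) { return (v << n) | (v >> (32u - n)); }
constexpr u32 rotr32(u32 v, unsigned n) { return (v >> n) | (v << (32u - n)); }

// Rotate-based swap: bytes 3 and 1 move left by 8 bits (byte 3 wraps to byte 0),
// bytes 2 and 0 move right by 8 bits (byte 0 wraps to byte 3).
constexpr u32 bswap_rot(u32 x) {
    return rotl32(x & 0xFF00FF00u, 8) | rotr32(x & 0x00FF00FFu, 8);
}

// Shift-and-mask swap, mirroring the longer WAT sequence (65280 == 0xFF00).
constexpr u32 bswap_shift(u32 x) {
    return (x << 24) | ((x & 0xFF00u) << 8) | ((x >> 8) & 0xFF00u) | (x >> 24);
}

// Both formulations agree on a sample input.
static_assert(bswap_rot(0x11223344u) == 0x44332211u, "rotate version swaps bytes");
static_assert(bswap_shift(0x11223344u) == 0x44332211u, "shift version swaps bytes");
```

Both forms give the same result for this input, so the only difference between them is the number of instructions.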

However, when I write the C++ version of it:
```c++
inline u32 bswap(u32 x) {
    return __builtin_rotateleft32((x & 0xFF00FF00), 8)
        | __builtin_rotateright32((x & 0x00FF00FF), 8);
}
```

It will still be replaced with the longer version from before.
This seems to be a general issue with rotates of masked inputs, since even a single rotate is "optimized" into a much longer sequence.
For example:
```c++
inline u32 bswap(u32 x) {
  return __builtin_rotateleft32((x & 0xFF00FF00), 8);
}
```

Turns into:
```wat
  (func (;240;) (type 6) (param i32) (result i32)
    local.get 0
    i32.const 65280
    i32.and
    i32.const 8
    i32.shl
    local.get 0
    i32.const 24
    i32.shr_u
    i32.or)
```
instead of:
```wat
  (func (;240;) (type 6) (param i32) (result i32)
    local.get 0
    i32.const -16711936
    i32.and
    i32.const 8
    i32.rotl)
```
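
For reference, a small sketch suggesting the two listings above compute the same value, so this looks like a code-size problem and not a miscompile. The names `rotl32`, `masked_rotate` and `expanded` are hypothetical, `rotl32` is a portable stand-in for `__builtin_rotateleft32`, and `u32` is assumed to be a 32-bit `unsigned int`:

```cpp
using u32 = unsigned int;  // assumption: unsigned int is 32 bits wide (true on wasm32)
static_assert(sizeof(u32) == 4, "this sketch assumes a 32-bit unsigned int");

// Hypothetical stand-in for __builtin_rotateleft32; n must be in 1..31.
constexpr u32 rotl32(u32 v, unsigned n) { return (v << n) | (v >> (32u - n)); }

// What the source asks for: one mask, one rotate (5 wasm instructions).
// 0xFF00FF00 is the same bit pattern as the signed constant -16711936.
constexpr u32 masked_rotate(u32 x) { return rotl32(x & 0xFF00FF00u, 8); }

// What is emitted instead: mask, two shifts and an or (9 wasm instructions).
// Byte 1 moves to byte 2, byte 3 wraps to byte 0; bytes 0 and 2 are dropped.
constexpr u32 expanded(u32 x) { return ((x & 0xFF00u) << 8) | (x >> 24); }

static_assert(masked_rotate(0x11223344u) == 0x00330011u, "masked rotate result");
static_assert(expanded(0x11223344u) == 0x00330011u, "expanded form matches");
```

The expansion is valid because the mask clears bytes 0 and 2, so each half of the rotate only has one byte left to move.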

If I disable optimizations for this function with `[[clang::optnone]]`, it will keep the rotate.

Is this a bug with the optimizer?
And is there a way to keep the rotate but still optimize the rest of the function?
</pre>