<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/144237>144237</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Very poor performance in std::bernoulli_distribution
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Disservin
      </td>
    </tr>
</table>

<pre>
    Unfortunately, there is no standalone godbolt reproducer as of now.

While working on the C++ data loader of
https://github.com/official-stockfish/nnue-pytorch/blob/master/training_data_loader.cpp
I noticed a 2x performance difference between the latest clang and gcc, with clang being the slower of the two.

A perf profile of the clang build showed `__ieee754_logl` at the very top, which is nowhere to be seen with gcc. I assume this function somehow didn't get properly optimized?
The flamegraph shows that it comes from the `std::bernoulli_distribution` call.

https://github.com/official-stockfish/nnue-pytorch/blob/e1f4c5fbd50b37b4f5315f5b364b502c061a8576/training_data_loader.cpp#L922


![Image](https://github.com/user-attachments/assets/bddc9daf-6da4-4600-98dd-01c8ac746f7a)
![Image](https://github.com/user-attachments/assets/df2eae56-917e-4dbd-9684-2e694aa014d6)

https://godbolt.org/z/6TMYYsW3e

I haven't yet been able to create a small standalone example that reproduces this. If someone wants to compile the full example linked above, get the file from godbolt and run:

`clang++ -march=native test.cpp -O3 -o loader && ./loader test77-jan2022-2tb7p.high-simple-eval-1k.min-v2.binpack` 
The binpack file used in the command can be downloaded from https://huggingface.co/datasets/official-stockfish/master-smallnet-binpacks/tree/main

If you compile with libc++ instead of libstdc++, the program is another 1.5x slower:

```
clang++-21 libc++ 10.0457s
clang++-21 libstdc++ 5.43586s
g++-15 3.56669s
```
</pre>