<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/56450>56450</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            Invalid optimisation: `atomicrmw <op>, 0` is never the same as `load atomic`

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            new issue

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          cbeuw

      </td>

    </tr>

</table>

<pre>

    Credits: @taiki-e first spotted this optimisation https://github.com/crossbeam-rs/crossbeam/issues/859#issuecomment-1178539492 and @RalfJung pointed out that this may be a miscompilation https://github.com/crossbeam-rs/crossbeam/issues/859#issuecomment-1179355070

[[InstCombine] Optimize `atomicrmw <op>, 0` into `load atomic` when possible](https://reviews.llvm.org/D57854) assumed that an identity Monotonic (aka Relaxed) or Acquire RMW operation (["read-dont-modify-write"](https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf)) is the same as an atomic load. This is never the case for any atomic ordering because an RMW always sees the latest value. Atomic loads do not provide this guarantee.

https://timsong-cpp.github.io/cppwp/n4868/atomics#order-10

> Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation[.](https://timsong-cpp.github.io/cppwp/n4868/atomics#order-10.sentence-1)

These equivalent Rust and C++ programs panic on Apple Silicon M1 Pro with any optimisation enabled.

```rust

use std::sync::atomic::AtomicU64;

use std::sync::atomic::Ordering::Relaxed;

use std::thread;

#[no_mangle]

pub fn writer(gte: &AtomicU64, lte: &AtomicU64) {

    for i in 0.. {

        // Since there is only one writer, at any point of the program execution,

        // the latest value of gte is either equal or 1 greater than lte.

        gte.store(i, Relaxed);

        lte.store(i, Relaxed);

    }

}

#[no_mangle]

pub fn reader(gte: &AtomicU64, lte: &AtomicU64) {

    loop {

        let smaller = lte.fetch_add(0, Relaxed);

        // The writer could increment lte and gte by any number of times

        // between the two fetch_add(0). This does not matter since smaller is still <= larger

        let larger = gte.fetch_add(0, Relaxed);

        assert!(larger >= smaller, "the latest gte value {larger} is less than the latest or earlier lte value {smaller}, this is not possible at any point in the program!");

    }

}

pub fn main() {

    static GTE: AtomicU64 = AtomicU64::new(0);

    static LTE: AtomicU64 = AtomicU64::new(0);

    let t_writer = thread::spawn(|| {

        writer(&GTE, &LTE);

    });

    let t_reader = thread::spawn(|| {

        reader(&GTE, &LTE);

    });

    t_writer.join().unwrap();

    t_reader.join().unwrap();

}

```

```

andy@Andys-MacBook-Pro src % rustc --edition 2021 -C opt-level=1 main.rs && ./main

thread '<unnamed>' panicked at 'the latest gte value 45278 is less than the latest or earlier lte value 45312, this is not possible at any point in the program!', main.rs:22:9

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

```

```cpp

#include <atomic>

#include <cstdint>

#include <cassert>

#include <thread>

void writer(std::atomic_uint64_t& gte, std::atomic_uint64_t& lte) {

    for (uint64_t i=0;;i++) {

        // Since there is only one writer, at any point of the program execution,

        // the latest value of gte is either equal or 1 greater than lte.

        gte.store(i, std::memory_order_relaxed);

        lte.store(i, std::memory_order_relaxed);

    }

}

void reader(std::atomic_uint64_t& gte, std::atomic_uint64_t& lte) {

    while (true) {

        uint64_t smaller = lte.fetch_add(0, std::memory_order_relaxed);

        // The writer could increment lte and gte by any number of times

        // between the two fetch_add(0). This does not matter since smaller is still <= larger

        uint64_t larger = gte.fetch_add(0, std::memory_order_relaxed);

        assert(larger >= smaller);

    }

}

int main() {

    std::atomic_uint64_t gte(0);

    std::atomic_uint64_t lte(0);

    std::thread t_writer(writer, std::ref(gte), std::ref(lte));

    std::thread t_reader(reader, std::ref(gte), std::ref(lte));

    t_writer.join();

    t_reader.join();

    return 0;

}

```

```

andy@Andys-MacBook-Pro src % clang++ -O2 -Wall -Wextra -std=c++20 main.cpp && ./a.out

Assertion failed: (larger >= smaller), function reader, file main.cpp, line 21.

```

On Cortex-A72 (Raspberry Pi 4 Model B), some padding is needed to prevent the two variables from ending up in the same cache line: https://gist.github.com/cbeuw/e5cb76b1d01b51a56fc35ff40248ab27#file-padded-rs

```

andy@pi:~/rust/rdmw$ rustc --edition 2021 -C opt-level=3 padded.rs && ./padded

thread '<unnamed>' panicked at 'the latest gte value 40533 is less than the latest or earlier lte value 40602, this is not possible at any point in the program!', padded.rs:22:9

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

```

`fetch_add(0, relaxed)` is incorreclty compiled to a plain `ldr` and `mov` on arm and x86. If you change the above programs to use `fetch_add(0, acquire)`, it is still a plain `mov` on x86, but becomes a `ldar` on arm64 - which is still incorrect and panics on Cortex-A72 (though not on Apple Silicon).

The patch needs to be reverted. There are some optimisations (https://www.hpl.hp.com/techreports/2012/HPL-2012-68.pdf, Section 7) that can be done on read-dont-modify-write, though it is not trivial.

</pre>

<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJzdWdly47gV_Rr5BUUWRWp98IMse5JOxtVdbk_1owskQQljkmAAUGrl63MuQGqx5a17ulKJSrYkALzruQsvU5XvLpda5NKaQbJgg1FkuXyUgWCF1MYy0yhrRc7sWhqmGisrabiVqmZraxu6ZhD_hvdK2nWbhpmq8CPTyphU8CrQ5vgnvktjWkGLs_F8ECfuJy6qRG2D4XA6Gyfz0TxmvM5JljteFv9o6xVrlKxJDNVaiMKtl6fiO5YKxhmEApFGlr9QtHkyHkfTaBBdD6JF9398hfen2tilqlJZi8H4mn12Rvq3YINJxK2qZKarLRskS9UMkptBvGQRdhj0UXSkVDxn_hwtb9eihrbGyLQkcoN4dqqNFhsptiYsy00VKr3C0vUYdhsN4jnjkLly3oKJeM1kDuGl3bFbVSurapkxEOSPnN2Jkn8XOV2kNFtk_2qlFuzu9hucLLQ3I46ShnGsBc-DXMEOlcplsQu2WlqBjXMCbrfbcN2U-OtMbkW21qJR2pJx42gY4-PvX34P6GswmYVNXkAOEgU-tWvBDK_gVUMaeMswslLI7snpeNdiI7Q7mXEDpEIDXu_6s0rnQkuAJhUZb7EPMqQYL7d8Z5gRwnMBWAQQvuFlK0K2ODAyLFcM9mKNVhuY0INt1XLNAUIRHkPgVHd43qh6FWRNE3awk4pg1jTbBp_1aDaZ4dNLCnMkTthg2MMquekFcTY_NvfBMdBhzcuyV4hOdgr16pDrZO0WHQmZeY86bmRoomiBtFTAeMId9EyAIJVJTrG2hQJu5xVRgI_wHAh-3BChAWRFnYlgSKA4MvX9WsCZAkiFjjjE7lpjXaZYDmLA9Ir8tdK8MqzhBHUovGiaUrCvspQZft0O2RetvGIEmJN8JmqOkMt7704i_9Zg4pcISsbmpGOyMLs689-62HXfve_-mIwGydW7L_rc4dX_6gPzLAG7Jl_s97r_cQI31Oqh4vXKJw233rQpK2rvVzh9tkLIUoqPJwcxkY3Kc8tzNph2TBheFGESKYtFYXi6Qy_vclgZXiPAAFBULOoSFq7FXoAlc0lp55M5U4UDV-czJr6LrHWQipdnyT8NWSIAlYiVkMSVoMFLSmdDtoKdrMsRCH5oGJ6SxHWhsUA-xQlJdkiHe-P2Z8t3nh1MO7sfvrzHPeTQn3VPqVTz3C2lQPmukChgh0Fy7RQphM3WDzyH9LPoDcU7u9_3uUGzTLVlDhhkWlBVJIou_sgN6c75tm6rFCfJubJCNT1HMRV2K4RPT3ar2BOh5l2izxUSNaXhilvibhy-eo1wwFiJLIjK6rTjegU7PrOAX3cGWH3QAMiFQttBjEQ025O5IUqdEHQ9quARMskSXQaeXnUiTa9J2FIY4-F4dBxYFVyXEpTL4yt7-kASWNi-7FFJ6lqD01jqcn0XS07i-L347HBYcVlTwX-KLWORHjP2t_sbguEeg86gR-mOslMttp0HTxh3FH7_cQrkR_vQgZAu6_OgS6oN3zrJp0u8n4fBPv8hhEgL57MJSXPGQEdLT9n7QP0w-318_zD7XvPwT9W7KGzrreaN_3FCpZfzzcMHFPSV7iRpnS4iyHdoxxf4MMEtz66UegyolBpNHeWYUZXMWBDQjQQV0ziKhyxYUoUNSnRrJcw2dBgLtSED4M1CJAMHO8fDGxV7U0R0W9foAXPXME99OX8U1CfT_tl4G43j6exjgTYaJ9SL_mCATenKTiPgAAGXLLp-BXRc0tZt7ZsNGPLuj6_3D1eL5T_v7xZLSiJDavdFvZFa1S6bbriW1IIw3Bjk0jQl7m44S3n2aDXPxHu8hfZqX3SQLcsWzSus2XcbN-c2M3QXUPGl3S4Jnt_tA-HmWJqNkvkh6va9ixfioQWvyejBEgCo5MGIrx2h8neuGwHl_hiTsGZEwE6upO8En13yf9Km7C1ViUrp3YNrmhHx729ePkThpZLhPLxPbL_Aw9u1LN1NjNXt81167Z3_ZofzYZv97_c9e-O81fx82DZ9S_RSP_Q-_FBcvdxvnIeKh9K57uKF8-Xz82eu6qpOX2NxxSEP7A9pUXQNOo0pnm2U3cYLgu1Z7COm__IzLF7tD95qCc5Q0cK2Gjd4v6hByErc_XS36cHnmAXfaIIRfBPfUdxY4NS9zvyBOPJ1FdXsuFXgoWq7W_GFgyE1GgVHpsj9HdLLoIRBi7bO3BUH4xeUZHpO7l5Log7Ew6cjgGO9P9dsqbQV34PFNCaed9w0iHy9Y18kG7FblYuSXfVeVBVKBwKOxlFucCVyGs8p1BN0RbXd54K--BtWaIUyU7tL2qbvPNxILOMZVRcaNELhp1NOY8PTUWcqWrTTv4lxlk4n6TCPhul4yMeTIkvGRTGK4tGMp9THJGSJgOQUOc1F3_RzI4nv9IbmkTQfwUdegdfoXY1gwjyrJ62gX_xrmsFonCQfbAajSfSTzeBeq_9-O_gs0x_ldBo-G6pmSmuRlXbH_OzcA5MzcIKGNJrONR12s_hJVKkN_YJPua7c4vfZJGSfCrZTLctg45UfJPJUbcRhFgeaNMM6JxT3Y2cvFC1Ie6hvR4IcWIMlnUtbS9NdRJfBOScq1wfpcHMZUA-RrQ_ken39xNCBiJq_J9Fs16pdrZ3vn04PqUA_GUeCDDRyUe30TGlWukFmEm5YTR0mx5_LAsezRoL9XzM1X7Kvwqe1KVVSN_fPOM110UYgmXX57tzwnpDulPVGJ5WtlhvJy_Aiv0zyeTLnF1baUlx-qhEjaPmOdXAp9_UnHMeD-n6k_-yJx0Wry8tXHtjQc47uIwCm_oS6x89qxpPROLpYX47mk3mUjos0GcFAkzxN5_OsyObDbBwXxXR6UfJUlObSP9CoxZY5Ev4ZxoW8RJ6Ko2k0i5N4Go3CuRDzYp5MiuEcy6MZcp5ApSj3j10u9KUTKW1XBpslsu_hmcwF2iS5AiwcO9DnLWytL11GvnCML53g_wEajWI4">