<div dir="ltr">Great writeup, thanks!<div><br></div><div><div style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial">I'll respond to the rest in a bit, but I have a digression first...</div><div style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial">One detail I hadn't noticed in my previous investigation is that the "compare_exchange_weak" (or our IR "cmpxchg weak") operation is theoretically unsound -- and NECESSARILY so.</div><div style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial">As 
mentioned in this thread, there are architectural guarantees for forward progress of an LLSC loop adhering to certain constraints. If your loop does not meet those constraints, it may well livelock against another CPU executing the same loop, or even spuriously fail deterministically every time it's executed on its own. This thread is all about a proposal to ensure that LLVM emits LLSC loops that are guaranteed to meet those architectural constraints, and so avoid the possibility of those bad situations.</div><div style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial">To my knowledge, no architecture guarantees that an LLSC pair standing alone, without a containing loop, will not spuriously fail. It may in fact fail every time it's executed. 
Of course, the typical usage of compare_exchange_weak is to embed it within a loop containing other user code, but given that the loop contains arbitrary user code, there's simply no way we can guarantee that this larger loop meets the LLSC-loop construction requirements. Therefore, it's entirely possible for some particular compare_exchange_weak-based loop to livelock or to deterministically spuriously fail. That seems poor, and exactly the kind of thing that we'd very much like to avoid...</div><div style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="color:rgb(34,34,34);font-family:sans-serif;font-size:13px;font-style:normal;font-variant-ligatures:normal;font-variant-caps:normal;font-weight:400;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration-style:initial;text-decoration-color:initial">So, now I wonder -- is there actually a measurable performance impact (on, say, ARM) if we were to just always emit the "strong" cmpxchg loop? I'm feeling rather inclined to say we should not implement the weak variant at all. (And, if others agree with my diagnosis here, I'd also propose deprecating it in the C and C++ languages...)</div></div></div>
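<div>For concreteness, here is a minimal sketch of the two loop shapes being compared, assuming C++11 std::atomic. The names fetch_multiply_weak and fetch_multiply_strong are hypothetical, and the multiplication just stands in for "arbitrary user code" inside the retry loop. As I understand it, on an LLSC target the strong form is typically lowered to a tight inner retry loop around the LLSC pair, while the weak form leaves the retry to the user-written loop.</div>

```cpp
#include <atomic>

// Hypothetical helper: atomically multiply `value` by `factor` with a weak CAS.
// The retry loop is user-written, so its body (here `expected * factor`) is
// arbitrary code that the compiler cannot force into a restricted LLSC loop.
inline int fetch_multiply_weak(std::atomic<int>& value, int factor) {
    int expected = value.load(std::memory_order_relaxed);
    // compare_exchange_weak may fail spuriously; on any failure it writes the
    // currently observed value back into `expected`, and we try again.
    while (!value.compare_exchange_weak(expected, expected * factor,
                                        std::memory_order_seq_cst,
                                        std::memory_order_relaxed)) {
    }
    return expected;  // value observed immediately before the successful store
}

// Same operation with the strong variant: it only reports failure when the
// stored value really differed from `expected`, so any retry needed for a
// spurious store failure happens inside the operation itself.
inline int fetch_multiply_strong(std::atomic<int>& value, int factor) {
    int expected = value.load(std::memory_order_relaxed);
    while (!value.compare_exchange_strong(expected, expected * factor,
                                          std::memory_order_seq_cst,
                                          std::memory_order_relaxed)) {
    }
    return expected;
}
```

<div>Both helpers have the same observable behaviour; the open question above is only whether always using the second shape costs anything measurable.</div>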