[PATCH] D141748: [WoA] Use fences for sequentially consistent stores/writes

Wed Jan 18 13:10:23 PST 2023

efriedma added a comment.

In D141748#4063352 <https://reviews.llvm.org/D141748#4063352>, @mnadeem wrote:

> In D141748#4060636 <https://reviews.llvm.org/D141748#4060636>, @efriedma wrote:
>
>> The explanation makes sense.
>>
>> Can we make this specifically apply to _Interlocked* functions, instead of all atomics on Windows targets?  I'd prefer not to impose this performance penalty on other users of atomics if we can avoid it.
>
> IIUC we must add a fence after every sequentially consistent write to memory.

The scenario I wanted to avoid is that someone has an existing codebase that isn't using Microsoft's STL, and is working correctly, but we pessimize the performance to fix a bug that never affected it in the first place.  But I guess there are two issues with that:

- To maintain sequential consistency between all atomic ops, we'd need to ensure that nothing in the process is using the Microsoft STL <atomic>, which maybe isn't practical.
- It looks like MSVC is going to implement _Atomic the same way (see https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/); in that case, there are no intrinsics involved, so I guess we need to precisely copy whatever MSVC does.

Maybe we can expose the optimal sequence as a command-line option for codebases that don't need ABI compatibility.

>> It looks like for atomic rmw ops, MSVC generates one barrier, not two; can we do the same?
>
> They generate acquire/release instructions with implicit fences + one trailing explicit fence, while this patch generates normal monotonic st/ld, so two explicit fences are needed but the fences are outside the loop.
> We could do something similar to MSVC if it helps performance.

I assume fewer fences helps performance, although I haven't tried it.  Using the acquire/release instructions also means we do the right thing if there's code using the normal armv8 atomic sequences.

> 2. Add something like `TLI->shouldInsertTrailingFenceForAtomic(I)` to use in `AtomicExpandPass` and generate a Trailing Fence while keeping the original memory ordering. With the current patch we set the ordering to Monotonic and thus also need a leading fence.

This seems fine.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141748/new/

https://reviews.llvm.org/D141748