[PATCH] D141748: [WoA] Use fences for sequentially consistent stores/writes

Wed Jan 18 12:49:05 PST 2023

mnadeem added a comment.

In D141748#4060636 <https://reviews.llvm.org/D141748#4060636>, @efriedma wrote:

> The explanation makes sense.
>
> Can we make this specifically apply to _Interlocked* functions, instead of all atomics on Windows targets?  I'd prefer not to impose this performance penalty on other users of atomics if we can avoid it.

IIUC we must add a fence after every sequentially consistent write to memory.

> It looks like for atomic rmw ops, MSVC generates one barrier, not two; can we do the same?

They generate acquire/release instructions with implicit fences + one trailing explicit fence, while this patch generates normal monotonic st/ld, so two explicit fences are needed but the fences are outside the loop.
We could do something similar to MSVC if it helps performance. It would only involve adding a trailing fence while keeping the current acquire/release instructions.

I can think of a few possible ways to do this:

1. Modify clang to add a `fence seq_cst` after the interlock intrinsics, I think we would also need to modify any other builtins that end up generating seq_cst writes to memory, so it might get messy.
2. Add something like `TLI->shouldInsertTrailingFenceForAtomic(I)` to use in `AtomicExpandPass` and generate a Trailing Fence while keeping the original memory ordering. With the current patch we set the ordering to Monotonic and thus also need a leading fence.
3. Add a fence during lowering. Not sure how much effort this will take.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D141748/new/

https://reviews.llvm.org/D141748