[PATCH] D141748: [WoA] Use fences for sequentially consistent stores/writes
Usman Nadeem via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 18 12:49:05 PST 2023
mnadeem added a comment.
In D141748#4060636 <https://reviews.llvm.org/D141748#4060636>, @efriedma wrote:
> The explanation makes sense.
>
> Can we make this specifically apply to _Interlocked* functions, instead of all atomics on Windows targets? I'd prefer not to impose this performance penalty on other users of atomics if we can avoid it.
IIUC we must add a fence after every sequentially consistent write to memory.
> It looks like for atomic rmw ops, MSVC generates one barrier, not two; can we do the same?
They generate acquire/release instructions with implicit fences + one trailing explicit fence, while this patch generates normal monotonic st/ld, so two explicit fences are needed but the fences are outside the loop.
We could do something similar to MSVC if it helps performance. It would only involve adding a trailing fence while keeping the current acquire/release instructions.
I can think of a few possible ways to do this:
1. Modify clang to add a `fence seq_cst` after the interlock intrinsics, I think we would also need to modify any other builtins that end up generating seq_cst writes to memory, so it might get messy.
2. Add something like `TLI->shouldInsertTrailingFenceForAtomic(I)` to use in `AtomicExpandPass` and generate a Trailing Fence while keeping the original memory ordering. With the current patch we set the ordering to Monotonic and thus also need a leading fence.
3. Add a fence during lowering. Not sure how much effort this will take.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D141748/new/
https://reviews.llvm.org/D141748
More information about the llvm-commits
mailing list