[PATCH] D141748: [WoA] Use fences for sequentially consistent stores/writes
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 18 13:10:23 PST 2023
efriedma added a comment.
In D141748#4063352 <https://reviews.llvm.org/D141748#4063352>, @mnadeem wrote:
> In D141748#4060636 <https://reviews.llvm.org/D141748#4060636>, @efriedma wrote:
>
>> The explanation makes sense.
>>
>> Can we make this specifically apply to _Interlocked* functions, instead of all atomics on Windows targets? I'd prefer not to impose this performance penalty on other users of atomics if we can avoid it.
>
> IIUC we must add a fence after every sequentially consistent write to memory.
The scenario I wanted to avoid is that someone has an existing codebase that isn't using Microsoft's STL, and is working correctly, but we pessimize the performance to fix a bug that never affected it in the first place. But I guess there are two issues with that:
- To maintain sequential consistency between all atomic ops, we'd need to ensure that nothing in the process is using the Microsoft STL <atomic>, which maybe isn't practical.
- It looks like MSVC is going to implement _Atomic the same way (see https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/); in that case, there are no intrinsics involved, so I guess we need to precisely copy whatever MSVC does.
Maybe we can expose the optimal sequence as a command-line option for codebases that don't need ABI compatibility.
>> It looks like for atomic rmw ops, MSVC generates one barrier, not two; can we do the same?
>
> They generate acquire/release instructions with implicit fences + one trailing explicit fence, while this patch generates normal monotonic st/ld, so two explicit fences are needed but the fences are outside the loop.
> We could do something similar to MSVC if it helps performance.
I assume fewer fences helps performance, although I haven't tried it. Using the acquire/release instructions also means we do the right thing if there's code using the normal armv8 atomic sequences.
> 2. Add something like `TLI->shouldInsertTrailingFenceForAtomic(I)` to use in `AtomicExpandPass` and generate a Trailing Fence while keeping the original memory ordering. With the current patch we set the ordering to Monotonic and thus also need a leading fence.
This seems fine.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D141748/new/
https://reviews.llvm.org/D141748
More information about the llvm-commits
mailing list