[llvm-dev] [IR] [CodeGen] Volatile causes i128 load/store to tear?

Steve (Numerics) Canon via llvm-dev llvm-dev at lists.llvm.org
Thu May 6 09:24:11 PDT 2021


Everything Joshua said, but also please note that there’s no “direction of not tearing” w.r.t. int128 on x86_64. The architecture guarantees that 1, 2, 4, and 8 byte accesses to normal memory that do not cross a cache line are [single-copy] atomic, but makes no mention of 16 byte or wider accesses (section 8.1.1 in volume 3A of the SDM). The only architecturally guaranteed atomic 16B access in x86_64 is CMPXCHG16B.
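
To make "tearing" concrete, here is a minimal sketch in C that models a 16-byte store performed as two 8-byte stores, each of which is atomic on its own. The type `halves128` and the functions `split128`, `join128`, and `torn_observation` are hypothetical names introduced only for this illustration, and the choice to write the low half first is an assumption, not something the architecture or the compiler promises. It is a single-threaded model of what a concurrent reader could see, not a reproduction of the race itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 16-byte value viewed as the two 8-byte
   pieces that a split 128-bit access actually touches. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} halves128;

/* The two halves are expected to cover the whole value with no padding. */
static_assert(sizeof(halves128) == sizeof(__uint128_t),
              "two 8-byte halves must cover the 16-byte value");

/* Split a 128-bit value into its low and high 64-bit halves. */
static halves128 split128(__uint128_t v) {
    halves128 h = { (uint64_t)v, (uint64_t)(v >> 64) };
    return h;
}

/* Reassemble a 128-bit value from its two halves. */
static __uint128_t join128(halves128 h) {
    return ((__uint128_t)h.hi << 64) | h.lo;
}

/* Model of a torn observation: memory initially holds old_val, the writer
   has stored the low half of new_val (assumed to go first), and the reader
   looks at memory before the high half has been stored. */
static __uint128_t torn_observation(__uint128_t old_val, __uint128_t new_val) {
    halves128 mem = split128(old_val);
    mem.lo = split128(new_val).lo;  /* first 8-byte store has landed */
    /* the reader observes memory here, before the second 8-byte store */
    return join128(mem);
}
```

With an old value whose halves are 0x1111111111111111 (high) and 0x2222222222222222 (low), and a new value whose halves are 0xAAAAAAAAAAAAAAAA (high) and 0xBBBBBBBBBBBBBBBB (low), `torn_observation` returns a value with high half 0x1111111111111111 and low half 0xBBBBBBBBBBBBBBBB, which was never stored by anyone. That is the outcome the per-access guarantees for 8-byte accesses do not rule out for a 16-byte object.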

– Steve

> On May 6, 2021, at 11:35 AM, Cranmer, Joshua via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> The semantics of `volatile` in the C11/C++11 memory model are emphatically orthogonal to the requirements for atomic (non-tearing) loads/stores, so you cannot and should not assume that volatile guarantees non-tearing, even where a non-tearing access is possible.
> 
> The reason `volatile` causes the load to tear is that the x86 backend does not accept i128 as a legal type. Consequently, loads and stores of i128 are always broken up into two i64 loads/stores. There is a DAG combine that merges two adjacent i64 loads/stores back into an i128 load/store, but it doesn't kick in for volatile loads/stores, because merging them would mean optimizing a volatile access.
> 
> Note that if you change the i128 type to one that is legal, say <2 x double>, you do indeed get both the volatile and non-volatile versions implemented as an xmm mov instruction.
> 
>> -----Original Message-----
>> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of Itay
>> Bookstein via llvm-dev
>> Sent: Thursday, May 6, 2021 4:46
>> To: llvm-dev <llvm-dev at lists.llvm.org>
>> Subject: [llvm-dev] [IR] [CodeGen] Volatile causes i128 load/store to tear?
>> 
>> Hey all,
>> 
>> I've encountered a codegen peculiarity on both X86-64 and PPC64LE on top
>> of trunk:
>> 
>> void foo(__uint128_t *p, __uint128_t *q) { *p = *q; }
>> void bar(volatile __uint128_t *p, __uint128_t *q) { *p = *q; }
>> 
>> On gcc trunk x86-64 -O3, both of these compile to movdqa, movaps, ret
>> (https://clang.godbolt.org/z/xvs8x646T).
>> On clang trunk x86-64 -O3, the first compiles to movaps, movaps, ret, and the
>> second tears into 4 mov-s (https://clang.godbolt.org/z/zfM9MMrbM).
>> On clang trunk power64le, the first compiles to lxvd2x, stxvd2x, blr, and the
>> second tears into 2x ld, 2x std, blr (https://clang.godbolt.org/z/7E7zG4Yfz).
>> 
>> I'm a bit surprised by this, since I'd expect volatile to at least "nudge the
>> compiler along" in the direction of not tearing, rather than the other way
>> around (e.g. how Linux uses volatile to implement
>> READ_ONCE/WRITE_ONCE).
>> 
>> I realize that the semantics of volatile might be a bit fuzzier when applied to
>> non-standard types such as __uint128_t (at the level of clang), but as far as I
>> can tell at the IR level these two just compile to load/store (volatile) i128.
>> Would this be considered a CodeGen issue?
>> 
>> Thanks,
>> ~Itay
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
