[llvm-dev] [IR] [CodeGen] Volatile causes i128 load/store to tear?

Sun May 16 02:51:04 PDT 2021

I think I was imprecise about what I'm trying to accomplish. I called
it "not tearing", which borrows CPU architecture terminology (memory
transactions), but for what I really want it is sufficient to force
loads/stores of particular sizes at the ISA level.
What I want is a way to write non-target-specific C/C++ code whose
loads/stores keep the sizes I specify all the way down to the target
architecture's ISA, without being broken down into smaller accesses.
I have a compute kernel which I'm trying to modify. Instead of e.g.
loading offsets i + 0, i + 2, and i + 3 from an array of floats (i.e.
3 x 32-bit loads), I would prefer a single 128-bit load followed by
bitcasts.
Even after rewriting the code in terms of __uint128_t pointers and
doing the required bitcasts, I find that these large loads often get
broken down into smaller ones, which is highly undesirable for my
use-case.
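
For concreteness, here is a minimal sketch of the kind of rewrite I
mean. The names Chunk128, load128 and sum_selected are made up for
illustration, and I use a 16-byte struct with memcpy instead of a
__uint128_t pointer so that the snippet stays portable. Nothing in it
guarantees a single 128-bit load at the ISA level; it only expresses
the access as one 16-byte copy and leaves the lowering to the compiler.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical 16-byte chunk type, used only for this illustration.
struct Chunk128 {
    float lane[4];
};
static_assert(sizeof(Chunk128) == 16, "expected four 32-bit floats");

// Copies 16 bytes starting at data[i] with a single memcpy.
// Compilers often lower this to one 128-bit load, but the language
// does not guarantee it, and the optimizer may still split it.
inline Chunk128 load128(const float* data, std::size_t i) {
    Chunk128 c;
    std::memcpy(&c, data + i, sizeof c);
    return c;
}

// Uses elements i + 0, i + 2 and i + 3 of the loaded chunk.
inline float sum_selected(const float* data, std::size_t i) {
    const Chunk128 c = load128(data, i);
    return c.lane[0] + c.lane[2] + c.lane[3];
}
```

The caller has to make sure that data[i] through data[i + 3] are all
in bounds, since the copy always reads 16 bytes.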
I'm wondering whether there's a way to achieve this semi-portably,
without resorting to arch-specific code and/or lock-free atomic
loads/stores (which are undesirable for me because they require
alignment, IIUC).

Thanks for the replies!
~Itay