[PATCH] D56534: [Verifier] Add verification of unaligned atomic load/store

Thu Jan 17 20:12:47 PST 2019

jyknight added a comment.

>> I'm not sure I agree: those are unordered memsets, which are basically @reames' memsets. I think it's up to him to give them semantics that are useful for his target.
> 
> Thank you. I was about to say pretty much the same. In particular, trying to claim that a transform is incorrect and needs reverted based on a proposed reading of a change to the LangRef which has not been posted for review, much less accepted seems to be going a definite step too far.

I must apologize for having given the impression I was actually calling for a revert now. I did not intend that -- we most definitely should not be hasty here! I meant "need to be reverted" in a more general sense.

That said, it's not the case that unordered memset/memcpy should have semantics defined as "whatever's useful to reames". :) They need well-defined semantics at the IR level, compatible with other atomics' semantics. And they have them! -- they are intended to behave as if you had written a series of <len> / <elementsize> unordered atomic load/store instructions, each of size <elementsize>.
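
To make that equivalence concrete, here is a rough sketch. The global @buf and the function names are made up for illustration, and the intrinsic declaration is written from my reading of the LangRef, so please double-check it. An 8-byte unordered memset with element size 2 is meant to behave like four unordered i16 stores:

  @buf = global [4 x i16] zeroinitializer, align 2

  declare void @llvm.memset.element.unordered.atomic.p0i8.i32(i8*, i8, i32, i32)

  ; len = 8 bytes, element size = 2 bytes, so 8 / 2 = 4 elements.
  define void @as_intrinsic() {
    call void @llvm.memset.element.unordered.atomic.p0i8.i32(i8* align 2 bitcast ([4 x i16]* @buf to i8*), i8 0, i32 8, i32 2)
    ret void
  }

  ; The intended meaning, spelled out as individual unordered atomic stores.
  define void @as_stores() {
    store atomic i16 0, i16* getelementptr ([4 x i16], [4 x i16]* @buf, i32 0, i32 0) unordered, align 2
    store atomic i16 0, i16* getelementptr ([4 x i16], [4 x i16]* @buf, i32 0, i32 1) unordered, align 2
    store atomic i16 0, i16* getelementptr ([4 x i16], [4 x i16]* @buf, i32 0, i32 2) unordered, align 2
    store atomic i16 0, i16* getelementptr ([4 x i16], [4 x i16]* @buf, i32 0, i32 3) unordered, align 2
    ret void
  }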

So, then, the question is: what is safe to do with atomic unordered instructions? Shall we merge these two instructions into one 32-bit store instruction?

  store atomic i16 4, i16* getelementptr ([2 x i16], [2 x i16]* @Z, i32 0, i32 0) unordered, align 4
  store atomic i16 3, i16* getelementptr ([2 x i16], [2 x i16]* @Z, i32 0, i32 1) unordered, align 2

We currently do not. On X86, for unordered atomics of those sizes and alignments, I'm pretty sure the above would in fact be a safe and beneficial optimization.
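
For reference, the merged form would look roughly like the following. This assumes a little-endian target such as X86, so that element 0 lands in the low half of the 32-bit word:

  ; 4 | (3 << 16) = 196612
  store atomic i32 196612, i32* bitcast ([2 x i16]* @Z to i32*) unordered, align 4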

Similarly, merging two 32-bit stores into a (properly 8-byte-aligned) 64-bit store should also be safe -- but only on the Pentium and later. Otherwise, that would be an incorrect transformation, because the 8-byte atomic op ends up needing to do a libcall, which may not be atomic w.r.t. the 4-byte atomic read/write machine instruction.
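
A sketch of the problematic case, using a hypothetical 8-byte-aligned global @W and again assuming little-endian layout:

  @W = global [2 x i32] zeroinitializer, align 8

  ; Before merging: two 4-byte unordered stores.
  store atomic i32 1, i32* getelementptr ([2 x i32], [2 x i32]* @W, i32 0, i32 0) unordered, align 8
  store atomic i32 2, i32* getelementptr ([2 x i32], [2 x i32]* @W, i32 0, i32 1) unordered, align 4

  ; After merging: 1 | (2 << 32) = 8589934593
  store atomic i64 8589934593, i64* bitcast ([2 x i32]* @W to i64*) unordered, align 8

  ; Meanwhile, on another thread, a plain 4-byte atomic load of the second element:
  %v = load atomic i32, i32* getelementptr ([2 x i32], [2 x i32]* @W, i32 0, i32 1) unordered, align 4

On a target without a native 8-byte atomic store, I'd expect the merged store to be lowered to a libcall (something like __atomic_store_8), and that libcall is not guaranteed to be atomic with respect to the 4-byte load instruction above.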

My proposal is that at the IR level, we should document that it's unsafe to mix and match sizes on atomic operations, and treat it as unsafe in all our IR optimization passes. But that doesn't prohibit teaching SelectionDAG load/store merging to do the above optimization in target configurations where it's known to be safe to do so.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D56534/new/

https://reviews.llvm.org/D56534