[llvm-dev] Load combine pass

Sanjoy Das via llvm-dev llvm-dev at lists.llvm.org
Thu Sep 29 11:16:59 PDT 2016


Hi Artur,

Artur Pilipenko wrote:
 >> On 29 Sep 2016, at 21:01, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
 >>
 >> Hi Artur,
 >>
 >> Artur Pilipenko wrote:
 >>
 >>> BTW, do we really need to emit an atomic load if all the individual
 >>> components are bytes?
 >> Depends -- do you mean at the hardware level or at the IR
 >> level?
 >>
 >> If you mean at the IR level, then I think yes; since otherwise it is
 >> legal to do transforms that break byte-wise atomicity in the IR, e.g.:
 >>
 >>   i32* ptr = ...
 >>   i32  val = *ptr
 >>
 >> =>   // Since no threads can be legally racing on *ptr
 >>
 >>   i32* ptr = ...
 >>   i32 val0 = *ptr
 >>   i32 val1 = *ptr
 >>   i32 val = (val0 & 1) | (val1 & ~1);
 >>
 >>
 >> If you're talking about the hardware level, then I'm not sure; and my
 >> guess is that the answer is almost certainly arch-dependent.
 > I meant the case when we have a load by bytes pattern like this:
 > i8* p = ...
 > i8 b0 = *p++;
 > i8 b1 = *p++;
 > i8 b2 = *p++;
 > i8 b3 = *p++;
 > i32 result = b0 << 24 | b1 << 16 | b2 << 8 | b3 << 0;
 >
 > When we fold it to a i32 load, should this load be atomic?

If we do fold it to a non-atomic i32 load, then it would be legal for
LLVM to do the IR transform I mentioned above.  That breaks the
byte-wise atomicity you had in the original program.

That is, in:

   i8* p = ...
   i8 b0 = *p++;
   i8 b1 = *p++;
   i8 b2 = *p++;
   i8 b3 = *p++;
   // Note: I changed this to be little endian, and I've assumed
   // that we're compiling for a little endian system
   i32 result = b3 << 24 | b2 << 16 | b1 << 8 | b0 << 0;

say all of p[0..3] are 0, and you have a thread racing to set p[0] to
-1 (0xFF).  Then result can either be 0 or 255.

However, say you first transform this to a non-atomic i32 load:

   i8* p = ...
   i32* p.i32 = (i32*)p
   i32 result = *p.i32

and then we do the transform above:

   i8* p = ...
   i32* p.i32 = (i32*)p
   i32 result0 = *p.i32
   i32 result1 = *p.i32
   i32 result = (result0 & 1) | (result1 & ~1);

then it is possible for result to be 254 (by result0 observing 0 and
result1 observing 255).

-- Sanjoy

