[llvm-dev] Load combine pass

Thu Sep 29 11:25:07 PDT 2016

> On 29 Sep 2016, at 21:16, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> 
> Hi Artur,
> 
> Artur Pilipenko wrote:
> >> On 29 Sep 2016, at 21:01, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> >>
> >> Hi Artur,
> >>
> >> Artur Pilipenko wrote:
> >>
> >>> BTW, do we really need to emit an atomic load if all the individual
> >>> components are bytes?
> >> Depends -- do you mean at the hardware level or at the IR
> >> level?
> >>
> >> If you mean at the IR level, then I think yes; since otherwise it is
> >> legal to do transforms that break byte-wise atomicity in the IR, e.g.:
> >>
> >>   i32* ptr = ...
> >>   i32  val = *ptr
> >>
> >> =>   // Since no threads can be legally racing on *ptr
> >>
> >>   i32* ptr = ...
> >>   i32 val0 = *ptr
> >>   i32 val1 = *ptr
> >>   i32 val = (val0 & 1) | (val1 & ~1);
> >>
> >>
> >> If you're talking about the hardware level, then I'm not sure; and my
> >> guess is that the answer is almost certainly arch-dependent.
> > I meant the case when we have a load by bytes pattern like this:
> > i8* p = ...
> > i8 b0 = *p++;
> > i8 b1 = *p++;
> > i8 b2 = *p++;
> > i8 b3 = *p++;
> > i32 result = b0 << 24 | b1 << 16 | b2 << 8 | b3 << 0;
> >
> > When we fold it to an i32 load, should this load be atomic?
> 
> If we do fold it to a non-atomic i32 load, then it would be legal for
> LLVM to do the IR transform I mentioned above.  That breaks the
> byte-wise atomicity you had in the original program.
> 
> That is, in:
> 
>  i8* p = ...
>  i8 b0 = *p++;
>  i8 b1 = *p++;
>  i8 b2 = *p++;
>  i8 b3 = *p++;
>  // Note: I changed this to be little endian, and I've assumed
>  // that we're compiling for a little endian system
>  i32 result = b3 << 24 | b2 << 16 | b1 << 8 | b0 << 0;
> 
> say all of p[0..3] are 0, and you have a thread racing to set p[0]
> (the byte read into b0) to -1.  Then result can either be 0 or 255.
> 
> However, say you first transform this to a non-atomic i32 load:
> 
>  i8* p = ...
>  i32* p.i32 = (i32*)p
>  i32 result = *p.i32
> 
> and we do the transform above
> 
>  i8* p = ...
>  i32* p.i32 = (i32*)p
>  i32 result0 = *p.i32
>  i32 result1 = *p.i32
>  i32 result = (result0 & 1) | (result1 & ~1);
> 
> then it is possible for result to be 254 (by result0 observing 0 and
> result1 observing 255).
I see. For some reason I was assuming byte-wise atomicity for non-atomic loads.

So, if any of the components are atomic, the resulting load must be atomic as well.
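
To make the 254 outcome concrete, here is a minimal single-threaded C++ sketch of Sanjoy's scenario. It is only a model, not what LLVM emits: the racing store is simulated by an ordinary write placed between the two reads, and the names load_bytes, split_load_with_race and demo are made up for illustration. The i32 value is assembled with shifts so the result does not depend on host endianness.

```cpp
#include <cassert>
#include <cstdint>

// Little-endian assembly of four bytes, mirroring the byte-wise pattern
// from the thread (b3 << 24 | b2 << 16 | b1 << 8 | b0 << 0).
uint32_t load_bytes(const uint8_t *p) {
    return (static_cast<uint32_t>(p[3]) << 24) |
           (static_cast<uint32_t>(p[2]) << 16) |
           (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[0]) << 0);
}

// Models the split non-atomic load. The store to p[0] stands in for the
// racing thread (an assumption of this sketch) and lands between the reads.
uint32_t split_load_with_race(uint8_t *p) {
    uint32_t result0 = load_bytes(p);  // observes 0
    p[0] = 0xFF;                       // simulated racing store of -1
    uint32_t result1 = load_bytes(p);  // observes 255
    return (result0 & 1u) | (result1 & ~1u);
}

// Walks through the scenario; each assert documents one observation.
void demo() {
    uint8_t mem[4] = {0, 0, 0, 0};
    assert(load_bytes(mem) == 0u);              // before the store
    assert(split_load_with_race(mem) == 254u);  // neither 0 nor 255
    assert(load_bytes(mem) == 255u);            // after the store
}
```

Bit 0 comes from the read taken before the store and bits 1..7 from the read taken after it, so the low byte ends up as 0xFE, a value the original byte-wise program could never produce.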

Artur 
> 
> -- Sanjoy