[llvm-dev] Load combine pass
Artur Pilipenko via llvm-dev
llvm-dev at lists.llvm.org
Thu Sep 29 11:25:07 PDT 2016
> On 29 Sep 2016, at 21:16, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>
> Hi Artur,
>
> Artur Pilipenko wrote:
> >> On 29 Sep 2016, at 21:01, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> >>
> >> Hi Artur,
> >>
> >> Artur Pilipenko wrote:
> >>
> >>> BTW, do we really need to emit an atomic load if all the individual
> >>> components are bytes?
> >> Depends -- do you mean at the hardware level or at the IR
> >> level?
> >>
> >> If you mean at the IR level, then I think yes, since otherwise it is
> >> legal to do transforms that break byte-wise atomicity in the IR, e.g.:
> >>
> >> i32* ptr = ...
> >> i32 val = *ptr
> >>
> >> => // Since no threads can be legally racing on *ptr
> >>
> >> i32* ptr = ...
> >> i32 val0 = *ptr
> >> i32 val1 = *ptr
> >> i32 val = (val0 & 1) | (val1 & ~1);
> >>
> >>
> >> If you're talking about the hardware level, then I'm not sure; and my
> >> guess is that the answer is almost certainly arch-dependent.
> > I meant the case where we have a load-by-bytes pattern like this:
> > i8* p = ...
> > i8 b0 = *p++;
> > i8 b1 = *p++;
> > i8 b2 = *p++;
> > i8 b3 = *p++;
> > i32 result = b0 << 24 | b1 << 16 | b2 << 8 | b3 << 0;
> >
> > When we fold it to an i32 load, should this load be atomic?
>
> If we do fold it to a non-atomic i32 load, then it would be legal for
> LLVM to do the IR transform I mentioned above. That breaks the
> byte-wise atomicity you had in the original program.
>
> That is, in:
>
> i8* p = ...
> i8 b0 = *p++;
> i8 b1 = *p++;
> i8 b2 = *p++;
> i8 b3 = *p++;
> // Note: I changed this to be little endian, and I've assumed
> // that we're compiling for a little endian system
> i32 result = b3 << 24 | b2 << 16 | b1 << 8 | b0 << 0;
>
> Say all of p[0..3] are 0, and you have a thread racing to set p[0]
> (the byte read into b0) to -1. Then result can be either 0 or 255.
>
> However, say you first transform this to a non-atomic i32 load:
>
> i8* p = ...
> i32* p.i32 = (i32*)p
> i32 result = *p.i32
>
> and then apply the transform above:
>
> i8* p = ...
> i32* p.i32 = (i32*)p
> i32 result0 = *p.i32
> i32 result1 = *p.i32
> i32 result = (result0 & 1) | (result1 & ~1);
>
> then it is possible for result to be 254 (by result0 observing 0 and
> result1 observing 255).
I see. For some reason I was assuming byte-wise atomicity for non-atomic loads.
So, if any of the components are atomic, the resulting load must be atomic as well.
Artur
>
> -- Sanjoy
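The race in Sanjoy's example can be checked with a small simulation. This is a minimal Python sketch, not LLVM code: memory is modeled as two byte-string snapshots (before and after the racing store) instead of real threads, a little-endian target is assumed as in the quoted example, and the helper names combine_bytes_le, folded_i32_load and split_reload are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF  # model i32 values as unsigned 32-bit integers

def combine_bytes_le(p: bytes) -> int:
    # The byte-wise pattern: b3 << 24 | b2 << 16 | b1 << 8 | b0 << 0
    b0, b1, b2, b3 = p[0], p[1], p[2], p[3]
    return b3 << 24 | b2 << 16 | b1 << 8 | b0 << 0

def folded_i32_load(p: bytes) -> int:
    # The combined form: a single little-endian 32-bit load of p[0..3]
    return struct.unpack("<I", p[:4])[0]

def split_reload(val0: int, val1: int) -> int:
    # The rewrite that is legal for a non-atomic load: bit 0 comes from
    # the first read, bits 1..31 from the second read
    return (val0 & 1) | (val1 & ~1 & MASK32)

# Memory before and after a racing thread stores -1 (0xFF) to p[0]
before = bytes([0x00, 0x00, 0x00, 0x00])
after = bytes([0xFF, 0x00, 0x00, 0x00])

# Each byte is read once, so the byte-wise form can only see 0 or 255,
# and on either snapshot it agrees with the folded i32 load
assert combine_bytes_le(before) == folded_i32_load(before) == 0
assert combine_bytes_le(after) == folded_i32_load(after) == 255

# After the rewrite, result0 may read the old memory and result1 the new
result0 = folded_i32_load(before)
result1 = folded_i32_load(after)
print(split_reload(result0, result1))  # 254: neither 0 nor 255
```

The last line is the point of the sketch: 254 is a value no interleaving of the original byte loads could produce, which is why the folded load has to keep the atomicity of its components.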