[llvm-dev] Load combine pass

Thu Sep 29 17:16:33 PDT 2016


On 09/29/2016 11:25 AM, Artur Pilipenko wrote:
>> On 29 Sep 2016, at 21:16, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>>
>> Hi Artur,
>>
>> Artur Pilipenko wrote:
>>>> On 29 Sep 2016, at 21:01, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>>>>
>>>> Hi Artur,
>>>>
>>>> Artur Pilipenko wrote:
>>>>
>>>>> BTW, do we really need to emit an atomic load if all the individual
>>>>> components are bytes?
>>>> Depends -- do you mean at the hardware level or at the IR
>>>> level?
>>>>
>>>> If you mean at the IR level, then I think yes; since otherwise it is
>>>> legal to do transforms that break byte-wise atomicity in the IR, e.g.:
>>>>
>>>>    i32* ptr = ...
>>>>    i32  val = *ptr
>>>>
>>>> =>   // Since no threads can be legally racing on *ptr
>>>>
>>>>    i32* ptr = ...
>>>>    i32 val0 = *ptr
>>>>    i32 val1 = *ptr
>>>>    i32 val = (val0 & 1) | (val1 & ~1);
>>>>
>>>>
>>>> If you're talking about the hardware level, then I'm not sure; and my
>>>> guess is that the answer is almost certainly arch-dependent.
>>> I meant the case when we have a load by bytes pattern like this:
>>> i8* p = ...
>>> i8 b0 = *p++;
>>> i8 b1 = *p++;
>>> i8 b2 = *p++;
>>> i8 b3 = *p++;
>>> i32 result = b0 << 24 | b1 << 16 | b2 << 8 | b3 << 0;
>>>
>>> When we fold it to an i32 load, should this load be atomic?
>> If we do fold it to a non-atomic i32 load, then it would be legal for
>> LLVM to do the IR transform I mentioned above.  That breaks the
>> byte-wise atomicity you had in the original program.
>>
>> That is, in:
>>
>>   i8* p = ...
>>   i8 b0 = *p++;
>>   i8 b1 = *p++;
>>   i8 b2 = *p++;
>>   i8 b3 = *p++;
>>   // Note: I changed this to be little endian, and I've assumed
>>   // that we're compiling for a little endian system
>>   i32 result = b3 << 24 | b2 << 16 | b1 << 8 | b0 << 0;
>>
>> say all of p[0..3] are 0, and you have a thread racing to set p[0]
>> (the byte read into b0) to -1.  Then result can either be 0 or 255.
>>
>> However, say you first transform this to a non-atomic i32 load:
>>
>>   i8* p = ...
>>   i32* p.i32 = (i32*)p
>>   i32 result = *p.i32
>>
>> and then we do the transform above:
>>
>>   i8* p = ...
>>   i32* p.i32 = (i32*)p
>>   i32 result0 = *p.i32
>>   i32 result1 = *p.i32
>>   i32 result = (result0 & 1) | (result1 & ~1);
>>
>> then it is possible for result to be 254 (by result0 observing 0 and
>> result1 observing 255).
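
To make the arithmetic in that last step concrete, here is a minimal C
sketch. The helper name combine_split_loads is hypothetical (it is not
from the thread or from LLVM); it only models the bit-mixing that the
split-load transform performs on the two observed values.

```c
#include <stdint.h>

/* Hypothetical helper: models the split-load transform quoted above.
   Bit 0 comes from the first read, all other bits from the second read. */
static uint32_t combine_split_loads(uint32_t result0, uint32_t result1) {
    return (result0 & 1u) | (result1 & ~1u);
}
```

If the first read sees 0 and the second sees 255, the low bit is taken
from 0 and the remaining bits from 255, giving 254; if both reads see
the same value, that value comes back unchanged.
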
> I see. For some reason I was assuming byte-wise atomicity for non-atomic loads.
>
> So, if any of the components are atomic, the resulting load must be atomic as well.
We've talked about the need for element-wise atomic vectors in other
contexts.  This sounds like maybe we need an element-wise atomic notion
on non-vectors as well, where the "element type" is merely a byte.
Alternatively, we could re-frame widening as producing a vector or
struct type which is element-wise atomic, but that seems like a lot of
complexity.
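
As a rough source-level analogy only (C11 atomics, not LLVM IR, and not
a proposal for how the pass should be implemented), the guarantee being
described looks something like the sketch below. The function name
load_le32_bytewise_atomic is hypothetical. Each byte is read exactly
once with its own relaxed atomic load, so no byte can be torn or
re-read, but the four bytes together are not a single atomic snapshot.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper illustrating "element-wise atomic" with a byte
   as the element type. Each byte gets one relaxed atomic load; the
   combined 32-bit value is assembled in little-endian order. */
static uint32_t load_le32_bytewise_atomic(_Atomic uint8_t *p) {
    uint32_t b0 = atomic_load_explicit(&p[0], memory_order_relaxed);
    uint32_t b1 = atomic_load_explicit(&p[1], memory_order_relaxed);
    uint32_t b2 = atomic_load_explicit(&p[2], memory_order_relaxed);
    uint32_t b3 = atomic_load_explicit(&p[3], memory_order_relaxed);
    return (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
}
```

With p[0..3] all 0 and a racing store of 0xFF to p[0], this can only
return 0 or 255, which is the property a widened load would need to
preserve.
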

Philip