[LLVMdev] LLVM Concurrency and Undef

Mon Aug 22 17:40:47 PDT 2011

On Mon, Aug 22, 2011 at 5:22 PM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:
> On Mon, Aug 22, 2011 at 7:35 PM, Jeffrey Yasskin <jyasskin at google.com> wrote:
>> On Mon, Aug 22, 2011 at 4:20 PM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:
>>> On Mon, Aug 22, 2011 at 6:49 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 3:40 PM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:
>>>>> On Mon, Aug 22, 2011 at 6:08 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>>>> On Mon, Aug 22, 2011 at 2:49 PM, Santosh Nagarakatte
>>>>>> <santosh.nagarakatte at gmail.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have been trying to understand the use of undef in both sequential
>>>>>>> and concurrent programs.
>>>>>>>
>>>>>>> >From the LLVM Language Reference Manual, I see the following
>>>>>>> definition of undef.
>>>>>>> "Undef can be used anywhere a constant is expected, and indicates that
>>>>>>> the user of the value may receive an unspecified bit-pattern".
>>>>>>>  LLVM Language Reference manual also demonstrates how optimizers can
>>>>>>> use these undef values to  optimize the program.
>>>>>>>
>>>>>>> However, on the other hand, with the LLVM Atomics and Concurrency
>>>>>>> Guide states that
>>>>>>> If code accesses a memory location from multiple threads at the same
>>>>>>> time, the resulting loads return 'undef'.
>>>>>>> This is different from the C++ memory model, which provides undefined
>>>>>>> behavior. What is the rationale for returning an undef on racing
>>>>>>> reads?
>>>>>>>
>>>>>>> LLVM Atomics and Concurrency guide also states the following
>>>>>>> "Note that speculative loads are allowed; a load which is part of a
>>>>>>> race returns undef, but does not have undefined behavior"
>>>>>>>
>>>>>>> If the speculative loads returns an undef and the returned value is
>>>>>>> used, then it results in an undefined behavior. Am I correct?
>>>>>>
>>>>>> It behaves like any other undef value... which do often lead to
>>>>>> undefined behavior.
>>>>>>
>>>>>>> If so, what is the purpose of returning an undef with a speculative load?
>>>>>>> Is it to ensure that the subsequent uses of the value of the
>>>>>>> speculatively introduced load is caught/detected by the optimization?
>>>>>>
>>>>>> The point is primarily to allow optimizations like LICM to introduce
>>>>>> loads whose value is never used.  It also keeps consistent semantics
>>>>>> through CodeGen, where some targets widen loads.
>>>>>>
>>>>>>> Is it possible to separate the "undef" in a sequential setting and
>>>>>>> "undef" with speculative loads in a concurrent setting with separate
>>>>>>> undefs?
>>>>>>
>>>>>> The intention is that they should have the same semantics.
>>>>>
>>>>> Suppose there are three threads T1, T2 and T3,
>>>>> T1 (S1 )stores to a location l as non-atomic,
>>>>> T2 then (S2)stores to l as SC-atomic,
>>>>> later T3 (L3)loads from l as SC-atomic.
>>>>>
>>>>> I think the load @ T3 should return undef, since it can see both
>>>>> writes from T1 T2. Then the question is if the SC store @ T2 --- S2
>>>>> and the SC load @ T3 --- L3 introduces an acq/rel (synchronized-with)
>>>>> edge.
>>>>>
>>>>> This will affect if later conflicting accesses are ordered or not, and
>>>>> whether memory accesses are ordered makes load return undef or not.
>>>>>
>>>>> If the S2 and L3 still introduces an SW edge, the next question is
>>>>> suppose there is a later SC-load L3' @ T3, does it also return undef?
>>>>> But this potentially makes the SC atomic sequence S2 L3 L3'
>>>>> inconsistent --- later SC loads can read different writes from earlier
>>>>> loads if there are no SC-stores in between.
>>>>>
>>>>> So I think data-races SC/acq/rel atomics cannot introduce SW edges.
>>>>> Intuitively if the locations designed for locks are used by non-atomic
>>>>> memory accesses, the locks cannot behave correctly. Is it correct?
>>>>
>>>> Yes, that is correct.  I am now convinced that it is worthwhile to
>>>> include the definition of how SW edges form in LangRef.
>>>
>>> Some of the descriptions in LangRef and
>>> http://llvm.org/docs/Atomics.html are like
>>> 1) monotonic atomic to the same location are totally ordered
>>> 2) all SC atomic to to any location are totally ordered
>>> 3) monotonic loads should see monotonic stores monotonically
>>> ...
>>>
>>> However, these conditions can be broken if any of them are
>>> 'interferenced' by non-atomic memory accesses, since that can turn
>>> some of the atomic loads to be undef (with any values). So it is
>>> unlikely forming a total order any more.
>>>
>>> So, do these properties hold only for data race free atomics?  For example,
>>>
>>> T1:                                         T2:
>>> a:  sc store 1 to location x
>>>                                              b: non-atomic store 2 to x
>>>                                              c: sc store 3 to y
>>> d:  r1 = sc load from y
>>> e:  r2 = sc load from x
>>>
>>> Here, r2 still returns undef, since e can see a non-atomic store
>>> although from happens-before. Then a c d e may not be totally ordered.
>>
>> Say r1==3, so we're guaranteed that 'e' happens after 'b'. Since
>> neither a nor b happens before the other, and at least one of them is
>> not atomic, r2==undef. The seq_cst ordering a, c, d, e is consistent
>> with the observable results. That's true whether printing r2 prints 1,
>> 2, or 47 since undef can be observed as any value.
>
> My confusion is the difference between SC and AcquireRelease. The text says
>
> ...
> SequentiallyConsistent
>
> SequentiallyConsistent (seq_cst in IR) provides Acquire semantics for
> loads and Release semantics for stores. Additionally, it guarantees
> that a total ordering exists between all SequentiallyConsistent
> operations.
> ...
>
> Here, what does the total ordering mean? Does it mean given a sequence
> of SC atomic in the total order, an SC load should return the recent
> SC store from the sequence? This is like the meaning of Sequential
> consistent memory model---any SC stores should be visible to all SC
> loads immediately in the same order.
>
> In the above program, we have the SC atomics
>
> sc store 1 to x
> sc store 3 to y
> sc load from y = 3
> sc load from x = 47 or undef
>
> Here, no matter how we order them, there is no way to place a store 47
> to x before load x with 47. Did I make a wrong assumption?

You lose all the atomic guarantees when there's a non-atomic store
involved.  See http://llvm.org/docs/LangRef.html#memmodel ; in
particular, "if R is atomic, and all the writes Rbyte may see are
atomic, it chooses one of the values written [...] otherwise Rbyte
returns undef."

-Eli