[LLVMdev] LLVM Concurrency and Undef

Mon Aug 22 17:29:47 PDT 2011

On Mon, Aug 22, 2011 at 7:27 PM, Jeffrey Yasskin <jyasskin at google.com> wrote:
> On Mon, Aug 22, 2011 at 3:49 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>> On Mon, Aug 22, 2011 at 3:40 PM, Jianzhou Zhao <jianzhou at seas.upenn.edu> wrote:
>>> On Mon, Aug 22, 2011 at 6:08 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>> On Mon, Aug 22, 2011 at 2:49 PM, Santosh Nagarakatte
>>>> <santosh.nagarakatte at gmail.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> I have been trying to understand the use of undef in both sequential
>>>>> and concurrent programs.
>>>>>
>>>>> >From the LLVM Language Reference Manual, I see the following
>>>>> definition of undef.
>>>>> "Undef can be used anywhere a constant is expected, and indicates that
>>>>> the user of the value may receive an unspecified bit-pattern".
>>>>>  LLVM Language Reference manual also demonstrates how optimizers can
>>>>> use these undef values to  optimize the program.
>>>>>
>>>>> However, on the other hand, with the LLVM Atomics and Concurrency
>>>>> Guide states that
>>>>> If code accesses a memory location from multiple threads at the same
>>>>> time, the resulting loads return 'undef'.
>>>>> This is different from the C++ memory model, which provides undefined
>>>>> behavior. What is the rationale for returning an undef on racing
>>>>> reads?
>>>>>
>>>>> LLVM Atomics and Concurrency guide also states the following
>>>>> "Note that speculative loads are allowed; a load which is part of a
>>>>> race returns undef, but does not have undefined behavior"
>>>>>
>>>>> If the speculative loads returns an undef and the returned value is
>>>>> used, then it results in an undefined behavior. Am I correct?
>>>>
>>>> It behaves like any other undef value... which do often lead to
>>>> undefined behavior.
>>>>
>>>>> If so, what is the purpose of returning an undef with a speculative load?
>>>>> Is it to ensure that the subsequent uses of the value of the
>>>>> speculatively introduced load is caught/detected by the optimization?
>>>>
>>>> The point is primarily to allow optimizations like LICM to introduce
>>>> loads whose value is never used.  It also keeps consistent semantics
>>>> through CodeGen, where some targets widen loads.
>>>>
>>>>> Is it possible to separate the "undef" in a sequential setting and
>>>>> "undef" with speculative loads in a concurrent setting with separate
>>>>> undefs?
>>>>
>>>> The intention is that they should have the same semantics.
>>>
>>> Suppose there are three threads T1, T2 and T3,
>>> T1 (S1 )stores to a location l as non-atomic,
>>> T2 then (S2)stores to l as SC-atomic,
>>> later T3 (L3)loads from l as SC-atomic.
>>>
>>> I think the load @ T3 should return undef, since it can see both
>>> writes from T1 T2. Then the question is if the SC store @ T2 --- S2
>>> and the SC load @ T3 --- L3 introduces an acq/rel (synchronized-with)
>>> edge.
>>>
>>> This will affect if later conflicting accesses are ordered or not, and
>>> whether memory accesses are ordered makes load return undef or not.
>>>
>>> If the S2 and L3 still introduces an SW edge, the next question is
>>> suppose there is a later SC-load L3' @ T3, does it also return undef?
>>> But this potentially makes the SC atomic sequence S2 L3 L3'
>>> inconsistent --- later SC loads can read different writes from earlier
>>> loads if there are no SC-stores in between.
>>>
>>> So I think data-races SC/acq/rel atomics cannot introduce SW edges.
>>> Intuitively if the locations designed for locks are used by non-atomic
>>> memory accesses, the locks cannot behave correctly. Is it correct?
>>
>> Yes, that is correct.  I am now convinced that it is worthwhile to
>> include the definition of how SW edges form in LangRef.
>
> It's already there:
>
> "We define a happens-before partial order as the least partial order that
>
> * Is a superset of single-thread program order, and
> * When a synchronizes-with b, includes an edge from a to b.
> Synchronizes-with pairs are introduced by platform-specific
> techniques, like pthread locks, thread creation, thread joining, etc.,
> and by atomic instructions. (See also Atomic Memory Ordering
> Constraints)."
>
> ...
>
> "acquire - In addition to the guarantees of monotonic, if this
> operation reads a value written by a release atomic operation, it
> synchronizes-with that operation."
>
> (Strictly, the release operation synchronizes-with the acquire, not
> the other way around.)
>
> Since atomic/non-atomic races are defined to return undef from the
> load, even if the load has seq_cst ordering, the load never reads a
> value written, so none of the stores synchronize with the load.

A undef can be replaced by any concrete value. If the undef returned
from the racy SC load happens to be instantiated by a value of the
latest SC store, does it consider as "... reads a value written by
..."? I think our answer is still no, right?

>
> The text does say that all seq_cst loads and stores participate in the
> global seq_cst ordering that's compatible with the happens-before
> ordering, but that doesn't imply that happens-before is a superset of
> the seq_cst ordering. I'm not sure whether any paradoxes arise from
> allowing racy seq_cst loads to return undef, but I haven't seen any
> examples so far.
>
> I don't object to clarifying the text, of course.
>
> Jeffrey
>

-- 
Jianzhou