[llvm-dev] [RFC] IR-level Region Annotations
Sanjoy Das via llvm-dev
llvm-dev at lists.llvm.org
Thu Jan 19 22:12:58 PST 2017
Hi Vikram,
On Thu, Jan 19, 2017 at 9:27 PM, Adve, Vikram Sadanand
<vadve at illinois.edu> wrote:
> Hi Sanjoy,
>
> Yes, that's exactly what we have been looking at recently here, but
> the region tags seem to make it possible to express the control flow
> as well, so I think we could start with regions+metadata, as Hal and
I'm not yet convinced that region tags are sufficient to model exotic
control flow.
(I don't know OpenMP, so this is a copy-pasted-and-edited example.)
Say we have:
void main() {
#pragma omp parallel num_threads(4)
  {
    int i = omp_get_thread_num();
    int val;
    compute_something_into_val(&val, i);
    a[i] = val;
  }
}
I presume the (eventual) intended lowering is something like this (if
the intended lowering is different from this and avoids the issue I'm
trying to highlight, then my point is moot):
void main() {
  tok = llvm.experimental.intrinsic_a();
  int i = omp_get_thread_num();
  i32* val = alloca i32
  compute_something_into_val(val, i);
  a[i] = val;
  llvm.experimental.intrinsic_b(tok);
}
However, LLVM is free to hoist the alloca to the entry block:
void main() {
  i32* val = alloca i32
  tok = llvm.experimental.intrinsic_a();
  int i = omp_get_thread_num();
  compute_something_into_val(val, i);
  a[i] = val;
  llvm.experimental.intrinsic_b(tok);
}
and now you have a race between the four parallel forks.
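To make the race concrete in ordinary threading terms, here is a small
C/pthreads sketch (this is not the actual OpenMP lowering; the helper
body and the NTHREADS constant are stand-ins I made up).  body_private
keeps val in each thread's own frame, which is what the original region
means; body_hoisted funnels every thread through one shared location,
which is effectively what hoisting the alloca into main's entry block
does:

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

enum { NTHREADS = 4 };

static int a[NTHREADS];
static int shared_val;   /* plays the role of the hoisted alloca */

/* Stand-in for the opaque computation in the example above. */
static void compute_something_into_val(int *val, int i) { *val = i * 10; }

/* Per-thread val, like an alloca inside the parallel region: no race. */
static void *body_private(void *arg) {
  int i = (int)(intptr_t)arg;
  int val;
  compute_something_into_val(&val, i);
  a[i] = val;
  return NULL;
}

/* Shared val, like the hoisted alloca: every thread writes and reads
 * the same location, so a[i] may pick up another thread's value. */
static void *body_hoisted(void *arg) {
  int i = (int)(intptr_t)arg;
  compute_something_into_val(&shared_val, i);
  a[i] = shared_val;
  return NULL;
}

int main(void) {
  pthread_t t[NTHREADS];
  for (int i = 0; i < NTHREADS; ++i)
    pthread_create(&t[i], NULL, body_hoisted /* vs. body_private */,
                   (void *)(intptr_t)i);
  for (int i = 0; i < NTHREADS; ++i)
    pthread_join(t[i], NULL);
  for (int i = 0; i < NTHREADS; ++i)
    printf("a[%d] = %d\n", i, a[i]);
  return 0;
}

Build with -pthread; swapping body_hoisted for body_private gives back
the race-free program the user actually wrote.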
The problem here is that nothing in the IR expresses that we have four
copies of the region running "at the same time". In fact, such
control flow is alien to LLVM today.
For instance, another evil optimization may turn:
int main() {
  int a[4];
#pragma omp parallel num_threads(4)
  {
    int i = omp_get_thread_num();
    int val = compute_something_into_val(i);
    a[i] = val;
  }
  return a[0] + a[1];
}
to
int main() {
  int a[4];
#pragma omp parallel num_threads(4)
  {
    int i = omp_get_thread_num();
    int val = compute_something_into_val(i);
    a[i] = val;
  }
  return undef;
}
since a[i] = val could have initialized at most one element of a,
leaving at least one of a[0] and a[1] uninitialized.
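To spell that reasoning out: LLVM is, in effect, optimizing the purely
sequential translation unit below (a rough sketch; the callees are left
as opaque declarations because that is all the optimizer knows about
them, and the function name f is arbitrary):

/* The sequential view the optimizer takes of the region above. */
int omp_get_thread_num(void);
int compute_something_into_val(int i);

int f(void) {
  int a[4];                  /* uninitialized locals */
  {                          /* the region body, executed exactly once */
    int i = omp_get_thread_num();
    int val = compute_something_into_val(i);
    a[i] = val;              /* writes exactly one element of a */
  }
  /* At least one of a[0] and a[1] was never written, so this reads
   * uninitialized memory and may legitimately be folded to undef. */
  return a[0] + a[1];
}

Nothing in that translation unit says the block runs four times with
different values of i, which is precisely the information the region
annotations would have to carry.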
Now you could say that the llvm.experimental.intrinsic_a and
llvm.experimental.intrinsic_b intrinsics are magic, and that even such
"obvious" optimizations are not allowed to happen across them; but then
calls to these intrinsics are pretty fundamentally different from
"normal" calls, and are probably best modeled as new instructions.
You're going to have to do the same kind of auditing of passes either
way, and the only extra cost of a new instruction is the extra bitcode
reading / writing code.
I hope I made sense.
-- Sanjoy
> Xinmin proposed, and then figure out what needs to be first class
> instructions.
>
> --Vikram Adve
>
>
>
>> On Jan 19, 2017, at 11:03 PM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>>
>> Hi,
>>
>> My bias is to use both (b) and (d), since they have complementary
>> strengths. We should use (b) for expressing concepts that can't be
>> semantically modeled as a call or invoke (this branch takes both its
>> successors), and (d) for expressing things that can be (this call may
>> never return), and annotation-like things (this region (denoted by
>> def-use of a token) is a reduction).
>>
>> I don't grok OpenMP, but perhaps we can come up with one or two
>> "generalized control flow"-type instructions that can be used to model
>> the non-call/invoke-like semantics we'd like LLVM to know about, and
>> model the rest with (d)?
>>
>> -- Sanjoy
>>
>> On Thu, Jan 19, 2017 at 8:28 PM, Hal Finkel via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>>
>>> On 01/19/2017 03:36 PM, Mehdi Amini via llvm-dev wrote:
>>>
>>>
>>> On Jan 19, 2017, at 1:32 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>>
>>>
>>>
>>>> On Thu, Jan 19, 2017 at 1:12 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>>>
>>>>
>>>> On Jan 19, 2017, at 12:04 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>>>
>>>>
>>>>
>>>> On Thu, Jan 19, 2017 at 11:46 AM, Mehdi Amini via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>
>>>>>> On Jan 19, 2017, at 11:36 AM, Adve, Vikram Sadanand via llvm-dev
>>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>> Hi Johannes,
>>>>>>
>>>>>>> I am especially curious where you get your data from. Tapir [0] (and
>>>>>>> to
>>>>>>> some degree PIR [1]) have shown that, counterintuitively, only a few
>>>>>>> changes
>>>>>>> to LLVM passes are needed. Tapir was recently used in an MIT class
>>>>>>> with a
>>>>>>> lot of students and it seemed to work well with only minimal changes
>>>>>>> to analysis and especially transformation passes.
>>>>>>
>>>>>> TAPIR is an elegant, small extension and, in particular, I think the
>>>>>> idea of asymmetric parallel tasks and control flow is a clever way to
>>>>>> express parallelism with serial semantics, as in Cilk. Encoding the control
>>>>>> flow extensions as explicit instructions is orthogonal to that, though
>>>>>> arguably more elegant than using region tags + metadata.
>>>>>>
>>>>>> However, Cilk is a tiny language compared with the full complexity of
>>>>>> other languages, like OpenMP. To take just one example, TAPIR cannot
>>>>>> express the ORDERED construct of OpenMP. A more serious concern, IMO, is
>>>>>> that TAPIR (like Cilk) requires serial semantics, whereas there are many
>>>>>> parallel languages, OpenMP included, that do not obey that restriction.
>>>>>> Third, OpenMP has *numerous* clauses, e.g., REDUCTION or PRIVATE, that are
>>>>>> needed because, without them, you'd be dependent on fundamentally hard
>>>>>> compiler analyses to extract the same information for satisfactory parallel
>>>>>> performance; realistic applications cannot depend on the success of such
>>>>>> analyses.
>>>>>
>>>>> I agree with this, but I'm also wondering if it needs to be first class
>>>>> in the IR?
>>>>> For example, we know our alias analysis is very basic, and C/C++ have
>>>>> stronger constraints thanks to their type systems, but we didn't inject
>>>>> this higher-level information, which helps the optimizer, as first-class
>>>>> IR constructs.
>>>>
>>>>
>>>> FWIW, while I agree with the general point, I wouldn't use this example,
>>>> because we pretty much still suffer to this day because of it (both in AA,
>>>> and devirt, and ...) :)
>>>> We can't always even tell fields apart.
>>>>
>>>>
>>>> Is it inherent to the infrastructure, i.e., using metadata instead of
>>>> first-class IR constructs, or is it just a "quality of implementation" issue?
>>>
>>> Not to derail this conversation:
>>>
>>> IMHO, at some point there is no real difference :)
>>>
>>> Because otherwise, everything is a QOI issue.
>>>
>>> I.e., if it's super tricky to get metadata that works well and works right,
>>> doesn't get lost, etc., and that's inherent to using metadata, then to me
>>> that is not a QOI issue.
>>>
>>> So could it be done with metadata? Probably?
>>> But at the same time, if it had been done with more first-class constructs,
>>> it would have happened years ago and at much lower cost.
>>>
>>>
>>> This is what I meant by “inherent to the infrastructure”, thanks for
>>> clarifying.
>>>
>>>
>>> To clarify, we were proposing metadata that is used as arguments to the
>>> region-annotation intrinsics. This metadata has the nice property that it
>>> does not get dropped (so it is just being used as a way of encoding whatever
>>> data structures are necessary without predefining a syntactic schema).
>>>
>>> -Hal
>>>
>>>
>>> —
>>> Mehdi
>>>
>>>
>>>
>>>
>>> --
>>> Hal Finkel
>>> Lead, Compiler Technology and Programming Languages
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>>