[llvm-dev] RFC: An Extension Mechanism for Parallel Compilers Based on LLVM

Mon Oct 22 20:20:06 PDT 2018

>>>>>> Hi,
>>>>>> We have experience with a similar approach in our compiler at Cray -- we represent parallel regions with begin/end intrinsic operators.  This approach does potentially enable some optimization opportunities, but we've also experienced some real challenges in getting this approach to work properly.

It would be interesting to learn more.  Perhaps we could have you join one of our upcoming hangouts?

>>>>>> The proposal seems to touch on many of the concerns we would raise, but we still have some questions.

>>>>>> The main challenges we have experienced are: (1) maintaining the required region structure and nesting; (2) preventing illegal motion across region boundaries; and (3) establishing and maintaining proper object allocation/lifetime.  It seems that a region will begin as a single-entry, single-exit block of code; the single-entry property is required to be maintained throughout transformations, but is it correct that the single-exit property is allowed to be relaxed?

This is not yet entirely clear, so we decided to punt and keep single-exit as a requirement until we can decide the best course of action. OpenMP seems to need single-exit regions; Tapir and HPVM do not, and neither should the parallel IR we are planning to develop (which will combine Tapir and HPVM).

>>>>>> If so, then I assume that the multiple exit points would still need to "collectively post dominate" the single entry point?  That is, any path leaving a region must pass through an exit point (and that exit point must be dominated by the entry point).  Is this correct?

Correct.
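To make this shape concrete, here is a minimal sketch of how a verifier might check it. It assumes a hypothetical CFG encoding (a dict from block name to successor names), a region given as a set of blocks with a single entry block, and exit blocks that hold the end markers and belong to the region. The names `dominators` and `check_region` are illustrative only; they are not part of the proposal or of LLVM's API.

```python
from typing import Dict, List, Set

CFG = Dict[str, List[str]]  # hypothetical encoding: block name -> successor names


def dominators(cfg: CFG, root: str) -> Dict[str, Set[str]]:
    """Iterative dominator sets for every block reachable from root."""
    reachable, stack = {root}, [root]
    while stack:
        for succ in cfg.get(stack.pop(), []):
            if succ not in reachable:
                reachable.add(succ)
                stack.append(succ)
    preds = {b: [p for p in reachable if b in cfg.get(p, [])] for b in reachable}
    dom = {b: set(reachable) for b in reachable}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for b in reachable - {root}:
            # A block is dominated by itself plus whatever dominates all its preds.
            new = set.intersection(*(dom[p] for p in preds[b])) | {b}
            if new != dom[b]:
                dom[b], changed = new, True
    return dom


def check_region(cfg: CFG, func_entry: str, region: Set[str],
                 entry: str, exits: Set[str]) -> List[str]:
    """Report edges and exits that break the single-entry / collective-exit shape."""
    problems = []
    dom = dominators(cfg, func_entry)
    for block, succs in cfg.items():
        for succ in succs:
            # Single entry: edges from outside may only target the entry block.
            if block not in region and succ in region and succ != entry:
                problems.append(f"edge {block} -> {succ} enters region past its entry")
            # Any path leaving the region must go through an exit block.
            if block in region and succ not in region and block not in exits:
                problems.append(f"edge {block} -> {succ} leaves region without an exit")
    for x in exits:
        if x in dom and entry not in dom[x]:
            problems.append(f"exit {x} is not dominated by entry {entry}")
    return problems
```

For example, adding an edge `body -> after` to a loop body inside the region would be reported as leaving the region without passing through an exit.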

>>>>>> Do you have thoughts about how to handle issues that arise from unreachable code or unusual control flow?  For example: an infinite loop in a region could result in the end marker being deleted, creating a region with an entry but no exit.  Is this going to be allowed (and handled), or prohibited (and if so, how)?  What about control flow that might exit from a region, like exception handling?  Will this be prohibited, or handled in some manner?

We did discuss these issues, and an initial consensus was that the extension mechanism wouldn’t cause *new* problems in these scenarios, over and above what the parallel IR needs to deal with.  In particular, we couldn’t think of a legal reason that an LLVM pass would introduce any of these situations, e.g., an infinite loop or a program exit that didn’t exist before.  Of course, any existing cases of such behavior must be handled by the parallel IR according to whatever semantics any parallel language defines (or allows by omission) for it.  A pass might well delete a marker that becomes unreachable, but the PREPARE phase would be responsible for detecting this at construction time and could prevent it in a number of ways.
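As a rough illustration of that detection step, PREPARE could check that every begin marker can still reach some end marker, which catches the "entry but no exit" case from an infinite loop. A minimal sketch, assuming a hypothetical dict-of-successors CFG encoding; `exit_reachable` is an illustrative name, not something the proposal defines:

```python
from collections import deque
from typing import Dict, List, Set


def exit_reachable(cfg: Dict[str, List[str]], begin: str, ends: Set[str]) -> bool:
    """True if some block holding an end marker is reachable from the begin block."""
    seen, work = {begin}, deque([begin])
    while work:
        block = work.popleft()
        if block in ends:
            return True
        for succ in cfg.get(block, []):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    # Breadth-first search exhausted: the region has an entry but no reachable exit.
    return False
```

A region whose body is `loop -> loop` with no edge to the end-marker block would return `False` here.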

>>>>>> Are region entry/exit intrinsics allowed to be reordered (with respect to other region entry/exit intrinsics)?  If so, is there any mechanism that would prevent two regions from becoming interleaved (i.e., overlapped but not properly nested)?

Regions are structured and properly nested.  I don’t know of any legal transformation that could make them *not* properly nested.  Are there any?
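For a single straight-line sequence of markers (e.g., the markers met along one path), proper nesting is the usual balanced-bracket check, and interleaving shows up as an end marker that does not close the innermost open region. A minimal sketch with a hypothetical `("begin" | "end", region id)` marker encoding; a real check over a CFG would have to apply this along every path or reason with dominance instead:

```python
from typing import List, Tuple

Marker = Tuple[str, str]  # hypothetical encoding: ("begin" | "end", region id)


def properly_nested(markers: List[Marker]) -> bool:
    """Check that begin/end markers along one path form a properly nested sequence."""
    open_regions: List[str] = []
    for kind, region in markers:
        if kind == "begin":
            open_regions.append(region)
        elif kind == "end":
            # An end marker must close the innermost open region; otherwise
            # the regions overlap without nesting (interleaving).
            if not open_regions or open_regions[-1] != region:
                return False
            open_regions.pop()
        else:
            raise ValueError(f"unknown marker kind: {kind}")
    # Every region opened on this path must also be closed on it.
    return not open_regions
```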

>>>>>> The proposal states that SSA values are not allowed to be defined in one region and used in another.  Is a nested region considered the same or a different region?  That is, can an SSA value defined in one region be referenced in a nested region?

No.

>>>>>> What mechanisms would prevent an SSA value from flowing out of a region?  The language about dominator relations and direct SSA references in C(i) in the SOUNDNESS section seems to suggest that a region will have an implicit control-flow path from the start of the region to the exit (i.e., around the body of the region).  Is this correct?

Yes.  See the Tapir and HPVM examples in Section “PROPOSED EXTENSION MECHANISM.”

>>>>>> (If not, what prevents an SSA value defined in one region to be directly referenced in another region?)  Also, the proposal mentions that enforcing property C(i) would require changes to PHI-node construction.  Will there be additional checks/asserts added to ensure that future transformations are not allowed to introduce these prohibited PHI nodes?

This is open to discussion.  A parallel compiler built on top of this could choose how best to do this.  The Verify pass can be augmented to detect such cases.  The EXTRACT pass will, of course, detect them as well.
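One possible shape for such a Verify-style check, as a sketch only: each instruction is tagged with its innermost enclosing region (`None` outside all regions), and any operand, including a PHI incoming value, that is defined in a different region is flagged, so a nested region counts as a different region. The `Inst` class and `cross_region_uses` are hypothetical. Values defined outside every region are deliberately not flagged, since this sketch does not assume a policy for them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Inst:
    name: str                    # SSA name defined by this instruction
    region: Optional[str]        # innermost enclosing region; None if outside all regions
    operands: List[str] = field(default_factory=list)  # SSA names used (PHI incomings too)
    is_phi: bool = False


def cross_region_uses(insts: List[Inst]) -> List[str]:
    """Flag uses of a region-defined value from any other region (nested or not)."""
    def_region: Dict[str, Optional[str]] = {i.name: i.region for i in insts}
    errors = []
    for inst in insts:
        for op in inst.operands:
            if op not in def_region:
                continue  # arguments/constants are not modeled in this sketch
            src = def_region[op]
            if src is not None and src != inst.region:
                kind = "phi" if inst.is_phi else "use"
                errors.append(f"{kind} of %{op} (region {src}) in %{inst.name} (region {inst.region})")
    return errors
```

With `%a` defined in region `R1`, a use of `%a` inside a nested region `R2` is reported, as is a PHI after the region that merges a value defined inside `R1`.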

>>>>>> Is the intent that all necessary privatization will be performed by the PREPARE translation?  Or, is it possible for a region to contain "shared" accesses with corresponding meta-data indicating they will eventually become private accesses?  The big challenge with the latter is that the IR cannot be interpreted without also considering the meta-data, since the meaning is entirely different.  For example, the address of an object will change when the object is converted from shared to private.

We are trying to avoid *requiring* reliance on metadata for any correctness requirements, so yes, the PREPARE translator must perform several such transformations.  Note that a parallel compiler may choose to do something less robust or more ad hoc.

>>>>>> Is the proposed framework intended/expected to support heterogeneous compilation?  That is, could different regions be compiled for different targets?

Yes, to both.  Both Intel’s OpenMP compiler and the HPVM compiler support offloading to GPUs, and we are currently working on offloading to FPGAs and ML accelerators in the HPVM project.  (HPVM is yet to be ported to these intrinsics, but that would not affect the HPVM functionality itself.)

>>>>>> Has there been any thought about adopting a model that represents regions as outlined functions (similar to how OpenMP is represented today), but abstracts the "fork" call into a general LLVM intrinsic?  I believe a lot or all of the same optimizations could be performed with this representation, though the analysis and transformations would necessarily be inter-procedural (and probably require writing new passes, rather than leveraging existing passes).  But, the main advantage is that it would be less invasive to the standard LLVM IR and infrastructure.

One of our major goals was that parallel compilers should benefit from standard scalar LLVM analyses and optimizations.  Rewriting all those to be interprocedural seems to be a very large effort. The changes we require to the LLVM infrastructure seem small in comparison, and in many cases can be engineered so they do not affect other clients (e.g., by enabling a parallel compiler to pass in extra pass options to identify when certain restrictions should be enforced).

>>>>>> Is there a middle ground, with first-class support for nested functions or lambdas?  This would allow outlining a region into a nested function or lambda.  Optimizations on this form would still require inter-procedural analysis and transformation, but it could be limited to a smaller scope (i.e., not an entire module, but perhaps a single function plus a set of nested functions or associated lambdas).

Same answer.

>>>>>> I believe Cray will have a representative attending the BoF on this topic (David Greene), and we are certainly interested in following this proposal and participating in future discussions.

That would be great.  Perhaps we can have you or someone else join a near-future conference call, and continue the discussion there.

>>>>>> Regards,
>>>>>> Jeff

Thanks for the great feedback.

—Vikram Adve

// Interim Head, Department of Computer Science
// Donald B. Gillies Professor of Computer Science
// University of Illinois at Urbana-Champaign
// Admin Assistant: Amy Simons - aoboyle at illinois.edu
// Google Hangouts: vikram.s.adve at gmail.com || Skype: vikramsadve
// Research page: http://vikram.cs.illinois.edu
