[llvm-dev] RFC: An Extension Mechanism for Parallel Compilers Based on LLVM

Thu Oct 18 10:16:17 PDT 2018

Hi,

We have experience with a similar approach in our compiler at Cray -- we represent parallel regions with begin/end intrinsic operators.  This approach does potentially enable some optimization opportunities, but we've also experienced some real challenges in getting this approach to work properly.  The proposal seems to touch on many of the concerns we would raise, but we still have some questions.

The main challenges we have experienced are: (1) maintaining the required region structure and nesting; (2) preventing illegal motion across region boundaries; and (3) establishing and maintaining proper object allocation/lifetime.

It seems that a region will begin as a single-entry, single-exit block of code; the single-entry property is required to be maintained throughout transformations, but is it correct that the single-exit property is allowed to be relaxed?  If so, then I assume that the multiple exit points would still need to "collectively post dominate" the single entry point?  That is, any path leaving a region must pass through an exit point (and that exit point must be dominated by the entry point).  Is this correct?

Do you have thoughts about how to handle issues that arise from unreachable code or unusual control flow?  For example: an infinite loop in a region could result in the end marker being deleted, creating a region with an entry but no exit.  Is this going to be allowed (and handled), or prohibited (and if so, how)?  What about control flow that might exit from a region, like exception handling?  Will this be prohibited, or handled in some manner?

Are region entry/exit intrinsics allowed to be reordered (with respect to other region entry/exit intrinsics)?  If so, is there any mechanism that would prevent two regions from becoming interleaved (i.e., overlapped but not properly nested)?

The proposal states that SSA values are not allowed to be defined in one region and used in another.  Is a nested region considered the same or a different region?  That is, can an SSA value defined in one region be referenced in a nested region?

What mechanisms would prevent an SSA value from flowing out of a region?  The language about dominator relations and direct SSA references in C(i) in the SOUNDNESS section seems to suggest that a region will have an implicit control-flow path from the start of the region to the exit (i.e., around the body of the region).  Is this correct?  (If not, what prevents an SSA value defined in one region to be directly referenced in another region?)  Also, the proposal mentions that enforcing property C(i) would require changes to PHI-node construction.  Will there be additional checks/asserts added to ensure that future transformations are not allowed to introduce these prohibited PHI nodes?

Is the intent that all necessary privatization will be performed by the PREPARE translation?  Or, is it possible for a region to contain "shared" accesses with corresponding meta-data indicating they will eventually become private accesses?  The big challenge with the latter is that the IR cannot be interpreted without also considering the meta-data, since the meaning is entirely different.  For example, the address of an object will change when the object is converted from shared to private.

Is the proposed framework intended/expected to support heterogeneous compilation?  That is, could different regions be compiled for different targets?

Has there been any thought about adopting a model that represents regions as outlined functions (similar to how OpenMP is represented today), but abstracts the "fork" call into a general LLVM intrinsic?  I believe a lot or all of the same optimizations could be performed with this representation, though the analysis and transformations would necessarily be inter-procedural (and probably require writing new passes, rather than leveraging existing passes).  But, the main advantage is that it would be less invasive to the standard LLVM IR and infrastructure.

Is there a middle ground, with first-class support for nested functions or lambdas?  This would allow outlining a region into a nested function or lambda.  Optimizations on this form would still require inter-procedural analysis and transformation, but it could be limited to a smaller scope (i.e., not an entire module, but perhaps a single function plus a set of nested functions or associated lambdas).

I believe Cray will have a representative attending the BoF on this topic (David Greene), and we are certainly interested in following this proposal and participating in future discussions.

Regards,
Jeff