[cfe-dev] RFC: clacc: translating OpenACC to OpenMP in clang
Hal Finkel via cfe-dev
cfe-dev at lists.llvm.org
Fri Dec 8 15:39:44 PST 2017
On 12/08/2017 09:58 AM, Joel E. Denny wrote:
> Hi Hal,
> Thanks for your feedback. It sounds like we're basically in
> agreement, but I've added a few thoughts inline below.
> On Wed, Dec 6, 2017 at 4:02 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> On 12/05/2017 01:06 PM, Joel E. Denny wrote:
>> We are working on a new project, clacc, that extends clang with
>> OpenACC support. Clacc's approach is to translate OpenACC (a
>> descriptive language) to OpenMP (a prescriptive language) and
>> thus to build on clang's existing OpenMP support. While we plan
>> to develop clacc to support our own research, an important goal
>> is to contribute clacc as a production-quality component of
>> upstream clang.
>> We have begun implementing an early prototype of clacc. Before
>> we get too far into the implementation, we would like to get
>> feedback from the LLVM community to help ensure our design would
>> ultimately be acceptable for contribution. For that purpose,
>> below is an analysis of several high-level design alternatives we
>> have considered and their various features. We welcome any feedback.
>> Joel E. Denny
>> Future Technologies Group
>> Oak Ridge National Laboratory
>> Design Alternatives
>> We have considered three design alternatives for the clacc compiler:
>> 1. acc src --parser--> omp AST --codegen--> LLVM IR + omp rt calls
> I don't think that we want this option because, if nothing else,
> it will preclude building source-level tooling for OpenACC.
>> 2. acc src --parser--> acc AST --codegen-->
>> LLVM IR + omp rt calls
>> 3. acc src --parser--> acc AST --ttx--> omp AST --codegen-->
>> LLVM IR + omp rt calls
> My recommendation: We should think about the very best way we
> could refactor the code to implement (2), and if that is too ugly
> (or otherwise significantly degrades maintainability of the OpenMP
> code), then we should choose (3).
> I started out with design 2 in the early prototype I'm experimenting
> with. Eventually I figured out some possibilities for how to
> implement the ttx component above (I'd be happy to discuss that)
That's probably a good idea. Please share some details on this front.
> , and I switched to design 3. So far, I'm finding design 3 to be
> easier to implement. Moreover, I can use -ast-print combined with a
> custom option to print either OpenACC source, OpenMP source, or both
> with one commented out. I like that capability. However, I think
> it's clear that design 3 has greater potential for running into
> difficulties as I move forward to more complex OpenACC constructs.
It is this last part that is potentially concerning. If you try it,
however, and it sounds like you are, then we'll know for sure soon enough.
Obviously the most efficient way to write some piece of code, and the
way to write it to maximize maintainability and ease of extension, may
be different. To the extent that they're the same, in terms of upstream
functionality, we'll learn something.
>> In the above diagram:
>> * acc src = C source code containing acc constructs.
>> * acc AST = a clang AST in which acc constructs are represented by
>> nodes with acc node types. Of course, such node types do not
>> already exist in clang's implementation.
>> * omp AST = a clang AST in which acc constructs have been lowered
>> to omp constructs represented by nodes with omp node types. Of
>> course, such node types do already exist in clang's implementation.
>> * parser = the existing clang parser and semantic analyzer,
>> extended to handle acc constructs.
>> * codegen = the existing clang backend that translates a clang AST
>> to LLVM IR, extended if necessary (depending on which design is
>> chosen) to perform codegen from acc nodes.
>> * ttx (tree transformer) = a new clang component that transforms
>> acc to omp in clang ASTs.
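As a minimal illustration of the terms above (a sketch, not taken from clacc): the first function is "acc src", and the second is one OpenMP spelling that a lowering to omp might plausibly produce for it. The function names, the size N, and the particular directive and clause mapping are assumptions made for this example; the thread does not specify which mappings clacc uses. A compiler without OpenACC or OpenMP enabled ignores the pragmas, so both functions also run as plain C.

```c
#define N 8

/* "acc src": C code containing an OpenACC construct.
 * In designs 2 and 3, the parser would represent this pragma
 * with an acc node in the AST. */
static void scale_acc(float *x, float a) {
    #pragma acc parallel loop copy(x[0:N])
    for (int i = 0; i < N; ++i)
        x[i] *= a;
}

/* A hypothetical OpenMP equivalent of the loop above.
 * In designs 1 and 3, the omp AST would represent something
 * like this, and existing omp codegen would take over.
 * The directive choice here is an assumption, not clacc's mapping. */
static void scale_omp(float *x, float a) {
    #pragma omp target teams distribute parallel for map(tofrom: x[0:N])
    for (int i = 0; i < N; ++i)
        x[i] *= a;
}
```

Calling either function on the same input should yield the same array; only the directive that describes the parallelism differs.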
>> Design Features
>> There are several features to consider when choosing among the
>> designs in the previous section:
>> 1. acc AST as an artifact -- Because they create acc AST nodes,
>> designs 2 and 3 best facilitate the creation of additional acc
>> source-level tools (such as pretty printers, analyzers, lint-like
>> tools, and editor extensions). Some of these tools, such as
>> printing, would be available immediately or as minor extensions of
>> tools that already exist in clang's ecosystem.
>> 2. omp AST/source as an artifact -- Because they create omp AST
>> nodes, designs 1 and 3 best facilitate the use of source-level
>> tools to help an application developer discover how clacc has
>> mapped his acc to omp, possibly in order to debug a mapping
>> specification he has supplied. With design 2 instead, an
>> application developer has to examine low-level LLVM IR + omp rt
>> calls. Moreover, with designs 1 and 3, permanently migrating an
>> application's acc source to omp source can be automated.
>> 3. omp AST for mapping implementation -- Designs 1 and 3 might
>> also make it easier for the compiler developer to reason about and
>> implement mappings from acc to omp. That is, because acc and omp
>> syntax is so similar, implementing the translation at the level of
>> a syntactic representation is probably easier than translating to
>> LLVM IR.
>> 4. omp AST for codegen -- Designs 1 and 3 simplify the
>> compiler implementation by enabling reuse of clang's existing omp
>> support for codegen. In contrast, design 2 requires at least some
>> extensions to clang codegen to support acc nodes.
>> 5. Full acc AST for mapping -- Designs 2 and 3 potentially
>> enable the compiler to analyze the entire source (as opposed to
>> just the acc construct currently being parsed) while choosing the
>> mapping to omp. It is not clear if this feature will prove
>> useful, but it might enable more optimizations and compiler research.
> We'll end up doing this, but most of this falls within the scope
> of the "parallel IR" designs that many of us are working on. Doing
> this kind of analysis in the frontend is hard (because it
> essentially requires it to do inlining, simplification, and
> analysis akin to what the optimizer itself does).
> I agree. However, before the parallel IR efforts mature, I need to
> make progress. Also, I want to keep my options open, especially at
> this early stage, so I can experiment with different possibilities.
You're free to prototype things however you'd like :-)
>> 6. No acc node classes -- Design 1 simplifies the compiler
>> implementation by eliminating the need to implement many acc node
>> classes. While we have so far found that implementing these
>> classes is mostly mechanical, it does take a non-trivial amount of time.
>> 7. No omp mapping -- Design 2 does not require acc to be mapped to
>> omp. That is, it is conceivable that, for some acc constructs,
>> there will prove to be no omp syntax to capture the semantics we
>> wish to implement.
> I'm fairly certain that not everything maps exactly. There'll be
> some things we need to deal with explicitly in CodeGen.
>> It is also conceivable that we might one day
>> want to represent some acc constructs directly as extensions to
>> LLVM IR, where some acc analyses or optimizations might be more
>> feasible to implement. This possibility dovetails with recent
>> discussions in the LLVM community about developing LLVM IR
>> extensions for various parallel programming models.
>> Because of features 4 and 6, design 1 is likely the fastest design to
>> implement, at least at first while we focus on simple acc features and
>> simple mappings to omp. However, we have so far found no advantage
>> that design 1 has but that design 3 does not have except for feature
>> 6, which we see as the least important of the above features in the
>> long term.
>> The only advantage we have found that design 2 has but that design 3
>> does not have is feature 7. It should be possible to choose design 3
>> as the default but, for certain acc constructs or scenarios where
>> feature 7 proves important (if any), incorporate design 2. In other
>> words, if we decide not to map a particular acc construct to any omp
>> construct, ttx would leave it alone, and we would extend codegen to
>> handle it directly.
> This makes sense to me, and I think is most likely to leave the
> CodeGen code easiest to maintain (and has good separation of
> concerns). Nevertheless, I think we should go through the mental
> refactoring exercise for (2) to decide on the value of (3).
> At this moment, I'm finding that the easiest way to explore is to just
> push forward with design 3. Even so, if developers who have a deeper
> understanding than I do of clang's OpenMP implementation would like to
> have an email discussion on the refactoring exercise for design 2, I
> agree that would be helpful.
> Thanks again,
>> For the above reasons, and because design 3 offers the cleanest
>> separation of concerns, we have chosen design 3 with the possibility
>> of incorporating design 2 where it proves useful.
>> Because of the immutability of clang's AST, the design of our
>> ttx component requires careful consideration. To shorten this
>> email, we have omitted those details for now, but we will be happy to
>> include them as the discussion progresses.
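To make the ttx idea and the design 2 fallback concrete, here is a toy sketch in plain C. It does not use any clang API, and every name in it (Kind, Node, make, ttx, ACC_UNMAPPED, and so on) is hypothetical. It assumes a tiny statement tree whose nodes are never modified after construction, so the transformer builds a new tree: acc nodes that have a mapping become omp nodes, and an acc node with no mapping is copied unchanged so that codegen could handle it directly.

```c
#include <stdlib.h>

/* Hypothetical node kinds for a toy AST. */
typedef enum {
    ACC_PARALLEL_LOOP,  /* acc construct with an omp mapping */
    ACC_UNMAPPED,       /* acc construct with no omp mapping (assumed) */
    OMP_TARGET_LOOP,    /* omp construct produced by the transform */
    PLAIN_STMT          /* any non-directive statement */
} Kind;

typedef struct Node {
    Kind kind;
    const struct Node *child;  /* associated statement, or NULL */
    const struct Node *next;   /* next sibling statement, or NULL */
} Node;

/* Allocate a node; once built, nodes are only reached via const pointers. */
static const Node *make(Kind kind, const Node *child, const Node *next) {
    Node *n = malloc(sizeof *n);
    if (!n) abort();
    n->kind = kind;
    n->child = child;
    n->next = next;
    return n;
}

/* ttx: build a new tree from `in` without modifying `in`.
 * Mapped acc nodes become omp nodes (design 3); unmapped acc nodes
 * are copied as-is and left for codegen (design 2 fallback). */
static const Node *ttx(const Node *in) {
    if (!in) return NULL;
    Kind kind = in->kind;
    if (kind == ACC_PARALLEL_LOOP)
        kind = OMP_TARGET_LOOP;
    return make(kind, ttx(in->child), ttx(in->next));
}

/* Release a tree built by make() or ttx(). */
static void free_tree(const Node *n) {
    if (!n) return;
    free_tree(n->child);
    free_tree(n->next);
    free((void *)n);
}
```

Because the input tree survives the transform, both the acc and the omp forms remain available afterwards, which is what would allow printing either one. A real implementation would share unchanged subtrees rather than copy them; the full copy here just keeps ownership simple.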
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory