[cfe-dev] RFC: clacc: translating OpenACC to OpenMP in clang

Fri Dec 8 07:58:15 PST 2017

Hi Hal,

Thanks for your feedback.  It sounds like we're basically in agreement, but
I've added a few thoughts inline below.

On Wed, Dec 6, 2017 at 4:02 AM, Hal Finkel <hfinkel at anl.gov> wrote:

>
> On 12/05/2017 01:06 PM, Joel E. Denny wrote:
>
> Hi,
>
> We are working on a new project, clacc, that extends clang with OpenACC
> support.  Clacc's approach is to translate OpenACC (a descriptive language)
> to OpenMP (a prescriptive language) and thus to build on clang's existing
> OpenMP support.  While we plan to develop clacc to support our own
> research, an important goal is to contribute clacc as a production-quality
> component of upstream clang.
>
>
> Great.
>
>
> We have begun implementing an early prototype of clacc.  Before we get too
> far into the implementation, we would like to get feedback from the LLVM
> community to help ensure our design would ultimately be acceptable for
> contribution.  For that purpose, below is an analysis of several high-level
> design alternatives we have considered and their various features.  We
> welcome any feedback.
>
> Thanks.
>
> Joel E. Denny
> Future Technologies Group
> Oak Ridge National Laboratory
>
>
> Design Alternatives
> -------------------
>
> We have considered three design alternatives for the clacc compiler:
>
> 1. acc src  --parser-->                     omp AST  --codegen-->  LLVM IR + omp rt calls
>
>
> I don't think that we want this option because, if nothing else, it will
> preclude building source-level tooling for OpenACC.
>

Agreed.

> 2. acc src  --parser-->  acc AST                     --codegen-->  LLVM IR + omp rt calls
> 3. acc src  --parser-->  acc AST  --ttx-->  omp AST  --codegen-->  LLVM IR + omp rt calls
>
>
> My recommendation: We should think about the very best way we could
> refactor the code to implement (2), and if that is too ugly (or otherwise
> significantly degrades maintainability of the OpenMP code), then we should
> choose (3).
>

I started out with design 2 in the early prototype I'm experimenting with.
Eventually I figured out some possibilities for how to implement the ttx
component above (I'd be happy to discuss that), and I switched to design
3.  So far, I'm finding design 3 to be easier to implement.  Moreover, I
can use -ast-print combined with a custom option to print either OpenACC
source, OpenMP source, or both with one commented out.  I like that
capability.  However, I think it's clear that design 3 has greater
potential for running into difficulties as I move forward to more complex
OpenACC constructs.
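
To make the acc-to-omp translation concrete, here is a minimal sketch in C of
one loop written first with an OpenACC directive and then with an OpenMP
directive that expresses similar intent.  The function names (saxpy_acc,
saxpy_omp) are hypothetical, and the particular OpenMP directive and clauses
shown are an assumed, illustrative pairing, not necessarily the mapping clacc
will choose.  A compiler built without OpenACC or OpenMP support ignores the
pragmas, so both functions still run sequentially and compute the same result.

```c
#include <assert.h>
#include <stddef.h>

/* OpenACC form (descriptive): states that the loop may be parallelized
 * and which arrays move to and from the device; the compiler decides
 * how to schedule the iterations. */
void saxpy_acc(size_t n, float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* OpenMP form (prescriptive): spells out offload, teams, and work
 * distribution explicitly.  This pairing is an assumption made for
 * illustration only. */
void saxpy_omp(size_t n, float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

With design 3, source like the first function would be the input and
something like the second could be printed back out from the omp AST.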

>
>
> In the above diagram:
>
> * acc src = C source code containing acc constructs.
> * acc AST = a clang AST in which acc constructs are represented by
>   nodes with acc node types.  Of course, such node types do not
>   already exist in clang's implementation.
> * omp AST = a clang AST in which acc constructs have been lowered
>   to omp constructs represented by nodes with omp node types.  Of
>   course, such node types do already exist in clang's
>   implementation.
> * parser = the existing clang parser and semantic analyzer,
>   extended to handle acc constructs.
> * codegen = the existing clang backend that translates a clang AST
>   to LLVM IR, extended if necessary (depending on which design is
>   chosen) to perform codegen from acc nodes.
> * ttx (tree transformer) = a new clang component that transforms
>   acc to omp in clang ASTs.
>
> Design Features
> ---------------
>
> There are several features to consider when choosing among the designs
> in the previous section:
>
> 1. acc AST as an artifact -- Because they create acc AST nodes,
>    designs 2 and 3 best facilitate the creation of additional acc
>    source-level tools (such as pretty printers, analyzers, lint-like
>    tools, and editor extensions).  Some of these tools, such as pretty
>    printing, would be available immediately or as minor extensions of
>    tools that already exist in clang's ecosystem.
>
> 2. omp AST/source as an artifact -- Because they create omp AST
>    nodes, designs 1 and 3 best facilitate the use of source-level
>    tools to help an application developer discover how clacc has
>    mapped his acc to omp, possibly in order to debug a mapping
>    specification he has supplied.  With design 2 instead, an
>    application developer has to examine low-level LLVM IR + omp rt
>    calls.  Moreover, with designs 1 and 3, permanently migrating an
>    application's acc source to omp source can be automated.
>
> 3. omp AST for mapping implementation -- Designs 1 and 3 might
>    also make it easier for the compiler developer to reason about and
>    implement mappings from acc to omp.  That is, because acc and omp
>    syntax is so similar, implementing the translation at the level of
>    a syntactic representation is probably easier than translating to
>    LLVM IR.
>
> 4. omp AST for codegen -- Designs 1 and 3 simplify the
>    compiler implementation by enabling reuse of clang's existing omp
>    support for codegen.  In contrast, design 2 requires at least some
>    extensions to clang codegen to support acc nodes.
>
> 5. Full acc AST for mapping -- Designs 2 and 3 potentially
>    enable the compiler to analyze the entire source (as opposed to
>    just the acc construct currently being parsed) while choosing the
>    mapping to omp.  It is not clear if this feature will prove useful,
>    but it might enable more optimizations and compiler research
>    opportunities.
>
>
> We'll end up doing this, but most of this falls within the scope of the
> "parallel IR" designs that many of us are working on. Doing this kind of
> analysis in the frontend is hard (because it essentially requires it to do
> inlining, simplification, and analysis akin to what the optimizer itself
> does).
>

I agree.  However, before the parallel IR efforts mature, I need to make
progress.  Also, I want to keep my options open, especially at this early
stage, so I can experiment with different possibilities.

>
> 6. No acc node classes -- Design 1 simplifies the compiler
>    implementation by eliminating the need to implement many acc node
>    classes.  While we have so far found that implementing these
>    classes is mostly mechanical, it does take a non-trivial amount of
>    time.
>
> 7. No omp mapping -- Design 2 does not require acc to be mapped to
>    omp.  That is, it is conceivable that, for some acc constructs,
>    there will prove to be no omp syntax to capture the semantics we
>    wish to implement.
>
>
> I'm fairly certain that not everything maps exactly. There'll be some
> things we need to deal with explicitly in CodeGen.
>
> It is also conceivable that we might one day
>    want to represent some acc constructs directly as extensions to
>    LLVM IR, where some acc analyses or optimizations might be more
>    feasible to implement.  This possibility dovetails with recent
>    discussions in the LLVM community about developing LLVM IR
>    extensions for various parallel programming models.
>
>
> +1
>
>
> Because of features 4 and 6, design 1 is likely the fastest design to
> implement, at least at first while we focus on simple acc features and
> simple mappings to omp.  However, we have so far found no advantage
> that design 1 has but that design 3 does not have except for feature
> 6, which we see as the least important of the above features in the
> long term.
>
> The only advantage we have found that design 2 has but that design 3
> does not have is feature 7.  It should be possible to choose design 3
> as the default but, for certain acc constructs or scenarios where
> feature 7 proves important (if any), incorporate design 2.  In other
> words, if we decide not to map a particular acc construct to any omp
> construct, ttx would leave it alone, and we would extend codegen to
> handle it directly.
>
>
> This makes sense to me, and I think it is most likely to leave the CodeGen
> code easiest to maintain (and has good separation of concerns).
> Nevertheless, I think we should go through the mental refactoring exercise
> for (2) to decide on the value of (3).
>

At this moment, I'm finding that the easiest way to explore is to just push
forward with design 3.  Even so, if developers who have a deeper
understanding than I do of clang's OpenMP implementation would like to have
an email discussion on the refactoring exercise for design 2, I agree that
would be helpful.
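
As a rough illustration of defaulting to design 3 while falling back to
design 2 for selected constructs, here is a toy model in C.  Everything in it
is hypothetical: the enum values, the TtxResult struct, and the functions ttx
and codegen_path are stand-ins invented for this sketch and do not correspond
to clang classes or to the prototype's code.  The directive pairings are
likewise illustrative only.  The point is just the control flow: ttx returns a
new value rather than modifying its input, and anything it leaves untranslated
is routed to direct acc codegen.

```c
#include <assert.h>

/* Toy directive kinds (hypothetical; not clang AST node types). */
typedef enum {
    ACC_PARALLEL,
    ACC_LOOP,
    ACC_UNMAPPED,      /* stands in for a construct we chose not to map */
    OMP_TARGET_TEAMS,
    OMP_DISTRIBUTE
} DirectiveKind;

/* Result of the toy tree transformer for one directive. */
typedef struct {
    DirectiveKind kind; /* omp kind if mapped, else the original acc kind */
    int mapped;         /* 1 if translated to omp, 0 if left alone */
} TtxResult;

typedef enum { CODEGEN_VIA_OMP, CODEGEN_DIRECT_ACC } CodegenPath;

/* Toy ttx: accepts only acc kinds and returns a fresh result, leaving
 * the input untouched.  The pairings below are assumptions. */
TtxResult ttx(DirectiveKind acc) {
    assert(acc == ACC_PARALLEL || acc == ACC_LOOP || acc == ACC_UNMAPPED);
    TtxResult r = { acc, 0 };
    switch (acc) {
    case ACC_PARALLEL: r.kind = OMP_TARGET_TEAMS; r.mapped = 1; break;
    case ACC_LOOP:     r.kind = OMP_DISTRIBUTE;   r.mapped = 1; break;
    default:           break; /* left alone: design 2 handles it */
    }
    return r;
}

/* Mapped directives reuse existing omp codegen (design 3); unmapped
 * ones need codegen extended for acc directly (design 2). */
CodegenPath codegen_path(TtxResult r) {
    return r.mapped ? CODEGEN_VIA_OMP : CODEGEN_DIRECT_ACC;
}
```

For example, codegen_path(ttx(ACC_UNMAPPED)) selects the direct acc path,
while codegen_path(ttx(ACC_PARALLEL)) selects the existing omp path.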

> Thanks again,
> Hal
>

Thanks.

Joel

>
> Conclusions
> -----------
>
> For the above reasons, and because design 3 offers the cleanest
> separation of concerns, we have chosen design 3 with the possibility
> of incorporating design 2 where it proves useful.
>
> Because of the immutability of clang's AST, the design of our proposed
> ttx component requires careful consideration.  To shorten this initial
> email, we have omitted those details for now, but we will be happy to
> include them as the discussion progresses.
>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
>