[llvm-dev] [RFC] IR-level Region Annotations

Fri Jan 13 09:57:04 PST 2017

I have been following the discussion, and this is something we have been
looking for for years. I am glad that you have a patch that we can at least
use. I do, however, have several comments/requests based on our experience:

1. The idea of creating a set of representations for language-neutral
parallel constructs, and then allowing extensions for language-specific
representations and passes, sounds very good, and I think it is definitely
worth trying as a next step. We did a survey of multiple threading
programming interfaces
(https://www.hpcwire.com/2015/03/02/a-comparison-of-heterogeneous-and-manycore-programming-models)
and it is obvious that parallel interfaces (even including inter-node models
such as PGAS and APGAS) share some common mechanisms for representing
parallelism, data/affinity, synchronization and mutual exclusion.

2. There are API calls or typical statements that are meant for
parallelism, but the language-based IR extensions are not able to represent
them. E.g. pthread_create/join (or other runtime calls such as C++
thread/async, etc.) express fork/join parallelism. The frontend can be
enhanced to recognize those calls and create/append PIR info to them. It
would, however, be nicer if we had a meta approach, e.g. providing a file
that tells the frontend that a parallel IR should be used for specific API
calls.
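
To make the meta approach a bit more concrete, here is a minimal sketch in C
of a table-driven lookup. Everything in it is an assumption made for
illustration: the names par_kind, par_api_entry, par_api_table and
classify_call are hypothetical and not part of any patch or of PIR, and the
table is compiled in only to keep the sketch self-contained. In practice its
contents would presumably be loaded from the description file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: what a recognized API call means for parallelism. */
typedef enum { PAR_NONE, PAR_FORK, PAR_JOIN } par_kind;

/* Hypothetical: one entry of the description file, mapping a callee name
   to the parallel construct the frontend should attach to the call. */
typedef struct {
    const char *callee;
    par_kind kind;
} par_api_entry;

/* Illustrative contents only; a real table would be read from a file. */
static const par_api_entry par_api_table[] = {
    { "pthread_create", PAR_FORK },
    { "pthread_join",   PAR_JOIN },
};

/* Return the construct for a callee name, or PAR_NONE if it is not listed. */
par_kind classify_call(const char *callee) {
    assert(callee != NULL);
    size_t n = sizeof par_api_table / sizeof par_api_table[0];
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(par_api_table[i].callee, callee) == 0)
            return par_api_table[i].kind;
    }
    return PAR_NONE;
}
```

A frontend could consult something like classify_call("pthread_create") when
it emits a call and attach the fork annotation only when the result is not
PAR_NONE, so that supporting another runtime would mean adding table entries
rather than changing the frontend.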

3. SPMD divergence such as the following, or an if statement inside a vector
loop body:

   /* Same for UPC or MPI inter-node/PGAS, as we use this to do different
      computation for each thread/proc. */
   if (omp_get_thread_num() == 4) {

   } else {

   }

Does your patch have support for that? Basically, I imagine it would need to
add some metadata/token/tags to the branching IR.
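
To spell out the divergence, here is a small C sketch in which the
thread/proc id is an explicit parameter (a stand-in for omp_get_thread_num(),
an MPI rank, and so on), so it runs without any parallel runtime. The names
spmd_body and run_spmd and the two computations are hypothetical. The only
point is that the branch condition is a function of the id, so instances of
the same code follow different paths, which is presumably what an annotation
on the branch would have to record.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SPMD body: tid stands in for omp_get_thread_num(),
   an MPI rank, or a similar per-instance id. */
int spmd_body(int tid, int x) {
    assert(tid >= 0);
    if (tid == 4) {
        return x * 2;   /* path taken only by thread/proc 4 */
    } else {
        return x + 1;   /* path taken by every other thread/proc */
    }
}

/* Run the same body once per id, serially, to expose the divergence. */
void run_spmd(int nthreads, int x, int *out) {
    assert(nthreads >= 0 && out != NULL);
    for (int tid = 0; tid < nthreads; ++tid)
        out[tid] = spmd_body(tid, x);
}
```

Calling run_spmd(8, 10, out) leaves a different value in out[4] than in the
other slots, even though every instance executed the same function.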

Thank you!

Yonghong Yan
Assistant Professor
Department of Computer Science and Engineering
School of Engineering and Computer Science
Oakland University
Office: EC 534
Phone: 248-370-4087
Email: yan at oakland.edu
www.secs.oakland.edu/~yan

On Fri, Jan 13, 2017 at 1:59 AM, Tian, Xinmin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Thank you all, David, Hongbin, Reid, Mehdi, Daniel, and Vikram, for your
> review and constructive feedback on this RFC. We will update our Clang FE
> patch to use the Token and Tags suggested by David, Hongbin, et al. instead
> of using metadata and function arguments for IR-annotation intrinsic
> function calls, to see how well it preserves all the information needed
> for our LLVM middle-end / back-end transformations.  Going with the Token
> and Tag approach, the changes that need to be made in our W-Region
> framework are relatively small as well.
>
>
>
> Vikram, many points you made below are well-taken.  Hal and I had a long
> discussion at SC'16 on how to build a practical infrastructure for people
> to experiment with and study all pros and cons of IR extensions for
> expressing parallelism, optimizing parallel code, and many other uses of
> directive/pragma information. Personally, I would agree that, eventually,
> the solution could likely be a combination of b and d when we go with
> parallel IR, once the community finally agrees on which constructs are
> common enough to be represented as LLVM instructions.
>
>
>
> Having said that, this RFC serves as the first step. The intrinsics we
> proposed are language-neutral, but the "tag" or "metadata" is specific to
> language constructs/directives/pragmas. We are expecting more feedback and
> discussion on this work. Thank you all again.
>
>
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *David
> Majnemer via llvm-dev
> *Sent:* Wednesday, January 11, 2017 2:18 PM
> *To:* Hal Finkel <hfinkel at anl.gov>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] [RFC] IR-level Region Annotations
>
>
>
> FWIW, we needed to maintain single entry-multiple exit regions for WinEH
> and we accomplished it via a different mechanism.
>
>
>
> We had an instruction which produces a value of type Token (
> http://llvm.org/docs/LangRef.html#token-type) which let us establish the
> region and another instruction to exit the region by consuming it. The
> dominance rules allowed us to avoid situations where the compiler might
> trash the regions in weird ways and made sure that regions would be left
> unharmed.
>
>
>
> AFAIK, a similar approach using Token could work here. I think it would
> reduce the amount of stuff you'd need LLVM to maintain.
>
>
>
>
>
> On Wed, Jan 11, 2017 at 2:02 PM, Hal Finkel via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> A Proposal for adding an experimental IR-level region-annotation
> infrastructure
> =============================================================================
>
> Hal Finkel (ANL) and Xinmin Tian (Intel)
>
> This is a proposal for adding an experimental infrastructure to support
> annotating regions in LLVM IR, making use of intrinsics and metadata, and
> a generic analysis to allow transformations to easily make use of these
> annotated regions. This infrastructure is flexible enough to support
> representation of directives for parallelization, vectorization, and
> offloading of both loops and more-general code regions. Under this scheme,
> the conceptual distance between source-level directives and the region
> annotations need not be significant, making the incremental cost of
> supporting new directives and modifiers often small. It is not, however,
> specific to those use cases.
>
> Problem Statement
> =================
> There has been a series of discussions on LLVM IR extensions for
> representing region and loop annotations for parallelism, and other
> user-guided transformations, among both industrial and academic members of
> the LLVM community. Increasing the quality of our OpenMP implementation is
> an important motivating use case, but certainly not the only one. For
> OpenMP in particular, we've discussed having an IR representation for
> years. Presently, all OpenMP pragmas are transformed directly into
> runtime-library calls in Clang, and outlining (i.e. extracting parallel
> regions into their own functions to be invoked by the runtime library) is
> done in Clang as well. Our implementation does not further optimize OpenMP
> constructs, and a lot of thought has been put into how we might improve
> this. For some optimizations, such as redundant barrier removal, we could
> use a TargetLibraryInfo-like mechanism to recognize frontend-generated
> runtime calls and proceed from there. Cases where we lose pointer-aliasing
> information, information on loop bounds, etc. could be addressed by
> improving our inter-procedural-analysis capabilities. We should do that
> regardless. However, there are important cases where the underlying scheme
> we want to use to lower the various parallelism constructs, especially
> when targeting accelerators, changes depending on what is in the parallel
> region. In important cases where we can see everything (i.e. there aren't
> arbitrary external calls), code generation should proceed in a way that is
> very different from the general case. To have a sensible implementation,
> this must be done after inlining. When using LTO, this should be done
> during the link-time phase. As a result, we must move away from our purely
> front-end-based lowering scheme. The question is what to do instead, and
> how to do it in a way that is generally useful to the entire community.
>
> Designs previously discussed can be classified into four categories:
>
> (a) Add a large number of new kinds of LLVM metadata, and use them to
>     annotate each necessary instruction for parallelism, data attributes,
>     etc.
> (b) Add several new LLVM instructions such as, for parallelism, fork,
>     spawn, join, barrier, etc.
> (c) Add a large number of LLVM intrinsics for directives and clauses, each
>     intrinsic representing a directive or a clause.
> (d) Add a small number of LLVM intrinsics for region or loop annotations,
>     represent the directive/clause names using metadata and the remaining
>     information using arguments.
>
> Here we're proposing (d), and below is a brief pros and cons analysis
> based on these discussions and our own experiences of supporting
> region/loop annotations in LLVM-based compilers. The table below shows a
> short summary of our analysis.
>
> Various commercial compilers (e.g. from Intel, IBM, Cray, PGI), and GCC
> [1,2], have IR-level representations for parallelism constructs. Based on
> experience from these previous developments, we'd like a solution for LLVM
> that maximizes optimization enablement while minimizing the maintenance
> costs and complexity increase experienced by the community as a whole.
>
> Representing the desired information in the LLVM IR is just the first
> step. The challenge is to maintain the desired semantics without blocking
> useful optimizations. With options (c) and (d), dependencies can be
> preserved mainly based on the use/def chain of the arguments of each
> intrinsic, and a manageable set of LLVM analyses and transformations can
> be made aware of certain kinds of annotations in order to enable specific
> optimizations. In this regard, options (c) and (d) are close with respect
> to maintenance efforts. However, based on our experiences, option (d) is
> preferable because it is easier to extend to support new directives and
> clauses in the future without the need to add new intrinsics as required
> by option (c).
>
> Table 1. Pros/cons summary of LLVM IR experimental extension options
>
> --------+----------------------+-----------------------------------------------
> Options |         Pros         | Cons
> --------+----------------------+-----------------------------------------------
> (a)     | No need to add new   | LLVM passes do not always maintain metadata.
>         | instructions or      | Need to educate many passes (if not all) to
>         | new intrinsics       | understand and handle them.
> --------+----------------------+-----------------------------------------------
> (b)     | Parallelism becomes  | Huge effort for extending all LLVM passes and
>         | first class citizen  | code generation to support new instructions.
>         |                      | A large set of information still needs to be
>         |                      | represented using other means.
> --------+----------------------+-----------------------------------------------
> (c)     | Less impact on the   | A large number of intrinsics must be added.
>         | existing LLVM passes.| Some of the optimizations need to be
>         | Fewer requirements   | educated to understand them.
>         | for passes to        |
>         | maintain metadata.   |
> --------+----------------------+-----------------------------------------------
> (d)     | Minimal impact on    | Some of the optimizations need to be
>         | existing LLVM        | educated to understand them.
>         | optimization passes. | No requirements for all passes to maintain
>         | Directive and clause | a large set of metadata with values.
>         | names use metadata   |
>         | strings.             |
> --------+----------------------+-----------------------------------------------
>
>
> Regarding (a), LLVM already uses metadata for certain loop information
> (e.g. annotations directing loop transformations and assertions about
> loop-carried dependencies), but there is no natural or consistent way to
> extend this scheme to represent necessary data-movement or region
> information.
>
>
> New Intrinsics for Region and Value Annotations
> ==============================================
> The following new (experimental) intrinsics are proposed which allow:
>
> a) Annotating a code region marked with directives / pragmas,
> b) Annotating values associated with the region (or loops), that is, those
>    values associated with directives / pragmas,
> c) Providing information on LLVM IR transformations needed for the
>    annotated code regions (or loops).
>
> These can be used both by frontends and also by transformation passes (e.g.
> automated parallelization). The names used here are similar to those used
> by our internal prototype, but obviously we expect a community bikeshed
> discussion.
>
> def int_experimental_directive : Intrinsic<[], [llvm_metadata_ty],
>                                    [IntrArgMemOnly],
>                                    "llvm.experimental.directive">;
>
> def int_experimental_dir_qual : Intrinsic<[], [llvm_metadata_ty],
>                                    [IntrArgMemOnly],
>                                    "llvm.experimental.dir.qual">;
>
> def int_experimental_dir_qual_opnd : Intrinsic<[],
>                                    [llvm_metadata_ty, llvm_any_ty],
>                                    [IntrArgMemOnly],
>                                    "llvm.experimental.dir.qual.opnd">;
>
> def int_experimental_dir_qual_opndlist : Intrinsic<[],
>                                    [llvm_metadata_ty, llvm_vararg_ty],
>                                    [IntrArgMemOnly],
>                                    "llvm.experimental.dir.qual.opndlist">;
>
> Note that calls to these intrinsics might need to be annotated with the
> convergent attribute when they represent fork/join operations, barriers,
> and similar.
>
> Usage Examples
> ==============
>
> This section shows a few examples using these experimental intrinsics.
> LLVM developers who will use these intrinsics can define their own
> MDStrings. All details of using these intrinsics to represent OpenMP 4.5
> constructs are described in [1][3].
>
>
> Example I: An OpenMP combined construct
>
> #pragma omp target teams distribute parallel for simd
>   loop
>
> LLVM IR
> -------
> call void @llvm.experimental.directive(metadata !0)
> call void @llvm.experimental.directive(metadata !1)
> call void @llvm.experimental.directive(metadata !2)
> call void @llvm.experimental.directive(metadata !3)
>   loop
> call void @llvm.experimental.directive(metadata !6)
> call void @llvm.experimental.directive(metadata !5)
> call void @llvm.experimental.directive(metadata !4)
>
> !0 = metadata !{metadata !DIR.OMP.TARGET}
> !1 = metadata !{metadata !DIR.OMP.TEAMS}
> !2 = metadata !{metadata !DIR.OMP.DISTRIBUTE.PARLOOP.SIMD}
>
> !6 = metadata !{metadata !DIR.OMP.END.DISTRIBUTE.PARLOOP.SIMD}
> !5 = metadata !{metadata !DIR.OMP.END.TEAMS}
> !4 = metadata !{metadata !DIR.OMP.END.TARGET}
>
> Example II: Assume x,y,z are int variables, and s is a non-POD variable.
>             Then, lastprivate(x,y,s,z) is represented as:
>
> LLVM IR
> -------
> call void @llvm.experimental.dir.qual.opndlist(
>                 metadata !1, %x, %y, metadata !2, %a, %ctor, %dtor, %z)
>
> !1 = metadata !{metadata !QUAL.OMP.PRIVATE}
> !2 = metadata !{metadata !QUAL.OPND.NONPOD}
>
> Example III: A prefetch pragma example
>
> // issue vprefetch1 for xp with a distance of 20 vectorized iterations ahead
> // issue vprefetch0 for yp with a distance of 10 vectorized iterations ahead
> #pragma prefetch x:1:20 y:0:10
> for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }
>
> LLVM IR
> -------
> call void @llvm.experimental.directive(metadata !0)
> call void @llvm.experimental.dir.qual.opndlist(metadata !1, %xp, 1, 20,
>                                                metadata !1, %yp, 0, 10)
>   loop
> call void @llvm.experimental.directive(metadata !3)
>
> References
> ==========
>
> [1] LLVM Framework and IR extensions for Parallelization, SIMD
>     Vectorization and Offloading Support. (Xinmin Tian, et al.) SC'2016
>     LLVM-HPC3 Workshop, Salt Lake City, Utah.
>
> [2] Extending LoopVectorizer towards supporting OpenMP4.5 SIMD and outer
>     loop auto-vectorization. (Hideki Saito, et al.) LLVM Developers'
>     Meeting 2016, San Jose.
>
> [3] Intrinsics, Metadata, and Attributes: The Story continues! (Hal Finkel)
>     LLVM Developers' Meeting 2016, San Jose.
>
> [4] LLVM Intrinsic Function and Metadata String Interface for Directive (or
>     Pragmas) Representation. Specification Draft v0.9, Intel Corporation,
>     2016.
>
>
> Acknowledgements
> ================
> We would like to thank Chandler Carruth (Google), Johannes Doerfert
> (Saarland Univ.), Yaoqing Gao (HuaWei), Michael Wong (Codeplay), Ettore
> Tiotto, Carlo Bertolli, Bardia Mahjour (IBM), and all other LLVM-HPC IR
> Extensions WG members for their constructive feedback on the LLVM
> framework and IR extension proposal.
>
> Proposed Implementation
> =======================
>
> Two sets of patches that support these experimental intrinsics and
> demonstrate their usage are ready for community review.
>
> a) Clang patches that support core OpenMP pragmas using this approach.
> b) W-Region framework patches: CFG restructuring to form
>    single-entry-single-exit work regions (W-Regions) based on annotations,
>    demand-driven intrinsic parsing, WRegionInfo collection and analysis
>    passes, and dump functions for WRegionInfo.
>
> On top of this functionality, we will provide the transformation patches
> for core OpenMP constructs (e.g. start with "#pragma omp parallel for" loop
> for lowering and outlining, and "#pragma omp simd" to hook it up with
> LoopVectorize.cpp). We have internal implementations for many constructs
> now. We will break this functionality up to create a series of patches for
> community review.
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev