[llvm-dev] [RFC] IR-level Region Annotations

Wed Jan 11 16:09:36 PST 2017

On Wed, Jan 11, 2017 at 3:51 PM, Reid Kleckner via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> +1, tokens are the current True Way to create single-entry multi-exit
> regions. Your example for an annotated loop would look like:
>
> %region = call token @llvm.openmp.regionstart(metadata ...) ; whatever parameters you need here
>   loop
> call void @llvm.openmp.regionend(token %region)
>
> If you use tokens, I would recommend proposal (c), where you introduce new
> intrinsics for every new kind of region, instead of adding one overly
> generic set of region intrinsics.
>
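
A toy model of the token discipline described above may help. It is only a
sketch in Python, not LLVM API: `region_start` and `region_end` are
hypothetical stand-ins for the two intrinsics in the example, and the trace is
a straight-line list of calls rather than a real CFG. It shows one entry
producing a token and several exits consuming that same token.

```python
import itertools

_ids = itertools.count()


class Token:
    """Opaque value produced by a region entry (hypothetical model)."""

    def __init__(self, kind):
        self.kind = kind
        self.id = next(_ids)


def region_start(trace, kind):
    """Record a region entry and return the token that names the region."""
    tok = Token(kind)
    trace.append(("start", tok))
    return tok


def region_end(trace, tok):
    """Record a region exit that consumes the token of its region."""
    trace.append(("end", tok))


def check_regions(trace):
    """Return {token id: number of exits}.

    Raises ValueError if an exit names a token that no earlier entry produced.
    """
    exits = {}
    for op, tok in trace:
        if op == "start":
            exits[tok.id] = 0
        elif tok.id not in exits:
            raise ValueError("region exit without a preceding region entry")
        else:
            exits[tok.id] += 1
    return exits


trace = []
t = region_start(trace, "openmp.parallel")
region_end(trace, t)  # normal exit
region_end(trace, t)  # a second exit path consumes the same token
result = check_regions(trace)
```

Because every exit has to name the token, a region cannot lose track of its
entry, which is the property the token approach relies on.
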
Maybe we can come up with several categories of regions and create a new
intrinsic for each category, instead of creating a new intrinsic for every
*kind*.

Thanks
Hongbin

>
> We already have a way to form regions with real barriers, and it's tokens.
>
> On Wed, Jan 11, 2017 at 2:17 PM, David Majnemer via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> FWIW, we needed to maintain single entry-multiple exit regions for WinEH
>> and we accomplished it via a different mechanism.
>>
>> We had an instruction which produces a value of type Token (
>> http://llvm.org/docs/LangRef.html#token-type) which let us establish the
>> region and another instruction to exit the region by consuming it. The
>> dominance rules allowed us to avoid situations where the compiler might
>> trash the regions in weird ways and made sure that regions would be left
>> unharmed.
>>
>> AFAIK, a similar approach using Token could work here. I think it would
>> reduce the amount of stuff you'd need LLVM to maintain.
>>
>>
>> On Wed, Jan 11, 2017 at 2:02 PM, Hal Finkel via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> A Proposal for adding an experimental IR-level region-annotation
>>> infrastructure
>>> =============================================================================
>>>
>>> Hal Finkel (ANL) and Xinmin Tian (Intel)
>>>
>>> This is a proposal for adding an experimental infrastructure to support
>>> annotating regions in LLVM IR, making use of intrinsics and metadata, and
>>> a generic analysis to allow transformations to easily make use of these
>>> annotated regions. This infrastructure is flexible enough to support
>>> representation of directives for parallelization, vectorization, and
>>> offloading of both loops and more-general code regions. Under this scheme,
>>> the conceptual distance between source-level directives and the region
>>> annotations need not be significant, making the incremental cost of
>>> supporting new directives and modifiers often small. It is not, however,
>>> specific to those use cases.
>>>
>>> Problem Statement
>>> =================
>>> There has been a series of discussions, among both industrial and academic
>>> members of the LLVM community, on LLVM IR extensions for representing region
>>> and loop annotations for parallelism and other user-guided transformations.
>>> Increasing the quality of our OpenMP implementation is an important
>>> motivating use case, but certainly not the only one. For OpenMP in
>>> particular, we've discussed having an IR representation for years.
>>> Presently, all OpenMP pragmas are transformed directly into runtime-library
>>> calls in Clang, and outlining (i.e. extracting parallel regions into their
>>> own functions to be invoked by the runtime library) is done in Clang as
>>> well. Our implementation does not further optimize OpenMP constructs, and a
>>> lot of thought has been put into how we might improve this. For some
>>> optimizations, such as redundant barrier removal, we could use a
>>> TargetLibraryInfo-like mechanism to recognize frontend-generated runtime
>>> calls and proceed from there. Cases where we lose pointer-aliasing
>>> information, information on loop bounds, etc. could be improved by
>>> improving our inter-procedural-analysis capabilities. We should do that
>>> regardless. However, there are important cases where the underlying scheme
>>> we want to use to lower the various parallelism constructs, especially when
>>> targeting accelerators, changes depending on what is in the parallel region.
>>> In important cases where we can see everything (i.e. there aren't arbitrary
>>> external calls), code generation should proceed in a way that is very
>>> different from the general case. To have a sensible implementation, this
>>> must be done after inlining. When using LTO, this should be done during the
>>> link-time phase. As a result, we must move away from our purely
>>> front-end-based lowering scheme. The question is what to do instead, and how
>>> to do it in a way that is generally useful to the entire community.
>>>
>>> Designs previously discussed can be classified into four categories:
>>>
>>> (a) Add a large number of new kinds of LLVM metadata, and use them to
>>>     annotate each necessary instruction for parallelism, data attributes,
>>>     etc.
>>> (b) Add several new LLVM instructions such as, for parallelism, fork, spawn,
>>>     join, barrier, etc.
>>> (c) Add a large number of LLVM intrinsics for directives and clauses, each
>>>     intrinsic representing a directive or a clause.
>>> (d) Add a small number of LLVM intrinsics for region or loop annotations,
>>>     represent the directive/clause names using metadata and the remaining
>>>     information using arguments.
>>>
>>> Here we're proposing (d), and below is a brief pros and cons analysis based
>>> on these discussions and our own experiences of supporting region/loop
>>> annotations in LLVM-based compilers. The table below shows a short summary
>>> of our analysis.
>>>
>>> Various commercial compilers (e.g. from Intel, IBM, Cray, PGI), and GCC
>>> [1,2], have IR-level representations for parallelism constructs. Based on
>>> experience from these previous developments, we'd like a solution for LLVM
>>> that maximizes optimization enablement while minimizing the maintenance
>>> costs and complexity increase experienced by the community as a whole.
>>>
>>> Representing the desired information in the LLVM IR is just the first step.
>>> The challenge is to maintain the desired semantics without blocking useful
>>> optimizations. With options (c) and (d), dependencies can be preserved
>>> mainly based on the use/def chain of the arguments of each intrinsic, and a
>>> manageable set of LLVM analyses and transformations can be made aware of
>>> certain kinds of annotations in order to enable specific optimizations. In
>>> this regard, options (c) and (d) are close with respect to maintenance
>>> efforts. However, based on our experiences, option (d) is preferable because
>>> it is easier to extend to support new directives and clauses in the future
>>> without the need to add new intrinsics as required by option (c).
>>>
>>> Table 1. Pros/cons summary of LLVM IR experimental extension options
>>>
>>> --------+----------------------+-----------------------------------------------
>>> Options |         Pros         | Cons
>>> --------+----------------------+-----------------------------------------------
>>> (a)     | No need to add new   | LLVM passes do not always maintain metadata.
>>>         | instructions or      | Need to educate many passes (if not all) to
>>>         | new intrinsics.      | understand and handle them.
>>> --------+----------------------+-----------------------------------------------
>>> (b)     | Parallelism becomes  | Huge effort for extending all LLVM passes and
>>>         | a first-class citizen| code generation to support new instructions.
>>>         |                      | A large set of information still needs to be
>>>         |                      | represented using other means.
>>> --------+----------------------+-----------------------------------------------
>>> (c)     | Less impact on the   | A large number of intrinsics must be added.
>>>         | existing LLVM passes.| Some of the optimizations need to be
>>>         | Fewer requirements   | educated to understand them.
>>>         | for passes to        |
>>>         | maintain metadata.   |
>>> --------+----------------------+-----------------------------------------------
>>> (d)     | Minimal impact on    | Some of the optimizations need to be
>>>         | existing LLVM        | educated to understand them.
>>>         | optimization passes. | No requirement for all passes to maintain
>>>         | Directive and clause | a large set of metadata with values.
>>>         | names use metadata   |
>>>         | strings.             |
>>> --------+----------------------+-----------------------------------------------
>>>
>>>
>>> Regarding (a), LLVM already uses metadata for certain loop information
>>> (e.g. annotations directing loop transformations and assertions about
>>> loop-carried dependencies), but there is no natural or consistent way to
>>> extend this scheme to represent necessary data-movement or region
>>> information.
>>>
>>>
>>> New Intrinsics for Region and Value Annotations
>>> ==============================================
>>> The following new (experimental) intrinsics are proposed, which allow:
>>>
>>> a) Annotating a code region marked with directives / pragmas.
>>> b) Annotating values associated with the region (or loops), that is, those
>>>    values associated with directives / pragmas.
>>> c) Providing information on LLVM IR transformations needed for the
>>>    annotated code regions (or loops).
>>>
>>> These can be used both by frontends and also by transformation passes (e.g.
>>> automated parallelization). The names used here are similar to those used
>>> by our internal prototype, but obviously we expect a community bikeshed
>>> discussion.
>>>
>>> def int_experimental_directive : Intrinsic<[], [llvm_metadata_ty],
>>>                                    [IntrArgMemOnly],
>>>                                    "llvm.experimental.directive">;
>>>
>>> def int_experimental_dir_qual : Intrinsic<[], [llvm_metadata_ty],
>>>                                    [IntrArgMemOnly],
>>>                                    "llvm.experimental.dir.qual">;
>>>
>>> def int_experimental_dir_qual_opnd : Intrinsic<[],
>>>                                    [llvm_metadata_ty, llvm_any_ty],
>>>                                    [IntrArgMemOnly],
>>>                                    "llvm.experimental.dir.qual.opnd">;
>>>
>>> def int_experimental_dir_qual_opndlist : Intrinsic<[],
>>>                                    [llvm_metadata_ty, llvm_vararg_ty],
>>>                                    [IntrArgMemOnly],
>>>                                    "llvm.experimental.dir.qual.opndlist">;
>>>
>>> Note that calls to these intrinsics might need to be annotated with the
>>> convergent attribute when they represent fork/join operations, barriers,
>>> and similar.
>>>
>>> Usage Examples
>>> ==============
>>>
>>> This section shows a few examples using these experimental intrinsics.
>>> LLVM developers who use these intrinsics can define their own MDString
>>> values. Details of using these intrinsics to represent OpenMP 4.5
>>> constructs are described in [1][3].
>>>
>>>
>>> Example I: An OpenMP combined construct
>>>
>>> #pragma omp target teams distribute parallel for simd
>>>   loop
>>>
>>> LLVM IR
>>> -------
>>> call void @llvm.experimental.directive(metadata !0)
>>> call void @llvm.experimental.directive(metadata !1)
>>> call void @llvm.experimental.directive(metadata !2)
>>> call void @llvm.experimental.directive(metadata !3)
>>>   loop
>>> call void @llvm.experimental.directive(metadata !6)
>>> call void @llvm.experimental.directive(metadata !5)
>>> call void @llvm.experimental.directive(metadata !4)
>>>
>>> !0 = metadata !{metadata !DIR.OMP.TARGET}
>>> !1 = metadata !{metadata !DIR.OMP.TEAMS}
>>> !2 = metadata !{metadata !DIR.OMP.DISTRIBUTE.PARLOOP.SIMD}
>>>
>>> !6 = metadata !{metadata !DIR.OMP.END.DISTRIBUTE.PARLOOP.SIMD}
>>> !5 = metadata !{metadata !DIR.OMP.END.TEAMS}
>>> !4 = metadata !{metadata !DIR.OMP.END.TARGET}
>>>
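
To make the nesting in Example I concrete, here is a minimal Python sketch
(not LLVM code) of how an analysis might pair begin and end markers. It
assumes, based only on the strings shown in the example, that an end marker is
spelled by inserting `END.` after the `DIR.OMP.` prefix; `match_regions` is a
hypothetical helper, and only the three begin/end pairs whose metadata strings
are defined in the example are used.

```python
PREFIX = "DIR.OMP."
END_PREFIX = "DIR.OMP.END."


def match_regions(directives):
    """Pair begin/end markers and return (begin, end, name), innermost first.

    Raises ValueError if the markers are not properly nested.
    """
    stack, regions = [], []
    for i, d in enumerate(directives):
        # Test the end prefix first: it also starts with the begin prefix.
        if d.startswith(END_PREFIX):
            name = d[len(END_PREFIX):]
            if not stack or stack[-1][1] != name:
                raise ValueError(f"unmatched end marker: {d}")
            begin, _ = stack.pop()
            regions.append((begin, i, name))
        elif d.startswith(PREFIX):
            stack.append((i, d[len(PREFIX):]))
        else:
            raise ValueError(f"not a directive string: {d}")
    if stack:
        raise ValueError(f"unclosed region: {stack[-1][1]}")
    return regions


seq = [
    "DIR.OMP.TARGET",
    "DIR.OMP.TEAMS",
    "DIR.OMP.DISTRIBUTE.PARLOOP.SIMD",
    # ... the loop body would sit here ...
    "DIR.OMP.END.DISTRIBUTE.PARLOOP.SIMD",
    "DIR.OMP.END.TEAMS",
    "DIR.OMP.END.TARGET",
]
regions = match_regions(seq)
```

A real pass would walk the CFG rather than a flat list, but the stack
discipline is the same: each end marker must close the most recently opened
region.
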
>>> Example II: Assume x,y,z are int variables, and s is a non-POD variable.
>>>             Then, lastprivate(x,y,s,z) is represented as:
>>>
>>> LLVM IR
>>> -------
>>> call void @llvm.experimental.dir.qual.opndlist(
>>>                 metadata !1, %x, %y, metadata !2, %s, %ctor, %dtor, %z)
>>>
>>> !1 = metadata !{metadata !QUAL.OMP.LASTPRIVATE}
>>> !2 = metadata !{metadata !QUAL.OPND.NONPOD}
>>>
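
One possible reading of the operand list in Example II, sketched in Python
rather than LLVM code: the first argument names the clause, plain operands
follow one by one, and a `QUAL.OPND.NONPOD` marker introduces a
(variable, constructor, destructor) triple. That grouping is an assumption
inferred from the example, `parse_opndlist` and `Operand` are hypothetical
names, and the clause string (here `QUAL.OMP.LASTPRIVATE`, an assumed
spelling) is passed through untouched. The input models
lastprivate(x, y, s, z) with s being the non-POD variable.

```python
from dataclasses import dataclass
from typing import Optional

NONPOD = "QUAL.OPND.NONPOD"


@dataclass
class Operand:
    var: str
    ctor: Optional[str] = None  # set only for non-POD operands
    dtor: Optional[str] = None  # set only for non-POD operands


def parse_opndlist(args):
    """Split [clause, operand, ...] into (clause, [Operand, ...])."""
    clause, rest = args[0], list(args[1:])
    operands = []
    i = 0
    while i < len(rest):
        if rest[i] == NONPOD:
            # Assumed layout: marker, variable, constructor, destructor.
            if i + 3 >= len(rest):
                raise ValueError("NONPOD marker needs var, ctor and dtor")
            operands.append(Operand(rest[i + 1], rest[i + 2], rest[i + 3]))
            i += 4
        else:
            operands.append(Operand(rest[i]))
            i += 1
    return clause, operands


clause, operands = parse_opndlist(
    ["QUAL.OMP.LASTPRIVATE", "%x", "%y", NONPOD, "%s", "%ctor", "%dtor", "%z"]
)
```
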
>>> Example III: A prefetch pragma example
>>>
>>> // issue vprefetch1 for xp with a distance of 20 vectorized iterations ahead
>>> // issue vprefetch0 for yp with a distance of 10 vectorized iterations ahead
>>> #pragma prefetch xp:1:20 yp:0:10
>>> for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }
>>>
>>> LLVM IR
>>> -------
>>> call void @llvm.experimental.directive(metadata !0)
>>> call void @llvm.experimental.dir.qual.opndlist(metadata !1, %xp, 1, 20,
>>>                                                metadata !1, %yp, 0, 10)
>>>   loop
>>> call void @llvm.experimental.directive(metadata !3)
>>>
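
The qualifier call in Example III can be read as repeated groups of
(qualifier, pointer, hint, distance). The Python sketch below decodes such a
list under that assumption; it is not LLVM code. The qualifier string
`QUAL.PREFETCH` is a hypothetical placeholder, since the example does not
define its metadata node, and `parse_prefetch` and `Prefetch` are likewise
illustrative names.

```python
from typing import NamedTuple

QUAL = "QUAL.PREFETCH"  # hypothetical placeholder for the undefined metadata


class Prefetch(NamedTuple):
    pointer: str
    hint: int      # 1 and 0 select vprefetch1 and vprefetch0 in the example
    distance: int  # number of vectorized iterations ahead


def parse_prefetch(args):
    """Decode a flat list made of (qualifier, pointer, hint, distance) groups."""
    if len(args) % 4 != 0:
        raise ValueError("expected groups of four arguments")
    requests = []
    for i in range(0, len(args), 4):
        qual, pointer, hint, distance = args[i:i + 4]
        if qual != QUAL:
            raise ValueError(f"unexpected qualifier: {qual}")
        requests.append(Prefetch(pointer, int(hint), int(distance)))
    return requests


requests = parse_prefetch([QUAL, "%xp", 1, 20, QUAL, "%yp", 0, 10])
```
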
>>> References
>>> ==========
>>>
>>> [1] LLVM Framework and IR extensions for Parallelization, SIMD
>>>     Vectorization and Offloading Support. (Xinmin Tian, et al.) SC'2016
>>>     LLVM-HPC3 Workshop, Salt Lake City, Utah.
>>>
>>> [2] Extending LoopVectorizer towards supporting OpenMP4.5 SIMD and outer
>>>     loop auto-vectorization. (Hideki Saito, et al.) LLVM Developers'
>>>     Meeting 2016, San Jose.
>>>
>>> [3] Intrinsics, Metadata, and Attributes: The Story continues! (Hal Finkel)
>>>     LLVM Developers' Meeting 2016, San Jose.
>>>
>>> [4] LLVM Intrinsic Function and Metadata String Interface for Directive (or
>>>     Pragmas) Representation. Specification Draft v0.9, Intel Corporation,
>>>     2016.
>>>
>>>
>>> Acknowledgements
>>> ================
>>> We would like to thank Chandler Carruth (Google), Johannes Doerfert
>>> (Saarland Univ.), Yaoqing Gao (Huawei), Michael Wong (Codeplay), Ettore
>>> Tiotto, Carlo Bertolli, Bardia Mahjour (IBM), and all other LLVM-HPC IR
>>> Extensions WG members for their constructive feedback on the LLVM framework
>>> and IR extension proposal.
>>>
>>> Proposed Implementation
>>> =======================
>>>
>>> Two sets of patches that support these experimental intrinsics and
>>> demonstrate their usage are ready for community review.
>>>
>>> a) Clang patches that support core OpenMP pragmas using this approach.
>>> b) W-Region framework patches: CFG restructuring to form
>>>    single-entry-single-exit work regions (W-Regions) based on annotations,
>>>    demand-driven intrinsic parsing, WRegionInfo collection and analysis
>>>    passes, and dump functions for WRegionInfo.
>>>
>>> On top of this functionality, we will provide the transformation patches for
>>> core OpenMP constructs (e.g. starting with the "#pragma omp parallel for"
>>> loop for lowering and outlining, and "#pragma omp simd" to hook it up with
>>> LoopVectorize.cpp). We have internal implementations for many constructs
>>> now. We will break this functionality up to create a series of patches for
>>> community review.
>>>
>>> --
>>> Hal Finkel
>>> Lead, Compiler Technology and Programming Languages
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>