[llvm-dev] [RFC] IR-level Region Annotations

Hongbin Zheng via llvm-dev llvm-dev at lists.llvm.org
Wed Jan 11 16:10:44 PST 2017


On Wed, Jan 11, 2017 at 4:07 PM, Hongbin Zheng <etherzhhb at gmail.com> wrote:

> I think they are not MDString, but "bundle tags" that are managed
> by LLVMContextImpl::getOrInsertBundleTag.
>
I just treat them as something like the string returned by
Inst->getName().
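
To make the distinction concrete, here is a rough sketch (using the scope
intrinsics from the examples below; the exact syntax is illustrative):

  ; "firstprivate" is an operand bundle tag: a plain string uniqued by the
  ; LLVMContext, not an MDString node.
  %t = tail call token @llvm.directive.scope.entry() [ "firstprivate"(i32 %m) ]

  ; An MDString, by contrast, is a metadata operand:
  call void @llvm.experimental.directive(metadata !0)
  !0 = !{!"DIR.OMP.TARGET"}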


>
> On Wed, Jan 11, 2017 at 4:01 PM, Tian, Xinmin <xinmin.tian at intel.com>
> wrote:
>
>> And  “map”  and “firstprivate” … are represented as MDString, right?
>> Thanks.
>>
>>
>>
>> *From:* Hongbin Zheng [mailto:etherzhhb at gmail.com]
>> *Sent:* Wednesday, January 11, 2017 3:58 PM
>>
>> *To:* Tian, Xinmin <xinmin.tian at intel.com>
>> *Cc:* David Majnemer <david.majnemer at gmail.com>; Hal Finkel <
>> hfinkel at anl.gov>; llvm-dev at lists.llvm.org
>> *Subject:* Re: [llvm-dev] [RFC] IR-level Region Annotations
>>
>>
>>
>> Yes, those are LLVM SSA values. "map" (m, n) should be "map" (i32 m, i32 n).
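>>
>> Putting that correction together, the entry call with typed SSA operands
>> would look roughly like this (a sketch following the earlier example):
>>
>>   %t = tail call token @llvm.directive.scope.entry() [
>>       "target teams distribute"(), "parallel for"(), "simd"(),
>>       "shared"(i32* %xp, i32* %yp), "linear_iv"(),
>>       "firstprivate"(i32 %m, i32 %n), "map"(i32 %m, i32 %n) ]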
>>
>>
>>
>> Thanks
>>
>> Hongbin
>>
>>
>>
>> On Wed, Jan 11, 2017 at 3:47 PM, Tian, Xinmin <xinmin.tian at intel.com>
>> wrote:
>>
>> Interesting, this is similar to what we have.
>>
>>
>>
>> One more question: are the things highlighted in yellow below represented
>> as LLVM Values? In other words, does the LLVM optimizer update them? E.g.,
>> if %m is renamed to %m.1 in the loop, is the "m" in the token call @.....
>> updated as well? In the RFC, the "m" is an argument of an intrinsic call,
>> so all of its use-def info is visible to the optimizer, and the optimizer
>> updates it during optimization just like regular function arguments. I am
>> trying to understand whether there is any difference between the token
>> scheme and the intrinsic scheme in this regard.
>>
>>
>>
>>
>> tail call token @llvm.directive.scope.entry() [ "target teams distribute"(),
>> "parallel for", "simd" (), "shared" (i32 *xp, i32 *yp), "linear_iv" (),
>> "firstprivate" (i32 m, i32 n), "map" (m, n) ] ;
>>
>>
>>
>>
>>
>> *From:* Hongbin Zheng [mailto:etherzhhb at gmail.com]
>> *Sent:* Wednesday, January 11, 2017 3:29 PM
>>
>>
>> *To:* Tian, Xinmin <xinmin.tian at intel.com>
>> *Cc:* David Majnemer <david.majnemer at gmail.com>; Hal Finkel <
>> hfinkel at anl.gov>; llvm-dev at lists.llvm.org
>> *Subject:* Re: [llvm-dev] [RFC] IR-level Region Annotations
>>
>>
>>
>> I am not an OpenMP expert, so some annotations may be wrong:
>>
>>
>>
>> // CHECK: [[ENTRY:%[a-zA-Z0-9\.]+]] = tail call token @llvm.directive.scope.entry()
>> //   [ "target teams distribute"(), "parallel for", "simd" (), "shared" (i32 *xp, i32 *yp),
>> //     "linear_iv" (), "firstprivate" (i32 m, i32 n), "map" (m, n) ]
>> // Note that I use "linear_iv" for the linear induction variable; you may want to fix this.
>>
>> #pragma omp target teams distribute parallel for simd shared(xp, yp) \
>>             linear(i) firstprivate(m, n) map(m, n)
>> for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }
>>
>> // CHECK: tail call void @llvm.directive.scope.exit(token [[ENTRY]])
>>
>>
>>
>>
>>
>> // CHECK: [[ENTRY:%[a-zA-Z0-9\.]+]] = tail call token @llvm.directive.scope.entry()
>> //   [ "prefetch"(i32 *xp, i64 1, i64 20, i32 *yp, i64 0, i64 10) ]
>>
>> #pragma prefetch x:1:20 y:0:10
>> for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }
>>
>> // CHECK: tail call void @llvm.directive.scope.exit(token [[ENTRY]])
>>
>>
>>
>>
>>
>> On Wed, Jan 11, 2017 at 3:19 PM, Tian, Xinmin <xinmin.tian at intel.com>
>> wrote:
>>
>> Would you send us the LLVM IR for the examples below using token and OpBundle,
>> so we can understand better? Thanks.
>>
>>
>>
>> #pragma omp target teams distribute parallel for simd shared(xp, yp)
>> linear(i) firstprivate(m, n) map(m, n)
>> for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }
>>
>>
>>
>>
>>
>> #pragma prefetch x:1:20 y:0:10
>> for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }
>>
>>
>>
>> *From:* Hongbin Zheng [mailto:etherzhhb at gmail.com]
>> *Sent:* Wednesday, January 11, 2017 3:09 PM
>> *To:* Tian, Xinmin <xinmin.tian at intel.com>
>> *Cc:* David Majnemer <david.majnemer at gmail.com>; Hal Finkel <
>> hfinkel at anl.gov>; llvm-dev at lists.llvm.org
>>
>>
>> *Subject:* Re: [llvm-dev] [RFC] IR-level Region Annotations
>>
>>
>>
>> We are experimenting with a similar approach on SESE regions. We introduce
>> an intrinsic to produce a token and another to consume it. These two
>> intrinsics mark the region, and we annotate extra information as operand
>> bundles on the intrinsic that produces the token.
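>>
>> In rough form it looks like this (the intrinsic names are just what our
>> experiment uses and are illustrative):
>>
>>   %region = tail call token @llvm.directive.scope.entry() [ "some.annotation"(i32 %v) ]
>>   ; ... single-entry single-exit region body ...
>>   call void @llvm.directive.scope.exit(token %region)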
>>
>>
>>
>> Thanks
>>
>> Hongbin
>>
>>
>>
>> On Wed, Jan 11, 2017 at 2:53 PM, Tian, Xinmin via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> David, one quick question: is there a way to preserve and associate a set
>> of properties and value info/attributes with a given region using Token?
>>
>>
>>
>> Thanks,
>>
>> Xinmin
>>
>>
>>
>> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of *David
>> Majnemer via llvm-dev
>> *Sent:* Wednesday, January 11, 2017 2:18 PM
>> *To:* Hal Finkel <hfinkel at anl.gov>
>> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>
>> *Subject:* Re: [llvm-dev] [RFC] IR-level Region Annotations
>>
>>
>>
>> FWIW, we needed to maintain single entry-multiple exit regions for WinEH
>> and we accomplished it via a different mechanism.
>>
>>
>>
>> We had an instruction which produces a value of type Token
>> (http://llvm.org/docs/LangRef.html#token-type), which let us establish the
>> region, and another instruction to exit the region by consuming it. The
>> dominance rules allowed us to avoid situations where the compiler might
>> trash the regions in weird ways and made sure that regions would be left
>> unharmed.
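>>
>> A rough sketch of what this looks like for WinEH (catchpad defines a token
>> that catchret must consume, so passes cannot tear the handler region apart):
>>
>>   dispatch:
>>     %cs = catchswitch within none [label %handler] unwind to caller
>>
>>   handler:
>>     %tok = catchpad within %cs [i8* null, i32 64, i8* null]
>>     ; ... handler body, tied to %tok ...
>>     catchret from %tok to label %cont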
>>
>>
>>
>> AFAIK, a similar approach using Token could work here. I think it would
>> reduce the amount of stuff you'd need LLVM to maintain.
>>
>>
>>
>>
>>
>> On Wed, Jan 11, 2017 at 2:02 PM, Hal Finkel via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> A Proposal for adding an experimental IR-level region-annotation
>> infrastructure
>> =============================================================================
>>
>> Hal Finkel (ANL) and Xinmin Tian (Intel)
>>
>> This is a proposal for adding an experimental infrastructure to support
>> annotating regions in LLVM IR, making use of intrinsics and metadata, and
>> a generic analysis to allow transformations to easily make use of these
>> annotated regions. This infrastructure is flexible enough to support
>> representation of directives for parallelization, vectorization, and
>> offloading of both loops and more-general code regions. Under this scheme,
>> the conceptual distance between source-level directives and the region
>> annotations need not be significant, making the incremental cost of
>> supporting new directives and modifiers often small. It is not, however,
>> specific to those use cases.
>>
>> Problem Statement
>> =================
>> There have been a series of discussions on LLVM IR extensions for
>> representing region and loop annotations for parallelism, and other
>> user-guided transformations, among both industrial and academic members of
>> the LLVM community. Increasing the quality of our OpenMP implementation is
>> an important motivating use case, but certainly not the only one. For
>> OpenMP in particular, we've discussed having an IR representation for
>> years. Presently, all OpenMP pragmas are transformed directly into
>> runtime-library calls in Clang, and outlining (i.e. extracting parallel
>> regions into their own functions to be invoked by the runtime library) is
>> done in Clang as well. Our implementation does not further optimize OpenMP
>> constructs, and a lot of thought has been put into how we might improve
>> this. For some optimizations, such as redundant barrier removal, we could
>> use a TargetLibraryInfo-like mechanism to recognize frontend-generated
>> runtime calls and proceed from there. Cases where we lose pointer-aliasing
>> information, information on loop bounds, etc., we could address by
>> improving our inter-procedural-analysis capabilities, and we should do
>> that regardless. However, there are important cases where the underlying
>> scheme we want to use to lower the various parallelism constructs,
>> especially when targeting accelerators, changes depending on what is in
>> the parallel region. In important cases where we can see everything (i.e.
>> there aren't arbitrary external calls), code generation should proceed in
>> a way that is very different from the general case. To have a sensible
>> implementation, this must be done after inlining. When using LTO, this
>> should be done during the link-time phase. As a result, we must move away
>> from our purely-front-end-based lowering scheme. The question is what to
>> do instead, and how to do it in a way that is generally useful to the
>> entire community.
>>
>> Designs previously discussed can be classified into four categories:
>>
>> (a) Add a large number of new kinds of LLVM metadata, and use them to
>>     annotate each necessary instruction for parallelism, data attributes,
>>     etc.
>> (b) Add several new LLVM instructions such as, for parallelism, fork,
>>     spawn, join, barrier, etc.
>> (c) Add a large number of LLVM intrinsics for directives and clauses, each
>>     intrinsic representing a directive or a clause.
>> (d) Add a small number of LLVM intrinsics for region or loop annotations,
>>     representing the directive/clause names using metadata and the
>>     remaining information using arguments.
>>
>> Here we're proposing (d), and below is a brief pros-and-cons analysis based
>> on these discussions and our own experiences of supporting region/loop
>> annotations in LLVM-based compilers. The table below shows a short summary
>> of our analysis.
>>
>> Various commercial compilers (e.g. from Intel, IBM, Cray, PGI), and GCC
>> [1,2], have IR-level representations for parallelism constructs. Based on
>> experience from these previous developments, we'd like a solution for LLVM
>> that maximizes optimization enablement while minimizing the maintenance
>> costs and complexity increase experienced by the community as a whole.
>>
>> Representing the desired information in the LLVM IR is just the first step.
>> The challenge is to maintain the desired semantics without blocking useful
>> optimizations. With options (c) and (d), dependencies can be preserved
>> mainly based on the use/def chains of the arguments of each intrinsic, and
>> a manageable set of LLVM analyses and transformations can be made aware of
>> certain kinds of annotations in order to enable specific optimizations. In
>> this regard, options (c) and (d) are close with respect to maintenance
>> effort. However, based on our experiences, option (d) is preferable because
>> it is easier to extend to support new directives and clauses in the future
>> without the need to add new intrinsics as required by option (c).
>>
>> Table 1. Pros/cons summary of LLVM IR experimental extension options
>>
>> --------+----------------------+---------------------------------------------
>> Options |         Pros         | Cons
>> --------+----------------------+---------------------------------------------
>> (a)     | No need to add new   | LLVM passes do not always maintain metadata.
>>         | instructions or      | Need to educate many passes (if not all) to
>>         | new intrinsics.      | understand and handle them.
>> --------+----------------------+---------------------------------------------
>> (b)     | Parallelism becomes  | Huge effort for extending all LLVM passes
>>         | a first-class        | and code generation to support the new
>>         | citizen.             | instructions. A large set of information
>>         |                      | still needs to be represented using other
>>         |                      | means.
>> --------+----------------------+---------------------------------------------
>> (c)     | Less impact on       | A large number of intrinsics must be added.
>>         | existing LLVM        | Some optimizations need to be educated to
>>         | passes. Fewer        | understand them.
>>         | requirements for     |
>>         | passes to maintain   |
>>         | metadata.            |
>> --------+----------------------+---------------------------------------------
>> (d)     | Minimal impact on    | Some optimizations need to be educated to
>>         | existing LLVM        | understand them.
>>         | optimization passes. |
>>         | No requirement for   |
>>         | all passes to        |
>>         | maintain a large set |
>>         | of metadata with     |
>>         | values. Directive    |
>>         | and clause names use |
>>         | metadata strings.    |
>> --------+----------------------+---------------------------------------------
>>
>>
>> Regarding (a), LLVM already uses metadata for certain loop information
>> (e.g. annotations directing loop transformations and assertions about
>> loop-carried dependencies), but there is no natural or consistent way to
>> extend this scheme to represent the necessary data-movement or region
>> information.
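>>
>> For instance, today's loop annotations hang off the loop's latch branch as
>> metadata, roughly like this (a sketch of the existing mechanism):
>>
>>   br i1 %exitcond, label %exit, label %for.body, !llvm.loop !0
>>
>>   !0 = distinct !{!0, !1}
>>   !1 = !{!"llvm.loop.vectorize.enable", i1 true}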
>>
>>
>> New Intrinsics for Region and Value Annotations
>> ===============================================
>> The following new (experimental) intrinsics are proposed which allow:
>>
>> a) Annotating a code region marked with directives / pragmas;
>> b) Annotating values associated with the region (or loops), that is, those
>>    values associated with directives / pragmas;
>> c) Providing information on LLVM IR transformations needed for the
>>    annotated code regions (or loops).
>>
>> These can be used both by frontends and by transformation passes (e.g.
>> automated parallelization). The names used here are similar to those used
>> by our internal prototype, but obviously we expect a community bikeshed
>> discussion.
>>
>> def int_experimental_directive : Intrinsic<[], [llvm_metadata_ty],
>>                                             [IntrArgMemOnly],
>>                                             "llvm.experimental.directive">;
>>
>> def int_experimental_dir_qual : Intrinsic<[], [llvm_metadata_ty],
>>                                            [IntrArgMemOnly],
>>                                            "llvm.experimental.dir.qual">;
>>
>> def int_experimental_dir_qual_opnd : Intrinsic<[],
>>                                            [llvm_metadata_ty, llvm_any_ty],
>>                                            [IntrArgMemOnly],
>>                                            "llvm.experimental.dir.qual.opnd">;
>>
>> def int_experimental_dir_qual_opndlist : Intrinsic<[],
>>                                            [llvm_metadata_ty, llvm_vararg_ty],
>>                                            [IntrArgMemOnly],
>>                                            "llvm.experimental.dir.qual.opndlist">;
>>
>> Note that calls to these intrinsics might need to be annotated with the
>> convergent attribute when they represent fork/join operations, barriers,
>> and similar.
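>>
>> For example, the attribute could be attached at the call site via an
>> attribute group (a sketch; the group number is arbitrary):
>>
>>   call void @llvm.experimental.directive(metadata !0) #0
>>
>>   attributes #0 = { convergent }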
>>
>> Usage Examples
>> ==============
>>
>> This section shows a few examples using these experimental intrinsics.
>> LLVM developers who will use these intrinsics can define their own
>> MDStrings. All details of using these intrinsics to represent OpenMP 4.5
>> constructs are described in [1][3].
>>
>>
>> Example I: An OpenMP combined construct
>>
>> #pragma omp target teams distribute parallel for simd
>>   loop
>>
>> LLVM IR
>> -------
>> call void @llvm.experimental.directive(metadata !0)
>> call void @llvm.experimental.directive(metadata !1)
>> call void @llvm.experimental.directive(metadata !2)
>> call void @llvm.experimental.directive(metadata !3)
>>   loop
>> call void @llvm.experimental.directive(metadata !6)
>> call void @llvm.experimental.directive(metadata !5)
>> call void @llvm.experimental.directive(metadata !4)
>>
>> !0 = metadata !{metadata !DIR.OMP.TARGET}
>> !1 = metadata !{metadata !DIR.OMP.TEAMS}
>> !2 = metadata !{metadata !DIR.OMP.DISTRIBUTE.PARLOOP.SIMD}
>>
>> !6 = metadata !{metadata !DIR.OMP.END.DISTRIBUTE.PARLOOP.SIMD}
>> !5 = metadata !{metadata !DIR.OMP.END.TEAMS}
>> !4 = metadata !{metadata !DIR.OMP.END.TARGET}
>>
>> Example II: Assume x,y,z are int variables, and s is a non-POD variable.
>>             Then, lastprivate(x,y,s,z) is represented as:
>>
>> LLVM IR
>> -------
>> call void @llvm.experimental.dir.qual.opndlist(
>>                 metadata !1, %x, %y, metadata !2, %s, %ctor, %dtor, %z)
>>
>> !1 = metadata !{metadata !QUAL.OMP.LASTPRIVATE}
>> !2 = metadata !{metadata !QUAL.OPND.NONPOD}
>>
>> Example III: A prefetch pragma example
>>
>> // issue vprefetch1 for xp with a distance of 20 vectorized iterations ahead
>> // issue vprefetch0 for yp with a distance of 10 vectorized iterations ahead
>> #pragma prefetch x:1:20 y:0:10
>> for (i=0; i<2*N; i++) { xp[i*m + j] = -1; yp[i*n +j] = -2; }
>>
>> LLVM IR
>> -------
>> call void @llvm.experimental.directive(metadata !0)
>> call void @llvm.experimental.dir.qual.opndlist(metadata !1, %xp, 1, 20,
>>                                                metadata !1, %yp, 0, 10)
>>   loop
>> call void @llvm.experimental.directive(metadata !3)
>>
>> References
>> ==========
>>
>> [1] LLVM Framework and IR extensions for Parallelization, SIMD Vectorization
>>     and Offloading Support. SC'2016 LLVM-HPC3 Workshop. (Xinmin Tian et al.)
>>     Salt Lake City, Utah.
>>
>> [2] Extending LoopVectorizer towards supporting OpenMP4.5 SIMD and outer
>>     loop auto-vectorization. (Hideki Saito et al.) LLVM Developers' Meeting
>>     2016, San Jose.
>>
>> [3] Intrinsics, Metadata, and Attributes: The Story continues! (Hal Finkel)
>>     LLVM Developers' Meeting, 2016. San Jose.
>>
>> [4] LLVM Intrinsic Function and Metadata String Interface for Directive (or
>>     Pragmas) Representation. Specification Draft v0.9, Intel Corporation,
>>     2016.
>>
>>
>> Acknowledgements
>> ================
>> We would like to thank Chandler Carruth (Google), Johannes Doerfert
>> (Saarland Univ.), Yaoqing Gao (Huawei), Michael Wong (Codeplay), Ettore
>> Tiotto, Carlo Bertolli, Bardia Mahjour (IBM), and all other LLVM-HPC IR
>> Extensions WG members for their constructive feedback on the LLVM framework
>> and IR extension proposal.
>>
>> Proposed Implementation
>> =======================
>>
>> Two sets of patches supporting these experimental intrinsics and
>> demonstrating their usage are ready for community review:
>>
>> a) Clang patches that support core OpenMP pragmas using this approach.
>> b) W-Region framework patches: CFG restructuring to form single-entry-
>>    single-exit work regions (W-Regions) based on annotations, demand-driven
>>    intrinsic parsing, WRegionInfo collection and analysis passes, and
>>    dump functions for WRegionInfo.
>>
>> On top of this functionality, we will provide transformation patches for
>> core OpenMP constructs (e.g. starting with the "#pragma omp parallel for"
>> loop for lowering and outlining, and "#pragma omp simd" hooked into
>> LoopVectorize.cpp). We have internal implementations for many constructs
>> now. We will break this functionality up to create a series of patches for
>> community review.
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>