[cfe-dev] Comparison of 2 schemes to implement OpenMP 5.0 declare mapper codegen

Sat Jun 29 05:58:55 PDT 2019

Hi Jonas,

Sure, we are trying to do so, though the public lists often reject my
emails because they are large, and I cannot add everyone from that
mailing list here.

Thanks,
Lingda Li

On Sat, Jun 29, 2019 at 8:39 AM Jonas Hahnfeld <hahnjo at hahnjo.de> wrote:

> Hi Lingda,
>
> may I ask you to start discussions about important decisions related to
> Clang's OpenMP support on the public mailing list instead of having
> private conversations? That would help to get feedback from people not
> being part of the selected circle participating in the "OpenMP / HPC in
> Clang / LLVM Multi-company Telecom".
>
> Thanks,
> Jonas
>
> On 2019-06-28 15:59, Lingda Li via cfe-dev wrote:
> > On Fri, Jun 28, 2019 at 9:49 AM Li, Lingda <lli at bnl.gov> wrote:
> >
> >> I don't think we can have the buffer allocated within the mapper
> >> function. It has to be done in the runtime, because of nested
> >> mappers.
> >> First, all mapper functions are generated in the same way. We cannot
> >> make the outermost mapper function allocate memory while the inner
> >> ones don't and have to use what the outermost mapper function
> >> allocated.
> >> I suppose we still need to allocate memory in the runtime, so the
> >> runtime can pass the pointer and size to the mapper function, and
> >> the outer mapper function can then pass them into inner ones.
> >> Again, this is just like the current implementation, except that we
> >> don't use vector::push_back(); instead we use something like a
> >> manual implementation of vector::push_back() (because we need to use
> >> the pointer and the current index).
> >>
> >> I believe the key question here is whether it is true that (the
> >> overhead of push_back() > the overhead of precalculating the total
> >> number + the memory allocation overhead + direct memory writes).
> >> This will decide whether this change is necessary. Any opinions?
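A minimal sketch of the runtime-allocated buffer described above: the runtime allocates storage once, and the same pointer and current index are passed from the outermost mapper into each nested one, replacing vector::push_back() with a manual indexed write. All names here (MapComponent, ComponentBuffer, append, Inner, Outer and the two mapper functions) are hypothetical and do not reflect Clang's generated code or the libomptarget interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical component record; not the real libomptarget layout.
struct MapComponent {
  void *base;
  void *begin;
  std::size_t bytes;
  std::int64_t type;
};

// Hypothetical cursor allocated once by the runtime and passed, by
// reference, through the outermost mapper and every nested mapper.
struct ComponentBuffer {
  MapComponent *data;    // storage allocated by the runtime
  std::size_t capacity;  // number of slots allocated
  std::size_t next;      // index of the next free slot
};

// Manual replacement for vector::push_back(): write at the current index
// and advance it. There is no regrowth, so the capacity must be exact.
inline void append(ComponentBuffer &buf, const MapComponent &c) {
  assert(buf.next < buf.capacity && "runtime sized the buffer too small");
  buf.data[buf.next++] = c;
}

struct Inner { int x; int *q; };      // mapper: map(i.x) map(i.q[0:i.x])
struct Outer { int n; Inner *arr; };  // mapper: map(o.arr[0:o.n])

// Inner mapper: it does not know whether the runtime or another mapper
// called it; it simply appends at the shared index.
void inner_mapper(ComponentBuffer &buf, Inner *i) {
  append(buf, {i, &i->x, sizeof(int), 0});
  append(buf, {&i->q, i->q, static_cast<std::size_t>(i->x) * sizeof(int), 0});
}

// Outer mapper: maps its array section, then hands the same buffer and
// index to the inner mapper for each element.
void outer_mapper(ComponentBuffer &buf, Outer *o) {
  append(buf, {o, o->arr, static_cast<std::size_t>(o->n) * sizeof(Inner), 0});
  for (int k = 0; k < o->n; ++k)
    inner_mapper(buf, &o->arr[k]);
}
```

Because every mapper receives the buffer by reference, an inner mapper behaves the same whether it was invoked by the runtime or by an outer mapper, which is why a single runtime allocation is enough.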
> >>
> >> Thanks,
> >> Lingda Li
> >>
> >> -------------------------
> >>
> >> FROM: Alexey Bataev <Alexey.Bataev at ibm.com>
> >> SENT: Thursday, June 27, 2019 5:05 PM
> >> TO: Li, Lingda
> >> CC: Alexandre Eichenberger; Chapman, Barbara (Contact); Kevin K
> >> O'Brien; Carlo Bertolli; Deepak Eachempati; Denny, Joel E.; David
> >> Oehmke; Ettore Tiotto; fraggamuffin at gmail.com; Rokos, Georgios;
> >> Gheorghe-Teod Bercea; gregory.rodgers at amd.com; Hal Finkel; Sharif,
> >> Hashim; Cownie, James H; Sjodin, Jan; jbeyer at nvidia.com; Doerfert,
> >> Johannes Rudolf; Jones, Jeff C; josem at udel.edu; Robichaux, Joseph;
> >> Jeff Heath; khaldi.dounia at gmail.com; Kelvin Li; Bobrovsky,
> >> Konstantin S; Kotsifakou, Maria; lopezmg at ornl.org; Lopez, Matthew
> >> Graham; Menard, Lorri; Martin Kong; Sarah McNamara; Rice, Michael P;
> >> Matt Martineau; oscar at ornl.gov; Jeeva Paudel; Rao, Premanand M;
> >> Krishnaiyer, Rakesh; Narayanaswamy, Ravi; Monteleone, Robert;
> >> Lieberman, Ron; Samuel Antao; Jeffrey Sandoval; Sunita
> >> Chandrasekaran; sergey.y.ostanevich at gmail.com; Sergio Pino Gallardo;
> >> Dmitriev, Serguei N; Chan, SiuChi; Sunil Shrestha; Wilmarth, Terry
> >> L; Tianyi Zhang; vadve at illinois.edu; Wang Chen; Wael Yehia; Tian,
> >> Xinmin
> >> SUBJECT: Re: Re: Re: RE: Comparison of 2 schemes to implement OpenMP
> >> 5.0 declare mapper codegen
> >>
> >> Yes, we need 2 functions, but the first one can be optimized very
> >> effectively. After optimization and inlining it will end up as
> >> just return s1+s2+s3... I think in most cases those sizes will
> >> be constant, since the mapper maps a constant number of elements,
> >> and thus this expression will be optimized to just a constant value.
> >> You don't need to pass these functions to the runtime. We can call
> >> them directly from the compiler-generated code.
> >> 1st call: get number of elements.
> >> 2nd: allocate the buffer
> >> 3rd call: call mapper with this preallocated buffer that fills this
> >> buffer without any calls of the runtime functions.
> >> 4th call: call the runtime to pass the buffer to the runtime.
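The four calls listed above can be sketched as follows for the mapper `declare mapper(S s) map(s.x) map(s.p[0:s.x])` used elsewhere in this thread. This is a simplified sketch under stated assumptions: it ignores the optional whole-array component of scheme 1, and count_S, fill_S, runtime_submit and map_S are hypothetical names, with runtime_submit standing in for whatever single runtime entry point would receive the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical component record; not the real libomptarget layout.
struct MapComponent {
  void *base;
  void *begin;
  std::size_t bytes;
  std::int64_t type;
};

struct S { int x; int *p; };  // mapper(S s) map(s.x) map(s.p[0:s.x])

// 1st call: each S contributes exactly two components, so after inlining
// the count is a simple expression that folds when `size` is constant.
constexpr std::size_t count_S(std::size_t size) { return 2 * size; }

// 3rd call: fill the preallocated buffer with no runtime calls at all.
std::size_t fill_S(MapComponent *out, S *s, std::size_t size) {
  std::size_t idx = 0;
  for (std::size_t k = 0; k < size; ++k) {
    out[idx++] = {&s[k], &s[k].x, sizeof(int), 0};
    out[idx++] = {&s[k].p, s[k].p,
                  static_cast<std::size_t>(s[k].x) * sizeof(int), 0};
  }
  return idx;
}

// 4th call: hypothetical single runtime entry point; here it only
// reports how many components it received.
std::size_t runtime_submit(const MapComponent *components, std::size_t n) {
  (void)components;
  return n;
}

std::size_t map_S(S *s, std::size_t size) {
  const std::size_t n = count_S(size);                       // 1st: count
  std::unique_ptr<MapComponent[]> buf(new MapComponent[n]);  // 2nd: allocate
  const std::size_t filled = fill_S(buf.get(), s, size);     // 3rd: fill
  assert(filled == n);
  return runtime_submit(buf.get(), n);                       // 4th: submit once
}
```

When `size` is a compile-time constant, `count_S(size)` is itself a constant expression, so the buffer size is known before the mapper runs and the runtime is called exactly once.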
> >>
> >> Best regards,
> >> Alexey Bataev
> >>
> >> On June 27, 2019, at 16:53, Li, Lingda <lli at bnl.gov>
> >> wrote:
> >>
> >>> If we precalculate the size, first, it means we need to generate
> >>> 2 functions for each mapper, rather than 1 now. One for mapping
> >>> information filling as we have, the other for size calculation
> >>> (This will not return constant values, because size depends on how
> >>> many instances we are mapping). Both these 2 functions will need
> >>> to be passed to the runtime. The runtime will need to precalculate
> >>> the number of components first, then allocate memory, then call
> >>> the mapper function to fill it up.
> >>>
> >>> Compared with the scheme 1, the differences are:
> >>> 1) An extra call to calculate the total number, while scheme 1
> >>> does not;
> >>> 2) A preallocated buffer, whose pointer and the current number
> >>> should be passed to the mapper function, then the mapper function
> >>> uses them to fill components, while scheme 1 uses push_back() to
> >>> do the same thing.
> >>>
> >>> Is there really a benefit doing this? push_back() should be
> >>> efficient enough compared with directly writing to memory.
> >>>
> >>> If people here think that the overhead of push_back() > the
> >>> overhead of precalculating the total number + the memory
> >>> allocation overhead + direct memory writes, then we can consider
> >>> this scheme.
> >>>
> >>> Thanks,
> >>> Lingda Li
> >>>
> >>> -------------------------
> >>>
> >>> FROM: Alexey Bataev <Alexey.Bataev at ibm.com>
> >>> SENT: Thursday, June 27, 2019 4:26 PM
> >>> TO: Li, Lingda
> >>> CC: Alexandre Eichenberger; Chapman, Barbara (Contact); Kevin K
> >>> O'Brien; Carlo Bertolli; Deepak Eachempati; Denny, Joel E.; David
> >>> Oehmke; Ettore Tiotto; fraggamuffin at gmail.com; Rokos, Georgios;
> >>> Gheorghe-Teod Bercea; gregory.rodgers at amd.com; Hal Finkel; Sharif,
> >>> Hashim; Cownie, James H; Sjodin, Jan; jbeyer at nvidia.com; Doerfert,
> >>> Johannes Rudolf; Jones, Jeff C; josem at udel.edu; Robichaux, Joseph;
> >>> Jeff Heath; khaldi.dounia at gmail.com; Kelvin Li; Bobrovsky,
> >>> Konstantin S; Kotsifakou, Maria; lopezmg at ornl.org; Lopez, Matthew
> >>> Graham; Menard, Lorri; Martin Kong; Sarah McNamara; Rice, Michael
> >>> P; Matt Martineau; oscar at ornl.gov; Jeeva Paudel; Rao, Premanand M;
> >>> Krishnaiyer, Rakesh; Narayanaswamy, Ravi; Monteleone, Robert;
> >>> Lieberman, Ron; Samuel Antao; Jeffrey Sandoval; Sunita
> >>> Chandrasekaran; sergey.y.ostanevich at gmail.com; Sergio Pino
> >>> Gallardo; Dmitriev, Serguei N; Chan, SiuChi; Sunil Shrestha;
> >>> Wilmarth, Terry L; Tianyi Zhang; vadve at illinois.edu; Wang Chen;
> >>> Wael Yehia; Tian, Xinmin
> >>> SUBJECT: Re: Re: RE: Comparison of 2 schemes to implement OpenMP
> >>> 5.0 declare mapper codegen
> >>>
> >>> If the functions (the ones intended for size precalculation) are
> >>> inlined, they can be optimized out very effectively, since in most
> >>> cases they will return constant values.
> >>> If we could do this, we wouldn't need vectors and push_backs; we
> >>> could use preallocated memory and an internal counter.
> >>> --------------
> >>> Best regards,
> >>> Alexey Bataev
> >>>
> >>> From: "Li, Lingda" <lli at bnl.gov>
> >>> To: Alexey Bataev <Alexey.Bataev at ibm.com>, Deepak Eachempati
> >>> <deachempat at cray.com>
> >>> Cc: "Narayanaswamy, Ravi" <ravi.narayanaswamy at intel.com>,
> >>> "Alexandre Eichenberger" <alexe at us.ibm.com>, "Chapman, Barbara
> >>> (Contact)" <barbara.chapman at stonybrook.edu>, "Bobrovsky,
> >>> Konstantin S" <konstantin.s.bobrovsky at intel.com>, Carlo Bertolli
> >>> <cbertol at us.ibm.com>, "Chan, SiuChi" <siuchi.chan at amd.com>,
> >>> "Cownie, James H" <james.h.cownie at intel.com>, David Oehmke
> >>> <doehmke at cray.com>, "Denny, Joel E." <dennyje at ornl.gov>,
> >>> "Dmitriev, Serguei N" <serguei.n.dmitriev at intel.com>, "Doerfert,
> >>> Johannes Rudolf" <jdoerfert at anl.gov>, Ettore Tiotto
> >>> <etiotto at ca.ibm.com>, "fraggamuffin at gmail.com"
> >>> <fraggamuffin at gmail.com>, Gheorghe-Teod Bercea
> >>> <Gheorghe-Teod.Bercea at ibm.com>, Hal Finkel <hfinkel at anl.gov>,
> >>> "jbeyer at nvidia.com" <jbeyer at nvidia.com>, Jeeva Paudel
> >>> <pjeeva01 at ca.ibm.com>, Jeff Heath <jrheath at ca.ibm.com>, Jeffrey
> >>> Sandoval <sandoval at cray.com>, "Jones, Jeff C"
> >>> <jeff.c.jones at intel.com>, "josem at udel.edu" <josem at udel.edu>,
> >>> Kelvin Li <kli at ca.ibm.com>, "Kevin K O'Brien"
> >>> <caomhin at us.ibm.com>, "khaldi.dounia at gmail.com"
> >>> <khaldi.dounia at gmail.com>, "Kotsifakou, Maria"
> >>> <kotsifa2 at illinois.edu>, "Krishnaiyer, Rakesh"
> >>> <rakesh.krishnaiyer at intel.com>, "Lieberman, Ron"
> >>> <Ron.Lieberman at amd.com>, "Lopez, Matthew Graham"
> >>> <lopezmg at ornl.gov>, "lopezmg at ornl.org" <lopezmg at ornl.org>, Martin
> >>> Kong <martin.richard.kong at gmail.com>, Matt Martineau
> >>> <m.martineau at bristol.ac.uk>, "Menard, Lorri"
> >>> <lorri.menard at intel.com>, "Monteleone, Robert"
> >>> <robert.monteleone at intel.com>, "oscar at ornl.gov" <oscar at ornl.gov>,
> >>> "Rao, Premanand M" <premanand.m.rao at intel.com>, "Rice, Michael P"
> >>> <michael.p.rice at intel.com>, "Robichaux, Joseph"
> >>> <joseph.robichaux at intel.com>, "gregory.rodgers at amd.com"
> >>> <gregory.rodgers at amd.com>, "Rokos, Georgios"
> >>> <georgios.rokos at intel.com>, Samuel Antao <Samuel.Antao at ibm.com>,
> >>> "Sarah McNamara" <mcnamara at ca.ibm.com>,
> >>> "sergey.y.ostanevich at gmail.com" <sergey.y.ostanevich at gmail.com>,
> >>> Sergio Pino Gallardo <sergiop at udel.edu>, "Sharif, Hashim"
> >>> <hsharif3 at illinois.edu>, "Sjodin, Jan" <Jan.Sjodin at amd.com>, Sunil
> >>> Shrestha <sshrestha at cray.com>, Sunita Chandrasekaran
> >>> <schandra at udel.edu>, "Tian, Xinmin" <xinmin.tian at intel.com>,
> >>> Tianyi Zhang <tzhan18 at lsu.edu>, "vadve at illinois.edu"
> >>> <vadve at illinois.edu>, Wael Yehia <wyehia at ca.ibm.com>, Wang Chen
> >>> <wdchen at ca.ibm.com>, "Wilmarth, Terry L"
> >>> <terry.l.wilmarth at intel.com>
> >>> Date: 06/27/2019 04:13 PM
> >>> Subject: [EXTERNAL] Re: RE: Comparison of 2 schemes to implement
> >>> OpenMP 5.0 declare mapper codegen
> >>>
> >>> -------------------------
> >>>
> >>> Hi Alexey,
> >>>
> >>> I think that's why we chose to use variable-size storage like
> >>> std::vector to store the mapping information in the first place,
> >>> right? It would be costly to precalculate the total number of
> >>> components, especially in the presence of nested mappers. Besides,
> >>> a runtime function call is just a std::vector::push_back(), so I
> >>> think it's okay to have multiple function calls.
> >>>
> >>> Thanks,
> >>> Lingda Li
> >>>
> >>> -------------------------
> >>>
> >>> FROM: Alexey Bataev <Alexey.Bataev at ibm.com>
> >>> Sent: Thursday, June 27, 2019 3:52 PM
> >>> To: Deepak Eachempati
> >>> Cc: Li, Lingda; Narayanaswamy, Ravi; Alexandre Eichenberger;
> >>> Chapman, Barbara (Contact); Bobrovsky, Konstantin S; Carlo
> >>> Bertolli; Chan, SiuChi; Cownie, James H; David Oehmke; Denny, Joel
> >>> E.; Dmitriev, Serguei N; Doerfert, Johannes Rudolf; Ettore Tiotto;
> >>> fraggamuffin at gmail.com; Gheorghe-Teod Bercea; Hal Finkel;
> >>> jbeyer at nvidia.com; Jeeva Paudel; Jeff Heath; Jeffrey Sandoval;
> >>> Jones, Jeff C; josem at udel.edu; Kelvin Li; Kevin K O'Brien;
> >>> khaldi.dounia at gmail.com; Kotsifakou, Maria; Krishnaiyer, Rakesh;
> >>> Lieberman, Ron; Lopez, Matthew Graham; lopezmg at ornl.org; Martin
> >>> Kong; Matt Martineau; Menard, Lorri; Monteleone, Robert;
> >>> oscar at ornl.gov; Rao, Premanand M; Rice, Michael P; Robichaux,
> >>> Joseph; gregory.rodgers at amd.com; Rokos, Georgios; Samuel Antao;
> >>> Sarah McNamara; sergey.y.ostanevich at gmail.com; Sergio Pino
> >>> Gallardo; Sharif, Hashim; Sjodin, Jan; Sunil Shrestha; Sunita
> >>> Chandrasekaran; Tian, Xinmin; Tianyi Zhang; vadve at illinois.edu;
> >>> Wael Yehia; Wang Chen; Wilmarth, Terry L
> >>> Subject: Re: RE: Comparison of 2 schemes to implement OpenMP 5.0
> >>> declare mapper codegen
> >>>
> >>> Lingda, in scheme 1, can we precalculate the total number of
> >>> components, allocate memory for this precalculated number of
> >>> elements, fill it with the mappers, and only after that call the
> >>> runtime function (only once!) to transfer the mappings to the
> >>> runtime?
> >>>
> >>> Best regards,
> >>> Alexey Bataev
> >>>
> >>> On June 27, 2019, at 15:44, Deepak Eachempati
> >>> <deachempat at cray.com> wrote:
> >>>
> >>> Got it. Thanks.
> >>>
> >>> -- Deepak
> >>>
> >>> FROM: Li, Lingda [mailto:lli at bnl.gov]
> >>> Sent: Thursday, June 27, 2019 2:41 PM
> >>> To: Deepak Eachempati <deachempat at cray.com>; Narayanaswamy, Ravi
> >>> <ravi.narayanaswamy at intel.com>; 'Alexandre Eichenberger'
> >>> <alexe at us.ibm.com>; 'Alexey Bataev' <Alexey.Bataev at ibm.com>;
> >>> Chapman, Barbara (Contact) <barbara.chapman at stonybrook.edu>;
> >>> Bobrovsky, Konstantin S <konstantin.s.bobrovsky at intel.com>; 'Carlo
> >>> Bertolli' <cbertol at us.ibm.com>; 'Chan, SiuChi'
> >>> <siuchi.chan at amd.com>; Cownie, James H <james.h.cownie at intel.com>;
> >>> David Oehmke <doehmke at cray.com>; 'Denny, Joel E.'
> >>> <dennyje at ornl.gov>; Dmitriev, Serguei N
> >>> <serguei.n.dmitriev at intel.com>; Doerfert, Johannes Rudolf
> >>> <jdoerfert at anl.gov>; 'Ettore Tiotto' <etiotto at ca.ibm.com>;
> >>> 'fraggamuffin at gmail.com' <fraggamuffin at gmail.com>; 'Gheorghe-Teod
> >>> Bercea' <Gheorghe-Teod.Bercea at ibm.com>; Hal Finkel
> >>> <hfinkel at anl.gov>; 'jbeyer at nvidia.com' <jbeyer at nvidia.com>; 'Jeeva
> >>> Paudel' <pjeeva01 at ca.ibm.com>; 'Jeff Heath' <jrheath at ca.ibm.com>;
> >>> Jeffrey Sandoval <sandoval at cray.com>; Jones, Jeff C
> >>> <jeff.c.jones at intel.com>; 'josem at udel.edu' <josem at udel.edu>;
> >>> 'Kelvin Li' <kli at ca.ibm.com>; 'Kevin K O'Brien'
> >>> <caomhin at us.ibm.com>; 'khaldi.dounia at gmail.com'
> >>> <khaldi.dounia at gmail.com>; 'Kotsifakou, Maria'
> >>> <kotsifa2 at illinois.edu>; Krishnaiyer, Rakesh
> >>> <rakesh.krishnaiyer at intel.com>; Lieberman, Ron
> >>> <Ron.Lieberman at amd.com>; 'Lopez, Matthew Graham'
> >>> <lopezmg at ornl.gov>; 'lopezmg at ornl.org' <lopezmg at ornl.org>; 'Martin
> >>> Kong' <martin.richard.kong at gmail.com>; 'Matt Martineau'
> >>> <m.martineau at bristol.ac.uk>; Menard, Lorri
> >>> <lorri.menard at intel.com>; Monteleone, Robert
> >>> <robert.monteleone at intel.com>; oscar at ornl.gov; Rao, Premanand M
> >>> <premanand.m.rao at intel.com>; Rice, Michael P
> >>> <michael.p.rice at intel.com>; Robichaux, Joseph
> >>> <joseph.robichaux at intel.com>; gregory.rodgers at amd.com; Rokos,
> >>> Georgios <georgios.rokos at intel.com>; 'samuel.antao at ibm.com'
> >>> <samuel.antao at ibm.com>; 'Sarah McNamara' <mcnamara at ca.ibm.com>;
> >>> 'sergey.y.ostanevich at gmail.com' <sergey.y.ostanevich at gmail.com>;
> >>> 'Sergio Pino Gallardo' <sergiop at udel.edu>; 'Sharif, Hashim'
> >>> <hsharif3 at illinois.edu>; Sjodin, Jan <Jan.Sjodin at amd.com>; Sunil
> >>> Shrestha <sshrestha at cray.com>; 'Sunita Chandrasekaran'
> >>> <schandra at udel.edu>; Tian, Xinmin <xinmin.tian at intel.com>; Tianyi
> >>> Zhang <tzhan18 at lsu.edu>; 'vadve at illinois.edu'
> >>> <vadve at illinois.edu>; 'Wael Yehia' <wyehia at ca.ibm.com>; 'Wang
> >>> Chen' <wdchen at ca.ibm.com>; Wilmarth, Terry L
> >>> <terry.l.wilmarth at intel.com>
> >>> Subject: Re: Comparison of 2 schemes to implement OpenMP 5.0
> >>> declare mapper codegen
> >>>
> >>> In the current scheme, all mappings within a mapper function are
> >>> done atomically by one thread. In the mapper function of the
> >>> example in the original email, <push> just pushes the mapping
> >>> information into an internal data structure. Once all mapping
> >>> information is available, the runtime performs the actual mapping
> >>> all at once. For your example, the behavior is the same as the code
> >>> below:
> >>>
> >>> ...
> >>> #pragma omp parallel num_threads(2)
> >>> {
> >>>   if (omp_get_thread_num() == 0) {
> >>>     #pragma omp target map(s.x, s.p[0:s.x])
> >>>     {
> >>>       for (int i = 0; i < s.x; i++) s.p[i] = i;
> >>>     }
> >>>   } else {
> >>>     #pragma omp target map(other_data)
> >>>     {
> >>>       // work on other_data
> >>>     }
> >>>   }
> >>> }
> >>> ...
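A sketch of the deferred behavior described above, assuming the mapper phase only appends to a list private to one target region and the runtime then processes that list inside a single critical section. The mutex is an assumption standing in for whatever synchronization the runtime actually uses, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical component record; only a label is kept for illustration.
struct Component { std::string name; };

// Hypothetical runtime state: the order in which components were mapped.
std::mutex g_runtime_mutex;
std::vector<std::string> g_mapped;

// Mapper phase: <push> only appends to a list owned by this target
// region, so no other thread can observe or interleave with it.
void mapper_S(std::vector<Component> &pending) {
  pending.push_back({"s.x"});
  pending.push_back({"s.p[0:s.x]"});
}

// Runtime phase: the complete list is processed in one critical section.
void runtime_map_all(const std::vector<Component> &pending) {
  std::lock_guard<std::mutex> lock(g_runtime_mutex);
  for (const Component &c : pending) g_mapped.push_back(c.name);
}

void target_region_s() {  // thread 0: #pragma omp target map(s)
  std::vector<Component> pending;
  mapper_S(pending);
  runtime_map_all(pending);
}

void target_region_other() {  // thread 1: #pragma omp target map(other_data)
  std::vector<Component> pending{Component{"other_data"}};
  runtime_map_all(pending);
}
```

Even if target_region_s and target_region_other run on two threads, as in the parallel region above, "s.x" and "s.p[0:s.x]" end up adjacent in g_mapped, because each list is handed to the runtime whole.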
> >>>
> >>> -------------------------
> >>> FROM: Deepak Eachempati <deachempat at cray.com>
> >>> Sent: Thursday, June 27, 2019 3:34 PM
> >>> To: Li, Lingda; Narayanaswamy, Ravi; 'Alexandre Eichenberger';
> >>> 'Alexey Bataev'; Chapman, Barbara (Contact); Bobrovsky, Konstantin
> >>> S; 'Carlo Bertolli'; 'Chan, SiuChi'; Cownie, James H; David
> >>> Oehmke; 'Denny, Joel E.'; Dmitriev, Serguei N; Doerfert, Johannes
> >>> Rudolf ; 'Ettore Tiotto'; 'fraggamuffin at gmail.com'; 'Gheorghe-Teod
> >>> Bercea'; Hal Finkel; 'jbeyer at nvidia.com'; 'Jeeva Paudel'; 'Jeff
> >>> Heath'; Jeffrey Sandoval; Jones, Jeff C; 'josem at udel.edu'; 'Kelvin
> >>> Li'; 'Kevin K O'Brien'; 'khaldi.dounia at gmail.com'; 'Kotsifakou,
> >>> Maria'; Krishnaiyer, Rakesh; Lieberman, Ron ; 'Lopez, Matthew
> >>> Graham'; 'lopezmg at ornl.org'; 'Martin Kong'; 'Matt Martineau';
> >>> Menard, Lorri; Monteleone, Robert; oscar at ornl.gov; Rao, Premanand
> >>> M; Rice, Michael P; Robichaux, Joseph; gregory.rodgers at amd.com;
> >>> Rokos, Georgios; 'samuel.antao at ibm.com'; 'Sarah McNamara';
> >>> 'sergey.y.ostanevich at gmail.com'; 'Sergio Pino Gallardo'; 'Sharif,
> >>> Hashim'; Sjodin, Jan ; Sunil Shrestha; 'Sunita Chandrasekaran';
> >>> Tian, Xinmin; Tianyi Zhang; 'vadve at illinois.edu'; 'Wael Yehia';
> >>> 'Wang Chen'; Wilmarth, Terry L
> >>> Subject: RE: Comparison of 2 schemes to implement OpenMP 5.0
> >>> declare mapper codegen
> >>>
> >>> I was referring to something like this, where another thread is
> >>> not trying to map the same data:
> >>>
> >>> #pragma omp declare mapper(S s) map(s.x) map(s.p[0:s.x])
> >>> S s;
> >>> ...
> >>> #pragma omp parallel num_threads(2)
> >>> {
> >>>   if (omp_get_thread_num() == 0) {
> >>>     #pragma omp target map(s)
> >>>     {
> >>>       for (int i = 0; i < s.x; i++) s.p[i] = i;
> >>>     }
> >>>   } else {
> >>>     #pragma omp target map(other_data)
> >>>     {
> >>>       // work on other_data
> >>>     }
> >>>   }
> >>> }
> >>> ...
> >>>
> >>> Since I believe you are mapping s.x and s.p as separate map
> >>> operations, it is possible that another thread could map
> >>> ‘other_data’ in between those two maps. If this happens, will
> >>> your implementation still ensure that s.x and s.p are positioned
> >>> at the right offsets with respect to the same base address (&s)?
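The property asked about here can be made concrete: a member's device address must equal the device address of the base plus the member's host offset from the base. The sketch below shows that property under the assumption that members are translated through one mapping entry covering the whole object; it does not model how scheme 1 places separately allocated components. MappingEntry, to_device and the device address in the example are hypothetical, not libomptarget data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical entry: host range [host_begin, host_begin + bytes) is
// mirrored on the device starting at dev_begin.
struct MappingEntry {
  std::uintptr_t host_begin;
  std::uintptr_t dev_begin;
  std::size_t bytes;
};

// Translate a host address through the entry that contains it, keeping
// its offset from the entry's start. Returns 0 if nothing contains it.
std::uintptr_t to_device(const std::vector<MappingEntry> &table,
                         const void *host) {
  const auto h = reinterpret_cast<std::uintptr_t>(host);
  for (const MappingEntry &e : table)
    if (h >= e.host_begin && h - e.host_begin < e.bytes)
      return e.dev_begin + (h - e.host_begin);
  return 0;
}

struct S { int x; double other; int *p; };
```

Under this assumption, s.x and s.p land at the same offsets from the device copy of &s as they have from &s on the host, regardless of what other threads map in between.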
> >>>
> >>> -- Deepak
> >>>
> >>> FROM: Li, Lingda [mailto:lli at bnl.gov]
> >>> Sent: Thursday, June 27, 2019 2:26 PM
> >>> To: Deepak Eachempati <deachempat at cray.com>; Narayanaswamy, Ravi
> >>> <ravi.narayanaswamy at intel.com>; 'Alexandre Eichenberger'
> >>> <alexe at us.ibm.com>; 'Alexey Bataev' <Alexey.Bataev at ibm.com>;
> >>> Chapman, Barbara (Contact) <barbara.chapman at stonybrook.edu>;
> >>> Bobrovsky, Konstantin S <konstantin.s.bobrovsky at intel.com>; 'Carlo
> >>> Bertolli' <cbertol at us.ibm.com>; 'Chan, SiuChi'
> >>> <siuchi.chan at amd.com>; Cownie, James H <james.h.cownie at intel.com>;
> >>> David Oehmke <doehmke at cray.com>; 'Denny, Joel E.'
> >>> <dennyje at ornl.gov>; Dmitriev, Serguei N
> >>> <serguei.n.dmitriev at intel.com>; Doerfert, Johannes Rudolf
> >>> <jdoerfert at anl.gov>; 'Ettore Tiotto' <etiotto at ca.ibm.com>;
> >>> 'fraggamuffin at gmail.com' <fraggamuffin at gmail.com>; 'Gheorghe-Teod
> >>> Bercea' <Gheorghe-Teod.Bercea at ibm.com>; Hal Finkel
> >>> <hfinkel at anl.gov>; 'jbeyer at nvidia.com' <jbeyer at nvidia.com>; 'Jeeva
> >>> Paudel' <pjeeva01 at ca.ibm.com>; 'Jeff Heath' <jrheath at ca.ibm.com>;
> >>> Jeffrey Sandoval <sandoval at cray.com>; Jones, Jeff C
> >>> <jeff.c.jones at intel.com>; 'josem at udel.edu' <josem at udel.edu>;
> >>> 'Kelvin Li' <kli at ca.ibm.com>; 'Kevin K O'Brien'
> >>> <caomhin at us.ibm.com>; 'khaldi.dounia at gmail.com'
> >>> <khaldi.dounia at gmail.com>; 'Kotsifakou, Maria'
> >>> <kotsifa2 at illinois.edu>; Krishnaiyer, Rakesh
> >>> <rakesh.krishnaiyer at intel.com>; Lieberman, Ron
> >>> <Ron.Lieberman at amd.com>; 'Lopez, Matthew Graham'
> >>> <lopezmg at ornl.gov>; 'lopezmg at ornl.org' <lopezmg at ornl.org>; 'Martin
> >>> Kong' <martin.richard.kong at gmail.com>; 'Matt Martineau'
> >>> <m.martineau at bristol.ac.uk>; Menard, Lorri
> >>> <lorri.menard at intel.com>; Monteleone, Robert
> >>> <robert.monteleone at intel.com>; oscar at ornl.gov; Rao, Premanand M
> >>> <premanand.m.rao at intel.com>; Rice, Michael P
> >>> <michael.p.rice at intel.com>; Robichaux, Joseph
> >>> <joseph.robichaux at intel.com>; gregory.rodgers at amd.com; Rokos,
> >>> Georgios <georgios.rokos at intel.com>; 'samuel.antao at ibm.com'
> >>> <samuel.antao at ibm.com>; 'Sarah McNamara' <mcnamara at ca.ibm.com>;
> >>> 'sergey.y.ostanevich at gmail.com' <sergey.y.ostanevich at gmail.com>;
> >>> 'Sergio Pino Gallardo' <sergiop at udel.edu>; 'Sharif, Hashim'
> >>> <hsharif3 at illinois.edu>; Sjodin, Jan <Jan.Sjodin at amd.com>; Sunil
> >>> Shrestha <sshrestha at cray.com>; 'Sunita Chandrasekaran'
> >>> <schandra at udel.edu>; Tian, Xinmin <xinmin.tian at intel.com>; Tianyi
> >>> Zhang <tzhan18 at lsu.edu>; 'vadve at illinois.edu'
> >>> <vadve at illinois.edu>; 'Wael Yehia' <wyehia at ca.ibm.com>; 'Wang
> >>> Chen' <wdchen at ca.ibm.com>; Wilmarth, Terry L
> >>> <terry.l.wilmarth at intel.com>
> >>> Subject: Re: Comparison of 2 schemes to implement OpenMP 5.0
> >>> declare mapper codegen
> >>>
> >>> When 2 threads try to concurrently map the same data, it behaves
> >>> the same as when 2 threads concurrently map the same data using
> >>> map clauses; mappers don't introduce extra considerations
> >>> here. For instance, both threads use #pragma omp target enter data
> >>> concurrently.
> >>>
> >>> When 2 threads concurrently map the same data, my understanding
> >>> based on the current code is that it will create 2 copies of the
> >>> same data, and either copy is correct to use. There may be a problem
> >>> when both copies are mapped back if they are not synchronized
> >>> correctly, but this is a programming issue, not the responsibility
> >>> of OpenMP.
> >>>
> >>> Thanks,
> >>> Lingda Li
> >>>
> >>> -------------------------
> >>> FROM: Deepak Eachempati <deachempat at cray.com>
> >>> Sent: Thursday, June 27, 2019 3:17 PM
> >>> To: Li, Lingda; Narayanaswamy, Ravi; 'Alexandre Eichenberger';
> >>> 'Alexey Bataev'; Chapman, Barbara (Contact); Bobrovsky, Konstantin
> >>> S; 'Carlo Bertolli'; 'Chan, SiuChi'; Cownie, James H; David
> >>> Oehmke; 'Denny, Joel E.'; Dmitriev, Serguei N; Doerfert, Johannes
> >>> Rudolf ; 'Ettore Tiotto'; 'fraggamuffin at gmail.com'; 'Gheorghe-Teod
> >>> Bercea'; Hal Finkel; 'jbeyer at nvidia.com'; 'Jeeva Paudel'; 'Jeff
> >>> Heath'; Jeffrey Sandoval; Jones, Jeff C; 'josem at udel.edu'; 'Kelvin
> >>> Li'; 'Kevin K O'Brien'; 'khaldi.dounia at gmail.com'; 'Kotsifakou,
> >>> Maria'; Krishnaiyer, Rakesh; Lieberman, Ron ; 'Lopez, Matthew
> >>> Graham'; 'lopezmg at ornl.org'; 'Martin Kong'; 'Matt Martineau';
> >>> Menard, Lorri; Monteleone, Robert; oscar at ornl.gov; Rao, Premanand
> >>> M; Rice, Michael P; Robichaux, Joseph; gregory.rodgers at amd.com;
> >>> Rokos, Georgios; 'samuel.antao at ibm.com'; 'Sarah McNamara';
> >>> 'sergey.y.ostanevich at gmail.com'; 'Sergio Pino Gallardo'; 'Sharif,
> >>> Hashim'; Sjodin, Jan ; Sunil Shrestha; 'Sunita Chandrasekaran';
> >>> Tian, Xinmin; Tianyi Zhang; 'vadve at illinois.edu'; 'Wael Yehia';
> >>> 'Wang Chen'; Wilmarth, Terry L
> >>> Subject: RE: Comparison of 2 schemes to implement OpenMP 5.0
> >>> declare mapper codegen
> >>>
> >>> Thanks.
> >>>
> >>> Is it possible for another thread to be concurrently mapping
> >>> something else while the maps from the mapper function are taking
> >>> place? If so, how do you guarantee that the allocation for each
> >>> component will get you the right addresses in device memory? Sorry
> >>> if this was covered before and I missed it.
> >>>
> >>> -- Deepak
> >>>
> >>> FROM: Li, Lingda [mailto:lli at bnl.gov]
> >>> Sent: Thursday, June 27, 2019 2:08 PM
> >>> To: Deepak Eachempati <deachempat at cray.com>; Narayanaswamy, Ravi
> >>> <ravi.narayanaswamy at intel.com>; 'Alexandre Eichenberger'
> >>> <alexe at us.ibm.com>; 'Alexey Bataev' <Alexey.Bataev at ibm.com>;
> >>> Chapman, Barbara (Contact) <barbara.chapman at stonybrook.edu>;
> >>> Bobrovsky, Konstantin S <konstantin.s.bobrovsky at intel.com>; 'Carlo
> >>> Bertolli' <cbertol at us.ibm.com>; 'Chan, SiuChi'
> >>> <siuchi.chan at amd.com>; Cownie, James H <james.h.cownie at intel.com>;
> >>> David Oehmke <doehmke at cray.com>; 'Denny, Joel E.'
> >>> <dennyje at ornl.gov>; Dmitriev, Serguei N
> >>> <serguei.n.dmitriev at intel.com>; Doerfert, Johannes Rudolf
> >>> <jdoerfert at anl.gov>; 'Ettore Tiotto' <etiotto at ca.ibm.com>;
> >>> 'fraggamuffin at gmail.com' <fraggamuffin at gmail.com>; 'Gheorghe-Teod
> >>> Bercea' <Gheorghe-Teod.Bercea at ibm.com>; Hal Finkel
> >>> <hfinkel at anl.gov>; 'jbeyer at nvidia.com' <jbeyer at nvidia.com>; 'Jeeva
> >>> Paudel' <pjeeva01 at ca.ibm.com>; 'Jeff Heath' <jrheath at ca.ibm.com>;
> >>> Jeffrey Sandoval <sandoval at cray.com>; Jones, Jeff C
> >>> <jeff.c.jones at intel.com>; 'josem at udel.edu' <josem at udel.edu>;
> >>> 'Kelvin Li' <kli at ca.ibm.com>; 'Kevin K O'Brien'
> >>> <caomhin at us.ibm.com>; 'khaldi.dounia at gmail.com'
> >>> <khaldi.dounia at gmail.com>; 'Kotsifakou, Maria'
> >>> <kotsifa2 at illinois.edu>; Krishnaiyer, Rakesh
> >>> <rakesh.krishnaiyer at intel.com>; Lieberman, Ron
> >>> <Ron.Lieberman at amd.com>; 'Lopez, Matthew Graham'
> >>> <lopezmg at ornl.gov>; 'lopezmg at ornl.org' <lopezmg at ornl.org>; 'Martin
> >>> Kong' <martin.richard.kong at gmail.com>; 'Matt Martineau'
> >>> <m.martineau at bristol.ac.uk>; Menard, Lorri
> >>> <lorri.menard at intel.com>; Monteleone, Robert
> >>> <robert.monteleone at intel.com>; oscar at ornl.gov; Rao, Premanand M
> >>> <premanand.m.rao at intel.com>; Rice, Michael P
> >>> <michael.p.rice at intel.com>; Robichaux, Joseph
> >>> <joseph.robichaux at intel.com>; gregory.rodgers at amd.com; Rokos,
> >>> Georgios <georgios.rokos at intel.com>; 'samuel.antao at ibm.com'
> >>> <samuel.antao at ibm.com>; 'Sarah McNamara' <mcnamara at ca.ibm.com>;
> >>> 'sergey.y.ostanevich at gmail.com' <sergey.y.ostanevich at gmail.com>;
> >>> 'Sergio Pino Gallardo' <sergiop at udel.edu>; 'Sharif, Hashim'
> >>> <hsharif3 at illinois.edu>; Sjodin, Jan <Jan.Sjodin at amd.com>; Sunil
> >>> Shrestha <sshrestha at cray.com>; 'Sunita Chandrasekaran'
> >>> <schandra at udel.edu>; Tian, Xinmin <xinmin.tian at intel.com>; Tianyi
> >>> Zhang <tzhan18 at lsu.edu>; 'vadve at illinois.edu'
> >>> <vadve at illinois.edu>; 'Wael Yehia' <wyehia at ca.ibm.com>; 'Wang
> >>> Chen' <wdchen at ca.ibm.com>; Wilmarth, Terry L
> >>> <terry.l.wilmarth at intel.com>
> >>> Subject: Re: Comparison of 2 schemes to implement OpenMP 5.0
> >>> declare mapper codegen
> >>>
> >>> Hi Deepak,
> >>>
> >>> Yes, it handles this case. The first part of the mapper function
> >>> (initially allocating space for the whole array) is just an
> >>> optimization, not required for correctness, as you suggested in
> >>> an earlier discussion.
> >>>
> >>> In your example, s.x and s.p will be allocated separately (not in
> >>> a single allocation). But Clang guarantees that their addresses
> >>> will be correct because s.x and s.p share the same base address,
> >>> which is &s.
> >>>
> >>> Thanks,
> >>> Lingda Li
> >>>
> >>> -------------------------
> >>> FROM: Deepak Eachempati <deachempat at cray.com>
> >>> Sent: Thursday, June 27, 2019 2:49 PM
> >>> To: Li, Lingda; Narayanaswamy, Ravi; 'Alexandre Eichenberger';
> >>> 'Alexey Bataev'; Chapman, Barbara (Contact); Bobrovsky, Konstantin
> >>> S; 'Carlo Bertolli'; 'Chan, SiuChi'; Cownie, James H; David
> >>> Oehmke; 'Denny, Joel E.'; Dmitriev, Serguei N; Doerfert, Johannes
> >>> Rudolf ; 'estotzer at ti.com'; 'Ettore Tiotto';
> >>> 'fraggamuffin at gmail.com'; 'Gheorghe-Teod Bercea'; Hal Finkel;
> >>> 'jbeyer at nvidia.com'; 'Jeeva Paudel'; 'Jeff Heath'; Jeffrey
> >>> Sandoval; Jones, Jeff C; 'josem at udel.edu'; 'Kelvin Li'; 'Kevin K
> >>> O'Brien'; 'khaldi.dounia at gmail.com'; 'Kotsifakou, Maria';
> >>> Krishnaiyer, Rakesh; Lieberman, Ron ; 'Lopez, Matthew Graham';
> >>> 'lopezmg at ornl.org'; 'Martin Kong'; 'Matt Martineau'; Menard,
> >>> Lorri; Monteleone, Robert; oscar at ornl.gov; Rao, Premanand M; Rice,
> >>> Michael P; Robichaux, Joseph; gregory.rodgers at amd.com; Rokos,
> >>> Georgios; 'samuel.antao at ibm.com'; 'Sarah McNamara';
> >>> 'sergey.y.ostanevich at gmail.com'; 'Sergio Pino Gallardo'; 'Sharif,
> >>> Hashim'; Sjodin, Jan ; Sunil Shrestha; 'Sunita Chandrasekaran';
> >>> Tian, Xinmin; Tianyi Zhang; 'vadve at illinois.edu'; 'Wael Yehia';
> >>> 'Wang Chen'; Wilmarth, Terry L
> >>> Subject: RE: Comparison of 2 schemes to implement OpenMP 5.0
> >>> declare mapper codegen
> >>>
> >>> For Scheme 1, it looks like you are doing separate maps for each
> >>> component when size == 1. It seems like the first and last if
> >>> statements should have “size >= 1” rather than “size > 1”.
> >>>
> >>> If the mapper is declared like this:
> >>>
> >>> struct S {
> >>>   int x;
> >>>   ... // other stuff
> >>>   int *p;
> >>> };
> >>>
> >>> #pragma omp declare mapper(S s) map(s.x) map(s.p[0:s.x])
> >>>
> >>> And you have:
> >>>
> >>> S s;
> >>> ...
> >>> #pragma omp target map(s)
> >>> {
> >>>   for (int i = 0; i < s.x; i++) s.p[i] = i;
> >>> }
> >>>
> >>> Since the target construct is just mapping a single structure of
> >>> type S, there should be one map that takes care of mapping storage
> >>> for s.x and s.p with a single allocation, and a separate map for
> >>> the array section s.p[0:s.x], and finally the pointer attachment
> >>> of s.p to s.p[0:s.x]. Does Scheme 1 handle this?
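The decomposition described above can be sketched as a list of three components. The MapKind flags, the MapComponent fields and components_for are hypothetical and only mirror the description; they are not Clang's actual map-type bits or runtime structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical component kinds mirroring the text, not real map-type bits.
enum MapKind : std::uint32_t {
  kStorage = 1u << 0,    // allocate (and copy) a contiguous host range
  kPtrAttach = 1u << 1,  // make a device pointer slot point at mapped data
};

struct MapComponent {
  const void *begin;      // host start of the range (or pointee for attach)
  std::size_t bytes;      // length of the range; 0 for an attachment
  std::uint32_t kind;     // one of MapKind
  const void *attach_to;  // for kPtrAttach: host address of the pointer slot
};

struct S {
  int x;
  // ... other stuff
  int *p;
};

// Components for `#pragma omp target map(s)` with
// `declare mapper(S s) map(s.x) map(s.p[0:s.x])`:
// 1. one allocation covering s (and therefore both s.x and s.p),
// 2. a separate map for the array section s.p[0:s.x],
// 3. the attachment of the device copy of s.p to that section.
std::vector<MapComponent> components_for(const S &s) {
  return {
      {&s, sizeof(S), kStorage, nullptr},
      {s.p, static_cast<std::size_t>(s.x) * sizeof(int), kStorage, nullptr},
      {s.p, 0, kPtrAttach, &s.p},
  };
}
```

Keeping s.x and s.p inside one storage component is what guarantees their relative offsets; the attachment is a separate step because the section s.p[0:s.x] lives in its own allocation.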
> >>>
> >>> -- Deepak
> >>>
> >>> FROM: Li, Lingda [mailto:lli at bnl.gov]
> >>> Sent: Thursday, June 27, 2019 1:07 PM
> >>> To: Narayanaswamy, Ravi <ravi.narayanaswamy at intel.com>; 'Alexandre
> >>> Eichenberger' <alexe at us.ibm.com>; 'Alexey Bataev'
> >>> <Alexey.Bataev at ibm.com>; Chapman, Barbara (Contact)
> >>> <barbara.chapman at stonybrook.edu>; Bobrovsky, Konstantin S
> >>> <konstantin.s.bobrovsky at intel.com>; 'Carlo Bertolli'
> >>> <cbertol at us.ibm.com>; 'Chan, SiuChi' <siuchi.chan at amd.com>;
> >>> Cownie, James H <james.h.cownie at intel.com>; David Oehmke
> >>> <doehmke at cray.com>; Deepak Eachempati <deachempat at cray.com>;
> >>> 'Denny, Joel E.' <dennyje at ornl.gov>; Dmitriev, Serguei N
> >>> <serguei.n.dmitriev at intel.com>; Doerfert, Johannes Rudolf
> >>> <jdoerfert at anl.gov>; 'estotzer at ti.com' <estotzer at ti.com>; 'Ettore
> >>> Tiotto' <etiotto at ca.ibm.com>; 'fraggamuffin at gmail.com'
> >>> <fraggamuffin at gmail.com>; 'Gheorghe-Teod Bercea'
> >>> <Gheorghe-Teod.Bercea at ibm.com>; Hal Finkel <hfinkel at anl.gov>;
> >>> 'jbeyer at nvidia.com' <jbeyer at nvidia.com>; 'Jeeva Paudel'
> >>> <pjeeva01 at ca.ibm.com>; 'Jeff Heath' <jrheath at ca.ibm.com>; Jeffrey
> >>> Sandoval <sandoval at cray.com>; Jones, Jeff C
> >>> <jeff.c.jones at intel.com>; 'josem at udel.edu' <josem at udel.edu>;
> >>> 'Kelvin Li' <kli at ca.ibm.com>; 'Kevin K O'Brien'
> >>> <caomhin at us.ibm.com>; 'khaldi.dounia at gmail.com'
> >>> <khaldi.dounia at gmail.com>; 'Kotsifakou, Maria'
> >>> <kotsifa2 at illinois.edu>; Krishnaiyer, Rakesh
> >>> <rakesh.krishnaiyer at intel.com>; Lieberman, Ron
> >>> <Ron.Lieberman at amd.com>; Li, Lingda <lli at bnl.gov>; 'Lopez, Matthew
> >>> Graham' <lopezmg at ornl.gov>; 'lopezmg at ornl.org' <lopezmg at ornl.org>;
> >>> 'Martin Kong' <martin.richard.kong at gmail.com>; 'Matt Martineau'
> >>> <m.martineau at bristol.ac.uk>; Menard, Lorri
> >>> <lorri.menard at intel.com>; Monteleone, Robert
> >>> <robert.monteleone at intel.com>; oscar at ornl.gov; Rao, Premanand M
> >>> <premanand.m.rao at intel.com>; Rice, Michael P
> >>> <michael.p.rice at intel.com>; Robichaux, Joseph
> >>> <joseph.robichaux at intel.com>; gregory.rodgers at amd.com; Rokos,
> >>> Georgios <georgios.rokos at intel.com>; 'samuel.antao at ibm.com'
> >>> <samuel.antao at ibm.com>; 'Sarah McNamara' <mcnamara at ca.ibm.com>;
> >>> 'sergey.y.ostanevich at gmail.com' <sergey.y.ostanevich at gmail.com>;
> >>> 'Sergio Pino Gallardo' <sergiop at udel.edu>; 'Sharif, Hashim'
> >>> <hsharif3 at illinois.edu>; Sjodin, Jan <Jan.Sjodin at amd.com>; Sunil
> >>> Shrestha <sshrestha at cray.com>; 'Sunita Chandrasekaran'
> >>> <schandra at udel.edu>; Tian, Xinmin <xinmin.tian at intel.com>; Tianyi
> >>> Zhang <tzhan18 at lsu.edu>; 'vadve at illinois.edu'
> >>> <vadve at illinois.edu>; 'Wael Yehia' <wyehia at ca.ibm.com>; 'Wang
> >>> Chen' <wdchen at ca.ibm.com>; Wilmarth, Terry L
> >>> <terry.l.wilmarth at intel.com>
> >>> Subject: Comparison of 2 schemes to implement OpenMP 5.0 declare
> >>> mapper codegen
> >>>
> >>> Hi,
> >>>
> >>> Alexey and I would like to draw your attention to an ongoing
> >>> discussion of two schemes for implementing the declare mapper in
> >>> OpenMP 5.0. The detailed discussion can be found at
> >>> https://reviews.llvm.org/D59474 [1]
> >>>
> >>> Scheme 1 (the one I have implemented in
> >>> https://reviews.llvm.org/D59474 [1]):
> >>> The detailed design can be found at
> >>> https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx [2]
> >>> For each mapper function, the compiler generates a function like
> >>> this:
> >>>
> >>> ```
> >>> void <type>.mapper(void *base, void *begin, size_t size, int64_t type) {
> >>>   // Allocate space for an array section first.
> >>>   if (size > 1 && !maptype.IsDelete)
> >>>     <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
> >>>
> >>>   // Map members.
> >>>   for (unsigned i = 0; i < size; i++) {
> >>>     // For each component specified by this mapper:
> >>>     for (auto c : components) {
> >>>       ...; // code to generate c.arg_base, c.arg_begin, c.arg_size, c.arg_type
> >>>       if (c.hasMapper())
> >>>         (*c.Mapper())(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
> >>>       else
> >>>         <push>(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
> >>>     }
> >>>   }
> >>>   // Delete the array section.
> >>>   if (size > 1 && maptype.IsDelete)
> >>>     <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
> >>> }
> >>> ```
> >>> This function is passed to the OpenMP runtime, and the runtime
> >>> will call this function to finish the data mapping.
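To make scheme 1 concrete, here is a minimal host-only C++ simulation. It assumes a hypothetical `push_entry` standing in for `<push>` (it just appends to a vector) and two hypothetical user types, `Inner` and `Outer`, where the mapper for `Outer` calls the mapper for `Inner` directly. The flag values `kTo`, `kFrom` and `kDelete` follow libomptarget's TO/FROM/DELETE bits; all function and type names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Map-type bits (values as in libomptarget's TO/FROM/DELETE flags).
constexpr int64_t kTo = 0x1, kFrom = 0x2, kDelete = 0x8;

struct MapEntry {
  void *base;
  void *begin;
  int64_t size;  // bytes
  int64_t type;
};

// Hypothetical runtime-side buffer that <push> appends to.
static std::vector<MapEntry> g_entries;

static void push_entry(void *base, void *begin, int64_t size, int64_t type) {
  g_entries.push_back({base, begin, size, type});
}

// Allocation-only entry for a whole array section: no data motion.
static int64_t clear_to_from(int64_t type) { return type & ~(kTo | kFrom); }

struct Inner { int a; int b; };  // hypothetical user types
struct Outer { int n; Inner in; };

// "Generated" mapper for Inner: maps both members of every element.
static void inner_mapper(void *base, void *begin, size_t size, int64_t type) {
  assert(begin != nullptr);
  Inner *elems = static_cast<Inner *>(begin);
  const bool is_delete = (type & kDelete) != 0;
  const int64_t bytes = static_cast<int64_t>(size * sizeof(Inner));
  if (size > 1 && !is_delete)
    push_entry(base, begin, bytes, clear_to_from(type));
  for (size_t i = 0; i < size; ++i) {
    push_entry(&elems[i], &elems[i].a, sizeof(int), type);
    push_entry(&elems[i], &elems[i].b, sizeof(int), type);
  }
  if (size > 1 && is_delete)
    push_entry(base, begin, bytes, clear_to_from(type));
}

// "Generated" mapper for Outer: the nested mapper is a direct call, which
// the compiler is free to inline and optimize together with this loop.
static void outer_mapper(void *base, void *begin, size_t size, int64_t type) {
  assert(begin != nullptr);
  Outer *elems = static_cast<Outer *>(begin);
  const bool is_delete = (type & kDelete) != 0;
  const int64_t bytes = static_cast<int64_t>(size * sizeof(Outer));
  if (size > 1 && !is_delete)
    push_entry(base, begin, bytes, clear_to_from(type));
  for (size_t i = 0; i < size; ++i) {
    push_entry(&elems[i], &elems[i].n, sizeof(int), type);
    inner_mapper(&elems[i], &elems[i].in, 1, type);
  }
  if (size > 1 && is_delete)
    push_entry(base, begin, bytes, clear_to_from(type));
}
```

Mapping an `Outer[3]` this way yields one section entry plus three entries per element, and no intermediate descriptor array is built along the way.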
> >>>
> >>> Scheme 2 (which Alexey proposes):
> >>> Alexey proposes to move parts of the mapper function above into
> >>> the OpenMP runtime, so the compiler would generate the code below:
> >>> ```
> >>> void <type>.mapper(void *base, void *begin, size_t size, int64_t type) {
> >>>   ...; // code to generate arg_base, arg_begin, arg_size, arg_type, arg_mapper.
> >>>   auto sub_components[] = {...}; // fill in generated begin, base, ...
> >>>   __tgt_mapper(base, begin, size, type, sub_components);
> >>> }
> >>> ```
> >>>
> >>> `__tgt_mapper` is a runtime function as below:
> >>> ```
> >>> void __tgt_mapper(void *base, void *begin, size_t size, int64_t type,
> >>>                   auto components[]) {
> >>>   // Allocate space for an array section first.
> >>>   if (size > 1 && !maptype.IsDelete)
> >>>     <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
> >>>
> >>>   // Map members.
> >>>   for (unsigned i = 0; i < size; i++) {
> >>>     // For each component specified by this mapper:
> >>>     for (auto c : components) {
> >>>       if (c.hasMapper())
> >>>         (*c.Mapper())(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
> >>>       else
> >>>         <push>(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
> >>>     }
> >>>   }
> >>>   // Delete the array section.
> >>>   if (size > 1 && maptype.IsDelete)
> >>>     <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
> >>> }
> >>> ```
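For contrast, a host-only sketch of scheme 2 under the same assumptions. Here the generated mappers only describe their components, and a hypothetical runtime routine `tgt_mapper_sketch` owns the traversal. It is deliberately not named `__tgt_mapper`, whose real signature was still under discussion. The `Component` layout, and the convention that `size` is an element count for components with a nested mapper and a byte count otherwise, are assumptions of this sketch. Because a runtime function cannot know `sizeof(Ty)`, the section size in bytes is passed explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int64_t kTo = 0x1, kFrom = 0x2, kDelete = 0x8;

struct MapEntry { void *base; void *begin; int64_t size; int64_t type; };
using MapperFn = void (*)(void *, void *, size_t, int64_t);

// Hypothetical descriptor the compiler fills in for each component.
struct Component {
  void *base;
  void *begin;
  int64_t size;     // element count if mapper != nullptr, else bytes
  int64_t type;
  MapperFn mapper;  // nullptr when no nested mapper applies
};

static std::vector<MapEntry> g_entries;

// Hypothetical runtime routine: the traversal logic lives here.
static void tgt_mapper_sketch(void *base, void *begin, size_t count,
                              int64_t section_bytes, int64_t type,
                              const std::vector<Component> &comps) {
  const bool is_delete = (type & kDelete) != 0;
  const int64_t alloc_type = type & ~(kTo | kFrom);
  if (count > 1 && !is_delete)
    g_entries.push_back({base, begin, section_bytes, alloc_type});
  for (const Component &c : comps) {
    if (c.mapper)
      c.mapper(c.base, c.begin, static_cast<size_t>(c.size), c.type);
    else
      g_entries.push_back({c.base, c.begin, c.size, c.type});
  }
  if (count > 1 && is_delete)
    g_entries.push_back({base, begin, section_bytes, alloc_type});
}

struct Inner { int a; int b; };  // hypothetical user types
struct Outer { int n; Inner in; };

// "Generated" mappers only build descriptors; the heap-allocated vector
// grows linearly with the number of array elements being mapped.
static void inner_mapper(void *base, void *begin, size_t size, int64_t type) {
  assert(begin != nullptr);
  Inner *elems = static_cast<Inner *>(begin);
  std::vector<Component> comps;
  comps.reserve(2 * size);
  for (size_t i = 0; i < size; ++i) {
    comps.push_back({&elems[i], &elems[i].a, sizeof(int), type, nullptr});
    comps.push_back({&elems[i], &elems[i].b, sizeof(int), type, nullptr});
  }
  tgt_mapper_sketch(base, begin, size,
                    static_cast<int64_t>(size * sizeof(Inner)), type, comps);
}

static void outer_mapper(void *base, void *begin, size_t size, int64_t type) {
  assert(begin != nullptr);
  Outer *elems = static_cast<Outer *>(begin);
  std::vector<Component> comps;
  comps.reserve(2 * size);
  for (size_t i = 0; i < size; ++i) {
    comps.push_back({&elems[i], &elems[i].n, sizeof(int), type, nullptr});
    // Nested mapper reached through a function pointer stored in data.
    comps.push_back({&elems[i], &elems[i].in, 1, type, inner_mapper});
  }
  tgt_mapper_sketch(base, begin, size,
                    static_cast<int64_t>(size * sizeof(Outer)), type, comps);
}
```

Both sketches produce the same entries in the same order for an `Outer[3]`. The visible differences are the per-call `comps` vector, whose size scales with the array length, and the indirect call to `inner_mapper`, which the compiler cannot inline across the runtime boundary.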
> >>>
> >>> Comparison:
> >>> Why choose scheme 1 over scheme 2:
> >>> 1. In scheme 2, the compiler needs to generate all map types and
> >>> pass them to __tgt_mapper through sub_components. In that case,
> >>> however, the compiler cannot generate the correct MEMBER_OF field
> >>> in the map type. As a result, the runtime has to fix it using the
> >>> mechanism we already have here: __tgt_mapper_num_components.
> >>> This not only increases complexity, but also means the runtime
> >>> must further manipulate the map type, which creates locality
> >>> issues. In the current scheme, by contrast, the map type is
> >>> generated by the compiler once, so data locality is very good.
> >>> 2. In scheme 2, sub_components includes all components that should
> >>> be mapped. Mapping an array therefore means mapping many
> >>> components, which requires allocating memory for sub_components
> >>> on the heap. This creates a further memory management burden and
> >>> is not an efficient use of memory.
> >>> 3. In scheme 1, nested mapper functions can be inlined. As a
> >>> result, the compiler can further optimize the mapper function,
> >>> e.g., eliminate redundant computation or unroll loops, and thus
> >>> potentially achieve better performance. These optimizations are
> >>> not possible in scheme 2.
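Point 1 above hinges on how MEMBER_OF is encoded. In Clang's codegen and in libomptarget, the field occupies the top 16 bits of the 64-bit map type and stores the 1-based position of the parent entry in the argument list, with 0 meaning "not a member". The sketch below assumes that encoding. `rebase_member_of` is a hypothetical helper showing the kind of fix-up a runtime would need if a mapper's entries were generated with positions relative to the mapper and then spliced into a larger list at `offset`.

```cpp
#include <cassert>
#include <cstdint>

// MEMBER_OF field: top 16 bits of the map type, holding (parent index + 1).
constexpr uint64_t kMemberOfMask = 0xffff000000000000ULL;
constexpr unsigned kMemberOfShift = 48;

inline uint64_t member_of_flag(uint64_t parent_index) {
  return (parent_index + 1) << kMemberOfShift;
}

inline bool is_member(uint64_t type) { return (type & kMemberOfMask) != 0; }

inline uint64_t parent_index(uint64_t type) {
  assert(is_member(type));
  return ((type & kMemberOfMask) >> kMemberOfShift) - 1;
}

// Hypothetical fix-up: shift a mapper-relative parent index by the position
// at which the mapper's entries were inserted into the runtime's list.
inline uint64_t rebase_member_of(uint64_t type, uint64_t offset) {
  if (!is_member(type))
    return type;  // top-level entries carry no MEMBER_OF index
  return (type & ~kMemberOfMask) | member_of_flag(parent_index(type) + offset);
}
```

Each such patch is an extra read-modify-write of a map type at run time, which is the additional manipulation that point 1 attributes to scheme 2.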
> >>>
> >>> Why choose scheme 2 over scheme 1:
> >>> 1. Less code in the mapper function codegen (I doubt this, because
> >>> the codegen function for scheme 1 is fewer than 200 lines of code).
> >>> Alexey may have other reasons.
> >>>
> >>> We would appreciate it if you could share your thoughts.
> >>>
> >>> Thanks,
> >>> Lingda Li
> >>>
> >>> -------------------------
> >>> FROM: Narayanaswamy, Ravi <ravi.narayanaswamy at intel.com>
> >>> Sent: Wednesday, June 19, 2019 3:09 PM
> >>> To: 'Alexandre Eichenberger'; 'Alexey Bataev';
> >>> 'barbara.chapman at stonybrook.edu'; Bobrovsky, Konstantin S; 'Carlo
> >>> Bertolli'; 'Chan, SiuChi'; Cownie, James H; David Oehmke; Deepak
> >>> Eachempati; 'Denny, Joel E.'; Dmitriev, Serguei N; Doerfert,
> >>> Johannes Rudolf ; 'estotzer at ti.com'; 'Ettore Tiotto';
> >>> 'fraggamuffin at gmail.com'; 'Gheorghe-Teod Bercea';
> >>> 'hfinkel at anl.gov'; 'jbeyer at nvidia.com'; 'Jeeva Paudel'; 'Jeff
> >>> Heath'; Jeffrey Sandoval; Jones, Jeff C; 'josem at udel.edu'; 'Kelvin
> >>> Li'; 'Kevin K O'Brien'; 'khaldi.dounia at gmail.com'; 'Kotsifakou,
> >>> Maria'; Krishnaiyer, Rakesh; Lieberman, Ron ; 'lli at bnl.gov';
> >>> 'Lopez, Matthew Graham'; 'lopezmg at ornl.org'; 'Martin Kong'; 'Matt
> >>> Martineau'; Menard, Lorri; Monteleone, Robert; Narayanaswamy,
> >>> Ravi; 'Oscar R. Hernandez'; Rao, Premanand M; Rice, Michael P;
> >>> Robichaux, Joseph; Rodgers, Gregory; Rokos, Georgios;
> >>> 'samuel.antao at ibm.com'; 'Sarah McNamara';
> >>> 'sergey.y.ostanevich at gmail.com'; 'Sergio Pino Gallardo'; 'Sharif,
> >>> Hashim'; Sjodin, Jan ; Sunil Shrestha (sshrestha at cray.com);
> >>> 'Sunita Chandrasekaran'; Tian, Xinmin; Tianyi Zhang;
> >>> 'vadve at illinois.edu'; 'Wael Yehia'; 'Wang Chen'; Wilmarth, Terry L
> >>> Subject: OpenMP / HPC in Clang / LLVM Multi-company Telecom
> >>> Meeting Minutes June 19th 2019
> >>>
> >>> NEXT MEETING : JULY 10TH (MOVED FROM JULY 3RD)
> >>>
> >>> OPENS :
> >>> - DOCUMENTATION
> >>> - Greg: Can we have documents for libopenmp and libomptarget?
> >>> - Alexey suggested having 3 documents: libopenmp, libomptarget and
> >>> device plugin.
> >>> - Hal will convert the existing libomptarget document. Once that is
> >>> done, others can update the document to capture the existing
> >>> implementation. Future additions to libomptarget will also require
> >>> updates to the document.
> >>> - Next, a libopenmp document will be created if one does not exist,
> >>> or updated if one exists.
> >>>
> >>> LTO FOR FAT BINARY LINKING
> >>> - Serguei (Intel) has an implementation that enables LTO and
> >>> does away with linker scripts.
> >>> Everybody agreed this is a good idea, especially since some linkers
> >>> don't support linker scripts.
> >>> AMD is interested in enabling LTO and would like to see the code.
> >>> Serguei to post the code to get feedback from all.
> >>> - Hal to present his proposal to support static fat archives using
> >>> LTO at the next meeting.
> >>>
> >>> OPENMP 5.0 FEATURES
> >>> - No update on setting up the public website. Johannes was out
> >>> attending ISC.
> >>> - New features added since last release (courtesy of Kelvin)
> >>> - allocate clause/allocate directive - parsing+sema, codegen
> >>> - mutexinout dependence-type for task
> >>> - user-defined mapper (declare mapper) - parsing+sema.
> >>> - omp_get_device_num() API routine
> >>>
> >>> DEVELOPMENT ACTIVITY
> >>> - ASYNC API
> >>> Support in Clang and libopenmp, including lit tests, has been
> >>> checked in by Doru.
> >>>
> >>> - MAPPER SUPPORT
> >>> Initial support for mappers has been posted for review by Lingda.
> >>> Once it is approved, the rest of the support will be done.
> >>> Lingda: Should the old API, which is being replaced by a similar new
> >>> API with an extra mapper argument, be obsoleted?
> >>> The suggestion was for Clang to stop generating calls to it, but to
> >>> keep the API in libomptarget for backward compatibility. It can be
> >>> obsoleted in the future.
> >>>
> >>> - REQUIRED DIRECTIVES
> >>> Support for the requires directive has been checked in by Doru.
> >>> There was one issue with checking for the requires directive and
> >>> confirming whether the declare target type is TO or LINK.
> >>> Doru removed the check and added a note that this code must be
> >>> modified if things change in the future.
> >>>
> >>> ROLL CALL:
> >>>
> >>> COMPANY        ATTENDEE                 19-JUN
> >>> AMD            Greg Rodgers             x
> >>>                Ashwin Aji
> >>>                Jan Sjodin               x
> >>>                Ron Lieberman            x
> >>>                Sameer Sahasrabuddhe
> >>>                Andrey Kasaurov
> >>> ANL            Hal Finkel               x
> >>>                Johannes Doerfert
> >>> IBM            Alexandre Eichenberger
> >>>                Carlo Bertolli
> >>>                Kelvin Li
> >>>                Doru                     x
> >>>                Alexey Bataev            x
> >>> INTEL          Andrey Churbanov
> >>>                Ravi Narayanaswamy       x
> >>>                Serguei Dmitriev         x
> >>>                Rajiv Deodhar
> >>>                Lorri Menard
> >>>                Terry Wilmarth
> >>>                Rao, Prem
> >>>                Hansang Bae
> >>>                George Rokos             x
> >>> CRAY           Deepak Eachempati        x
> >>> MICRON         John Leidel
> >>> NVIDIA         James Beyer              x
> >>> ORNL           Graham Lopez
> >>>                Joel Denny
> >>>                Geoffroy Vallee
> >>>                Oscar Hernandez
> >>> SBU/BNL        Lingda Li                x
> >>>                Jose Monsalve
> >>>                Martin Kong
> >>> TI             Eric Stotzer
> >>> U OF BRISTOL   Matt Martineau
> >>> U OF DELAWARE  Sunita Chandrasekaran
> >>> U OF ILLINOIS  Hashim Sharif
> >>> RICE           John Mellor-Crummey
> >>> LSU            Tianyi Zhang
> >>>
> >>>
> >>> Join Skype Meeting [3]
> >>>
> >>> Trouble Joining? Try Skype Web App [4]
> >>>
> >>> Join by phone
> >>> +1(916)356-2663 (or your local bridge access #) Choose bridge 5.
> >>> [5] (Global) English (United States)
> >>> Find a local number [6]
> >>>
> >>> Conference ID: 7607896966
> >>> Forgot your dial-in PIN? [6] |Help [7]
> >>>
> >
> >
> > Links:
> > ------
> > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__reviews.llvm.org_D59474&d=DwMFaQ&c=aTOVZmpUfPKZuaG9NO7J7Mh6imZbfhL47t9CpZ-pCOw&r=RLUU7gQynM_GwGu2QR7zHw&m=0c8CuLZZzM3R7PecCmFPYLuPYEOtCJHYTIGjSgIPaWU&s=EVaPRpEtSzi0Y56zmjD5fXRzN87UZDOaYp5PY3TXiVQ&e=
> > [2] https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx
> > [3] https://urldefense.proofpoint.com/v2/url?u=https-3A__meet.intel.com_ravi.narayanaswamy_DK7943NR&d=DwMFaQ&c=aTOVZmpUfPKZuaG9NO7J7Mh6imZbfhL47t9CpZ-pCOw&r=RLUU7gQynM_GwGu2QR7zHw&m=0c8CuLZZzM3R7PecCmFPYLuPYEOtCJHYTIGjSgIPaWU&s=K4msFCmDvK4n0MdVQd7UTXRRvRkaNwLzMaP8fnX0iOg&e=
> > [4] https://urldefense.proofpoint.com/v2/url?u=https-3A__meet.intel.com_ravi.narayanaswamy_DK7943NR-3Fsl-3D1&d=DwMFaQ&c=aTOVZmpUfPKZuaG9NO7J7Mh6imZbfhL47t9CpZ-pCOw&r=RLUU7gQynM_GwGu2QR7zHw&m=0c8CuLZZzM3R7PecCmFPYLuPYEOtCJHYTIGjSgIPaWU&s=krI3wEp2z8GhcZt6feFq3WgaBjcEoTDRk-GvI1BIdO8&e=
> > [5] tel:+1(916)356-2663%20(or%20your%20local%20bridge%20access%20#)%20Choose%20bridge%205.
> > [6] https://urldefense.proofpoint.com/v2/url?u=https-3A__dial.intel.com&d=DwMFaQ&c=aTOVZmpUfPKZuaG9NO7J7Mh6imZbfhL47t9CpZ-pCOw&r=RLUU7gQynM_GwGu2QR7zHw&m=0c8CuLZZzM3R7PecCmFPYLuPYEOtCJHYTIGjSgIPaWU&s=g2dQtoTqaRXyBMaIUpfyoPFDRTtrQbgbWbb9b90tgBg&e=
> > [7] https://urldefense.proofpoint.com/v2/url?u=https-3A__o15.officeredir.microsoft.com_r_rlidLync15-3Fclid-3D1033-26p1-3D5-26p2-3D2009&d=DwMFaQ&c=aTOVZmpUfPKZuaG9NO7J7Mh6imZbfhL47t9CpZ-pCOw&r=RLUU7gQynM_GwGu2QR7zHw&m=0c8CuLZZzM3R7PecCmFPYLuPYEOtCJHYTIGjSgIPaWU&s=6OCBXxzOIJfra2Pewq_p-l2pY3MyKnuG-TLr7M1xq-s&e=
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>