[cfe-dev] Comparison of 2 schemes to implement OpenMP 5.0 declare mapper codegen

Lingda Li via cfe-dev cfe-dev at lists.llvm.org
Fri Jun 28 06:59:05 PDT 2019


On Fri, Jun 28, 2019 at 9:49 AM Li, Lingda <lli at bnl.gov> wrote:

> I don't think we can have the buffer allocated within the mapper function.
> It has to be done in the runtime, because of nested mappers.
> First, all mapper functions are generated in the same way. We cannot make
> the outermost mapper function allocate memory while the inner ones don't
> and instead use what the outermost mapper function allocated.
> I suppose we still need to allocate memory in the runtime, so the runtime
> can pass the pointer and size to the mapper function, and the outer mapper
> function can then pass them into inner ones.
> Again, this is just like the current implementation, except that we don't
> use vector::push_back(); instead we use something like a manual
> implementation of vector::push_back() (because we need to use the pointer
> and the current index).
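
A minimal C++ sketch of the idea in the paragraph above, assuming the runtime owns the buffer and hands a pointer plus current index to the outermost mapper, which forwards the same cursor to nested mappers. The names MapEntry, MapBuffer, append, mapInner and mapOuter are hypothetical and are not the libomptarget interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record for one mapped component (not the real runtime layout).
struct MapEntry {
  void *base;
  void *begin;
  std::size_t size;
  std::int64_t type;
};

// Hypothetical cursor: buffer allocated by the runtime, plus the current index.
struct MapBuffer {
  MapEntry *data;
  std::size_t capacity;
  std::size_t count;
};

// Manual stand-in for vector::push_back(): write at the current index.
inline void append(MapBuffer &buf, const MapEntry &e) {
  assert(buf.count < buf.capacity && "buffer was sized too small");
  buf.data[buf.count++] = e;
}

struct Inner { int n; double *vals; };
struct Outer { int id; Inner in; };

// Inner mapper: never allocates, only writes through the cursor it was given.
void mapInner(MapBuffer &buf, Inner &in) {
  append(buf, MapEntry{&in, &in.n, sizeof(in.n), 0});
  append(buf, MapEntry{&in, in.vals, in.n * sizeof(double), 0});
}

// Outer mapper: generated the same way, and forwards the cursor unchanged.
void mapOuter(MapBuffer &buf, Outer &o) {
  append(buf, MapEntry{&o, &o.id, sizeof(o.id), 0});
  mapInner(buf, o.in);
}
```

Because both mappers take the same cursor, neither needs to know whether it is the outermost one, which is the property the paragraph asks for.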
>
> I believe the key question here is whether it is true that (the overhead
> of push_back() > the overhead of precalculating the total number + the
> memory allocation overhead + direct memory writes). This will decide
> whether this change is necessary. Any opinions?
>
> Thanks,
> Lingda Li
> ------------------------------
> *From:* Alexey Bataev <Alexey.Bataev at ibm.com>
> *Sent:* Thursday, June 27, 2019 5:05 PM
> *To:* Li, Lingda
> *Cc:* Alexandre Eichenberger; Chapman, Barbara (Contact); Kevin K
> O'Brien; Carlo Bertolli; Deepak Eachempati; Denny, Joel E.; David Oehmke;
> Ettore Tiotto; fraggamuffin at gmail.com; Rokos, Georgios; Gheorghe-Teod
> Bercea; gregory.rodgers at amd.com; Hal Finkel; Sharif, Hashim; Cownie,
> James H; Sjodin, Jan; jbeyer at nvidia.com; Doerfert, Johannes Rudolf;
> Jones, Jeff C; josem at udel.edu; Robichaux, Joseph; Jeff Heath;
> khaldi.dounia at gmail.com; Kelvin Li; Bobrovsky, Konstantin S; Kotsifakou,
> Maria; lopezmg at ornl.org; Lopez, Matthew Graham; Menard, Lorri; Martin
> Kong; Sarah McNamara; Rice, Michael P; Matt Martineau; oscar at ornl.gov;
> Jeeva Paudel; Rao, Premanand M; Krishnaiyer, Rakesh; Narayanaswamy, Ravi;
> Monteleone, Robert; Lieberman, Ron; Samuel Antao; Jeffrey Sandoval; Sunita
> Chandrasekaran; sergey.y.ostanevich at gmail.com; Sergio Pino Gallardo;
> Dmitriev, Serguei N; Chan, SiuChi; Sunil Shrestha; Wilmarth, Terry L;
> Tianyi Zhang; vadve at illinois.edu; Wang Chen; Wael Yehia; Tian, Xinmin
> *Subject:* Re: Re: Re: RE: Comparison of 2 schemes to implement OpenMP
> 5.0 declare mapper codegen
>
> Yes, we need 2 functions, but the first one can be optimized very
> effectively. After the optimizations and inlining it will end up as just
> return s1+s2+s3... I think that in most cases those sizes will be constant,
> since the mapper maps a constant number of elements. Thus, this expression
> will be optimized to just a constant value.
> You don't need to pass these functions to the runtime. We can call them
> directly from the compiler.
> 1st call: get the number of elements.
> 2nd call: allocate the buffer.
> 3rd call: call the mapper with this preallocated buffer; it fills the buffer
> without any calls to runtime functions.
> 4th call: call the runtime to pass the buffer to it.
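
The four calls listed above can be sketched in C++ roughly as follows. This is only an illustration under assumptions: countS, fillS, RecordingRuntime and mapWithPreallocation are hypothetical names, and the "runtime" here merely records what it receives.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

struct MapEntry { void *begin; std::size_t size; };
struct S { int x; int *p; };

// 1st call: number of components. Two per instance (s.x and s.p[0:s.x]).
constexpr std::size_t countS(std::size_t instances) { return 2 * instances; }

// 3rd call: fill the preallocated buffer; no runtime calls are made here.
std::size_t fillS(MapEntry *buf, std::size_t idx, S *s, std::size_t instances) {
  for (std::size_t i = 0; i < instances; ++i) {
    assert(s[i].x >= 0);
    buf[idx++] = MapEntry{&s[i].x, sizeof(int)};
    buf[idx++] = MapEntry{s[i].p, static_cast<std::size_t>(s[i].x) * sizeof(int)};
  }
  return idx;
}

// Hypothetical stand-in for the runtime: it only records what it was given.
struct RecordingRuntime {
  std::size_t calls = 0, entries = 0, bytes = 0;
  void mapAll(const MapEntry *e, std::size_t n) {
    ++calls;
    entries += n;
    for (std::size_t i = 0; i < n; ++i) bytes += e[i].size;
  }
};

void mapWithPreallocation(RecordingRuntime &rt, S *s, std::size_t instances) {
  const std::size_t n = countS(instances);                 // 1st: count
  auto buffer = std::make_unique<MapEntry[]>(n);           // 2nd: allocate
  const std::size_t written = fillS(buffer.get(), 0, s, instances);  // 3rd: fill
  assert(written == n);
  rt.mapAll(buffer.get(), written);                        // 4th: one runtime call
}
```

When the instance count is a compile-time constant, countS folds to a constant, which is the optimization the message relies on.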
>
> Best regards,
> Alexey Bataev
>
> On June 27, 2019, at 16:53, Li, Lingda <lli at bnl.gov> wrote:
>
> If we precalculate the size, it means, first, that we need to generate 2
> functions for each mapper, rather than the 1 we have now: one to fill in
> the mapping information, as we have today, and another for size calculation
> (this will not return constant values, because the size depends on how many
> instances we are mapping). Both functions will need to be passed to the
> runtime. The runtime will need to precalculate the number of components
> first, then allocate memory, then call the mapper function to fill it.
>
> Compared with scheme 1, the differences are:
> 1) An extra call to calculate the total number, which scheme 1 does not need;
> 2) A preallocated buffer, whose pointer and current index must be
> passed to the mapper function, which then uses them to fill in
> components, while scheme 1 uses push_back() to do the same thing.
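
To make the size-calculation function concrete, here is a hedged C++ sketch for a mapper with one nested mapper. The types and the functions countInner, countOuter and countOuterSection are hypothetical. It shows both positions in the thread: the count folds to a constant for a fixed number of instances, but depends on run-time bounds when an array section is mapped.

```cpp
#include <cassert>
#include <cstddef>

struct Inner { int n; double *vals; };
struct Outer { int id; Inner in; };

// One Inner contributes two components: in.n and in.vals[0:in.n].
constexpr std::size_t countInner(std::size_t instances) { return 2 * instances; }

// One Outer contributes o.id plus whatever its nested Inner mapper contributes.
constexpr std::size_t countOuter(std::size_t instances) {
  return instances * (1 + countInner(1));
}

// With a constant instance count the whole expression folds at compile time.
static_assert(countOuter(1) == 3, "one Outer yields three components");

// For an array section arr[lower:upper) with run-time bounds, the count is
// only known at run time.
std::size_t countOuterSection(std::size_t lower, std::size_t upper) {
  assert(lower <= upper);
  return countOuter(upper - lower);
}
```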
>
> Is there really a benefit to doing this? push_back() should be efficient
> enough compared with writing directly to memory.
>
> If people here think that the overhead of push_back() > the overhead of
> precalculating the total number + the memory allocation overhead + direct
> memory writes, then we can consider this scheme.
>
> Thanks,
> Lingda Li
>
>
> ------------------------------
> *From:* Alexey Bataev <Alexey.Bataev at ibm.com>
> *Sent:* Thursday, June 27, 2019 4:26 PM
> *To:* Li, Lingda
> *Subject:* Re: Re: RE: Comparison of 2 schemes to implement OpenMP 5.0
> declare mapper codegen
>
>
> If the functions (the ones intended for size precalculation) are inlined,
> they can be optimized out very effectively, since in most cases they will
> return constant values.
> If we could do this, we wouldn't need vectors and push_backs; we could use
> preallocated memory and an internal counter.
> --------------
> Best regards,
> Alexey Bataev
>
>
> From: "Li, Lingda" <lli at bnl.gov>
> To: Alexey Bataev <Alexey.Bataev at ibm.com>, Deepak Eachempati <
> deachempat at cray.com>
> Cc: "Narayanaswamy, Ravi" <ravi.narayanaswamy at intel.com>, "Alexandre
> Eichenberger" <alexe at us.ibm.com>, "Chapman, Barbara (Contact)" <
> barbara.chapman at stonybrook.edu>, "Bobrovsky, Konstantin S" <
> konstantin.s.bobrovsky at intel.com>, Carlo Bertolli <cbertol at us.ibm.com>,
> "Chan, SiuChi" <siuchi.chan at amd.com>, "Cownie, James H" <
> james.h.cownie at intel.com>, David Oehmke <doehmke at cray.com>, "Denny, Joel
> E." <dennyje at ornl.gov>, "Dmitriev, Serguei N" <
> serguei.n.dmitriev at intel.com>, "Doerfert, Johannes Rudolf" <
> jdoerfert at anl.gov>, Ettore Tiotto <etiotto at ca.ibm.com>, "
> fraggamuffin at gmail.com" <fraggamuffin at gmail.com>, Gheorghe-Teod Bercea <
> Gheorghe-Teod.Bercea at ibm.com>, Hal Finkel <hfinkel at anl.gov>, "
> jbeyer at nvidia.com" <jbeyer at nvidia.com>, Jeeva Paudel <pjeeva01 at ca.ibm.com>,
> Jeff Heath <jrheath at ca.ibm.com>, Jeffrey Sandoval <sandoval at cray.com>,
> "Jones, Jeff C" <jeff.c.jones at intel.com>, "josem at udel.edu" <josem at udel.edu>,
> Kelvin Li <kli at ca.ibm.com>, "Kevin K O'Brien" <caomhin at us.ibm.com>, "
> khaldi.dounia at gmail.com" <khaldi.dounia at gmail.com>, "Kotsifakou, Maria" <
> kotsifa2 at illinois.edu>, "Krishnaiyer, Rakesh" <
> rakesh.krishnaiyer at intel.com>, "Lieberman, Ron" <Ron.Lieberman at amd.com>,
> "Lopez, Matthew Graham" <lopezmg at ornl.gov>, "lopezmg at ornl.org" <
> lopezmg at ornl.org>, Martin Kong <martin.richard.kong at gmail.com>, Matt
> Martineau <m.martineau at bristol.ac.uk>, "Menard, Lorri" <
> lorri.menard at intel.com>, "Monteleone, Robert" <robert.monteleone at intel.com>,
> "oscar at ornl.gov" <oscar at ornl.gov>, "Rao, Premanand M" <
> premanand.m.rao at intel.com>, "Rice, Michael P" <michael.p.rice at intel.com>,
> "Robichaux, Joseph" <joseph.robichaux at intel.com>, "gregory.rodgers at amd.com"
> <gregory.rodgers at amd.com>, "Rokos, Georgios" <georgios.rokos at intel.com>,
> Samuel Antao <Samuel.Antao at ibm.com>, "Sarah McNamara" <mcnamara at ca.ibm.com>,
> "sergey.y.ostanevich at gmail.com" <sergey.y.ostanevich at gmail.com>, Sergio
> Pino Gallardo <sergiop at udel.edu>, "Sharif, Hashim" <hsharif3 at illinois.edu>,
> "Sjodin, Jan" <Jan.Sjodin at amd.com>, Sunil Shrestha <sshrestha at cray.com>,
> Sunita Chandrasekaran <schandra at udel.edu>, "Tian, Xinmin" <
> xinmin.tian at intel.com>, Tianyi Zhang <tzhan18 at lsu.edu>, "
> vadve at illinois.edu" <vadve at illinois.edu>, Wael Yehia <wyehia at ca.ibm.com>,
> Wang Chen <wdchen at ca.ibm.com>, "Wilmarth, Terry L" <
> terry.l.wilmarth at intel.com>
> Date: 06/27/2019 04:13 PM
> Subject: [EXTERNAL] Re: RE: Comparison of 2 schemes to implement OpenMP
> 5.0 declare mapper codegen
> ------------------------------
>
>
>
> Hi Alexey,
>
> I think that's why we chose to use variable-size storage like std::vector
> to store the mapping information in the first place, right? It'll be costly
> to precalculate the total number of components, especially in the presence
> of nested mappers. Besides, a runtime function call is just a
> std::vector::push_back(), so I think it's okay to have multiple function calls.
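
For comparison, scheme 1 as described in this message can be sketched in C++ like this, assuming the runtime keeps a std::vector behind an opaque handle and exposes one push function. The names runtimePushComponent, MapperHandle and mapperS are hypothetical, not the actual runtime entry points.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct MapEntry { void *base; void *begin; std::size_t size; };

// Hypothetical runtime-side storage; the mapper only sees an opaque handle.
using MapperHandle = std::vector<MapEntry>;

// Hypothetical runtime entry point: each call is essentially one push_back().
void runtimePushComponent(void *handle, void *base, void *begin,
                          std::size_t size) {
  static_cast<MapperHandle *>(handle)->push_back(MapEntry{base, begin, size});
}

struct S { int x; int *p; };

// Mapper function for S: no size precalculation, one runtime call per component.
void mapperS(void *handle, S *s, std::size_t instances) {
  for (std::size_t i = 0; i < instances; ++i) {
    assert(s[i].x >= 0);
    runtimePushComponent(handle, &s[i], &s[i].x, sizeof(int));
    runtimePushComponent(handle, &s[i], s[i].p,
                         static_cast<std::size_t>(s[i].x) * sizeof(int));
  }
}
```

The vector grows as needed, so nested mappers can push into the same handle without anyone knowing the total in advance.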
>
> Thanks,
> Lingda Li
> ------------------------------
>
> *From:* Alexey Bataev <Alexey.Bataev at ibm.com>
> * Sent:* Thursday, June 27, 2019 3:52 PM
> * To:* Deepak Eachempati
> * Cc:* Li, Lingda; Narayanaswamy, Ravi; Alexandre Eichenberger; Chapman,
> Barbara (Contact); Bobrovsky, Konstantin S; Carlo Bertolli; Chan, SiuChi;
> Cownie, James H; David Oehmke; Denny, Joel E.; Dmitriev, Serguei N;
> Doerfert, Johannes Rudolf; Ettore Tiotto; fraggamuffin at gmail.com;
> Gheorghe-Teod Bercea; Hal Finkel; jbeyer at nvidia.com; Jeeva Paudel; Jeff
> Heath; Jeffrey Sandoval; Jones, Jeff C; josem at udel.edu; Kelvin Li; Kevin
> K O'Brien; khaldi.dounia at gmail.com; Kotsifakou, Maria; Krishnaiyer,
> Rakesh; Lieberman, Ron; Lopez, Matthew Graham; lopezmg at ornl.org; Martin
> Kong; Matt Martineau; Menard, Lorri; Monteleone, Robert; oscar at ornl.gov;
> Rao, Premanand M; Rice, Michael P; Robichaux, Joseph;
> gregory.rodgers at amd.com; Rokos, Georgios; Samuel Antao; Sarah McNamara;
> sergey.y.ostanevich at gmail.com; Sergio Pino Gallardo; Sharif, Hashim;
> Sjodin, Jan; Sunil Shrestha; Sunita Chandrasekaran; Tian, Xinmin; Tianyi
> Zhang; vadve at illinois.edu; Wael Yehia; Wang Chen; Wilmarth, Terry L
> * Subject:* Re: RE: Comparison of 2 schemes to implement OpenMP 5.0
> declare mapper codegen
>
> Lingda, in scheme 1, can we precalculate the total number of components,
> allocate memory for this precalculated number of elements, then fill it
> with mappers, and only after that call the runtime function (only once!) to
> transfer the mappings to the runtime?
>
> Best regards,
> Alexey Bataev
>
> On June 27, 2019, at 15:44, Deepak Eachempati <deachempat at cray.com> wrote:
>
>    Got it. Thanks.
>
>       -- Deepak
>
>       *From:* Li, Lingda [mailto:lli at bnl.gov]
> * Sent:* Thursday, June 27, 2019 2:41 PM
> * To:* Deepak Eachempati <deachempat at cray.com>;
>       Narayanaswamy, Ravi <ravi.narayanaswamy at intel.com>;
>       'Alexandre Eichenberger' <alexe at us.ibm.com>;
>       'Alexey Bataev' <Alexey.Bataev at ibm.com>;
>       Chapman, Barbara (Contact) <barbara.chapman at stonybrook.edu>;
>       Bobrovsky, Konstantin S <konstantin.s.bobrovsky at intel.com>;
>       'Carlo Bertolli' <cbertol at us.ibm.com>;
>       'Chan, SiuChi' <siuchi.chan at amd.com>;
>       Cownie, James H <james.h.cownie at intel.com>;
>       David Oehmke <doehmke at cray.com>; 'Denny, Joel E.' <dennyje at ornl.gov>;
>       Dmitriev, Serguei N <serguei.n.dmitriev at intel.com>;
>       Doerfert, Johannes Rudolf <jdoerfert at anl.gov>;
>       'Ettore Tiotto' <etiotto at ca.ibm.com>;
>       'fraggamuffin at gmail.com' <fraggamuffin at gmail.com>;
>       'Gheorghe-Teod Bercea' <Gheorghe-Teod.Bercea at ibm.com>;
>       Hal Finkel <hfinkel at anl.gov>; 'jbeyer at nvidia.com' <jbeyer at nvidia.com>;
>       'Jeeva Paudel' <pjeeva01 at ca.ibm.com>; 'Jeff Heath' <jrheath at ca.ibm.com>;
>       Jeffrey Sandoval <sandoval at cray.com>;
>       Jones, Jeff C <jeff.c.jones at intel.com>;
>       'josem at udel.edu' <josem at udel.edu>; 'Kelvin Li' <kli at ca.ibm.com>;
>       'Kevin K O'Brien' <caomhin at us.ibm.com>;
>       'khaldi.dounia at gmail.com' <khaldi.dounia at gmail.com>;
>       'Kotsifakou, Maria' <kotsifa2 at illinois.edu>;
>       Krishnaiyer, Rakesh <rakesh.krishnaiyer at intel.com>;
>       Lieberman, Ron <Ron.Lieberman at amd.com>;
>       'Lopez, Matthew Graham' <lopezmg at ornl.gov>;
>       'lopezmg at ornl.org' <lopezmg at ornl.org>;
>       'Martin Kong' <martin.richard.kong at gmail.com>;
>       'Matt Martineau' <m.martineau at bristol.ac.uk>;
>       Menard, Lorri <lorri.menard at intel.com>;
>       Monteleone, Robert <robert.monteleone at intel.com>; oscar at ornl.gov;
>       Rao, Premanand M <premanand.m.rao at intel.com>;
>       Rice, Michael P <michael.p.rice at intel.com>;
>       Robichaux, Joseph <joseph.robichaux at intel.com>;
>       gregory.rodgers at amd.com; Rokos, Georgios <georgios.rokos at intel.com>;
>       'samuel.antao at ibm.com' <samuel.antao at ibm.com>;
>       'Sarah McNamara' <mcnamara at ca.ibm.com>;
>       'sergey.y.ostanevich at gmail.com' <sergey.y.ostanevich at gmail.com>;
>       'Sergio Pino Gallardo' <sergiop at udel.edu>;
>       'Sharif, Hashim' <hsharif3 at illinois.edu>;
>       Sjodin, Jan <Jan.Sjodin at amd.com>; Sunil Shrestha <sshrestha at cray.com>;
>       'Sunita Chandrasekaran' <schandra at udel.edu>;
>       Tian, Xinmin <xinmin.tian at intel.com>; Tianyi Zhang <tzhan18 at lsu.edu>;
>       'vadve at illinois.edu' <vadve at illinois.edu>;
>       'Wael Yehia' <wyehia at ca.ibm.com>; 'Wang Chen' <wdchen at ca.ibm.com>;
>       Wilmarth, Terry L <terry.l.wilmarth at intel.com>
> * Subject:* Re: Comparison of 2 schemes to implement OpenMP 5.0 declare
>       mapper codegen
>
>       In the current scheme, all mappings within a mapper function are
>       done atomically by one thread. In the mapper function of the example in the
>       original email, <push> just pushes the mapping information into an
>       internal data structure. Once all mapping information is available, the
>       runtime performs the actual mapping in one step. For your example, the
>       behavior is the same as the code below:
>
>       ...
>       #pragma omp parallel num_threads(2)
>       {
>         if (omp_get_thread_num() == 0) {
>           #pragma omp target map(s.x, s.p[0:s.x])
>           {
>             for (int i = 0; i < s.x; i++) s.p[i] = i;
>           }
>         } else {
>           #pragma omp target map(other_data)
>           {
>             // work on other_data
>           }
>         }
>       }
>       ...
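
One way to picture "collect first, then map together" is the C++ sketch below. It assumes, purely for illustration, that the runtime serializes each batch with a lock; the thread does not say how atomicity is actually achieved, and DeviceTable and mapBatch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

struct MapEntry { const void *begin; std::size_t size; };

// Hypothetical runtime-side table of mapped regions.
class DeviceTable {
public:
  // Installs a whole batch while holding the lock, so entries from another
  // thread's batch cannot be interleaved with it.
  void mapBatch(const std::vector<MapEntry> &batch) {
    std::lock_guard<std::mutex> guard(lock_);
    for (const MapEntry &e : batch) {
      assert(e.begin != nullptr);
      table_.push_back(e);
    }
  }

  // Copy of the table, taken under the same lock.
  std::vector<MapEntry> snapshot() const {
    std::lock_guard<std::mutex> guard(lock_);
    return table_;
  }

private:
  mutable std::mutex lock_;
  std::vector<MapEntry> table_;
};
```

Each thread would first gather its components locally (the <push> step) and then hand the complete batch to mapBatch, so the entries for s.x and s.p stay adjacent even if another thread maps other_data at the same time.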
>       ------------------------------
>       *From:* Deepak Eachempati <deachempat at cray.com>
> * Sent:* Thursday, June 27, 2019 3:34 PM
> * To:* Li, Lingda; Narayanaswamy, Ravi; 'Alexandre Eichenberger'; 'Alexey
>       Bataev'; Chapman, Barbara (Contact); Bobrovsky, Konstantin S; 'Carlo
>       Bertolli'; 'Chan, SiuChi'; Cownie, James H; David Oehmke; 'Denny, Joel E.';
>       Dmitriev, Serguei N; Doerfert, Johannes Rudolf; 'Ettore Tiotto';
>       'fraggamuffin at gmail.com'; 'Gheorghe-Teod Bercea'; Hal Finkel;
>       'jbeyer at nvidia.com'; 'Jeeva Paudel'; 'Jeff Heath'; Jeffrey Sandoval;
>       Jones, Jeff C; 'josem at udel.edu'; 'Kelvin Li'; 'Kevin K O'Brien';
>       'khaldi.dounia at gmail.com'; 'Kotsifakou, Maria'; Krishnaiyer, Rakesh;
>       Lieberman, Ron; 'Lopez, Matthew Graham'; 'lopezmg at ornl.org';
>       'Martin Kong'; 'Matt Martineau'; Menard, Lorri; Monteleone, Robert;
>       oscar at ornl.gov; Rao, Premanand M; Rice, Michael P; Robichaux, Joseph;
>       gregory.rodgers at amd.com; Rokos, Georgios; 'samuel.antao at ibm.com';
>       'Sarah McNamara'; 'sergey.y.ostanevich at gmail.com';
>       'Sergio Pino Gallardo'; 'Sharif, Hashim'; Sjodin, Jan; Sunil Shrestha;
>       'Sunita Chandrasekaran'; Tian, Xinmin; Tianyi Zhang;
>       'vadve at illinois.edu'; 'Wael Yehia'; 'Wang Chen'; Wilmarth, Terry L
> * Subject:* RE: Comparison of 2 schemes to implement OpenMP 5.0 declare
>       mapper codegen
>
>       I was referring to something like this, where another thread is not
>       trying to map the same data:
>
>       #pragma omp declare mapper(S s) map(s.x) map(s.p[0:s.x])
>       S s;
>       ...
>       #pragma omp parallel num_threads(2)
>       {
>         if (omp_get_thread_num() == 0) {
>           #pragma omp target map(s)
>           {
>             for (int i = 0; i < s.x; i++) s.p[i] = i;
>           }
>         } else {
>           #pragma omp target map(other_data)
>           {
>             // work on other_data
>           }
>         }
>       }
>       ...
>
>       Since I believe you are mapping s.x and s.p as separate map
>       operations, it is possible that another thread could map ‘other_data’ in
>       between those two maps. If this happens, will your implementation still
>       ensure that s.x and s.p are positioned at the right offsets with respect to
>       the same base address (&s)?
>
>       -- Deepak
>
>       *From:* Li, Lingda [mailto:lli at bnl.gov]
> * Sent:* Thursday, June 27, 2019 2:26 PM
> * Subject:* Re: Comparison of 2 schemes to implement OpenMP 5.0 declare
>       mapper codegen
>
>       When 2 threads try to concurrently map the same data, it behaves
>       the same as when 2 threads concurrently map the same data using map
>       clauses; mappers don't introduce extra considerations here. For
>       instance, both threads may use #pragma omp target enter data concurrently.
>
>       When 2 threads concurrently map the same data, my understanding
>       based on the current code is that it will create 2 copies of the same
>       data, and either copy is correct to use. There may be a problem when both
>       copies are mapped back if they are not synchronized correctly, but this is
>       a programming issue, not the responsibility of OpenMP.
>
>       Thanks,
>       Lingda Li
>       ------------------------------
>       *From:* Deepak Eachempati <deachempat at cray.com>
> * Sent:* Thursday, June 27, 2019 3:17 PM
> * Subject:* RE: Comparison of 2 schemes to implement OpenMP 5.0 declare
>       mapper codegen
>
>       Thanks.
>
>       Is it possible for another thread to be concurrently mapping
>       something else while the maps from the mapper function are taking place? If
>       so, how do you guarantee that the allocation for each component will get
>       you the right addresses in device memory? Sorry if this was covered before
>       and I missed it.
>
>       -- Deepak
>
>       *From:* Li, Lingda [mailto:lli at bnl.gov]
> * Sent:* Thursday, June 27, 2019 2:08 PM
>       *Subject:* Re: Comparison of 2 schemes to implement OpenMP 5.0 declare
>       mapper codegen
>
>       Hi Deepak,
>
>       Yes, it handles this case. The first part of the mapper function
>       (initially allocating space for the whole array) is just an optimization
>       and is not required for correctness, as you suggested in an earlier
>       discussion.
>
>       In your example, s.x and s.p will be allocated separately (not in a
>       single allocation). But Clang guarantees that their addresses will be
>       correct because s.x and s.p share the same base address, which is &s.
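A minimal sketch of the point above, assuming a hypothetical `MapEntry` record (not a libomptarget type) and a reduced version of `S` without the "other stuff". It only illustrates that the entries for s.x and s.p can be pushed separately while carrying the same base address &s; the base/begin pair chosen for the array section is an assumption for illustration, not necessarily what Clang emits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of one mapping request (illustration only).
struct MapEntry {
  void *base;        // address the runtime uses to relate entries
  void *begin;       // first byte to map
  std::size_t size;  // number of bytes to map
};

struct S {
  int x;
  int *p;
};

// Entries a mapper for `map(s.x) map(s.p[0:s.x])` might push for one S.
std::vector<MapEntry> entriesFor(S &s) {
  assert(s.x >= 0 && "array section length must be non-negative");
  std::vector<MapEntry> out;
  out.push_back({&s, &s.x, sizeof(s.x)});  // member s.x, base is &s
  out.push_back({&s, &s.p, sizeof(s.p)});  // member s.p, base is &s
  // Array section s.p[0:s.x]; using &s.p as the base is an assumption.
  out.push_back({&s.p, s.p, static_cast<std::size_t>(s.x) * sizeof(int)});
  return out;
}
```

The first two entries are independent requests, yet both name &s as their base, which is what lets the members keep their relative layout on the device.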
>
>       Thanks,
>       Lingda Li
>       ------------------------------
>       *From:* Deepak Eachempati <*deachempat at cray.com*
>       <deachempat at cray.com>>
>       *Sent:* Thursday, June 27, 2019 2:49 PM
>       *To:* Li, Lingda; Narayanaswamy, Ravi; 'Alexandre Eichenberger'; 'Alexey
>       Bataev'; Chapman, Barbara (Contact); Bobrovsky, Konstantin S; 'Carlo
>       Bertolli'; 'Chan, SiuChi'; Cownie, James H; David Oehmke; 'Denny, Joel E.';
>       Dmitriev, Serguei N; Doerfert, Johannes Rudolf ; '*estotzer at ti.com*
>       <estotzer at ti.com>'; 'Ettore Tiotto'; '*fraggamuffin at gmail.com*
>       <fraggamuffin at gmail.com>'; 'Gheorghe-Teod Bercea'; Hal Finkel; '
>       *jbeyer at nvidia.com* <jbeyer at nvidia.com>'; 'Jeeva Paudel'; 'Jeff
>       Heath'; Jeffrey Sandoval; Jones, Jeff C; '*josem at udel.edu*
>       <josem at udel.edu>'; 'Kelvin Li'; 'Kevin K O'Brien'; '
>       *khaldi.dounia at gmail.com* <khaldi.dounia at gmail.com>'; 'Kotsifakou,
>       Maria'; Krishnaiyer, Rakesh; Lieberman, Ron ; 'Lopez, Matthew Graham'; '
>       *lopezmg at ornl.org* <lopezmg at ornl.org>'; 'Martin Kong'; 'Matt
>       Martineau'; Menard, Lorri; Monteleone, Robert; *oscar at ornl.gov*
>       <oscar at ornl.gov>; Rao, Premanand M; Rice, Michael P; Robichaux,
>       Joseph; *gregory.rodgers at amd.com* <gregory.rodgers at amd.com>; Rokos,
>       Georgios; '*samuel.antao at ibm.com* <samuel.antao at ibm.com>'; 'Sarah
>       McNamara'; '*sergey.y.ostanevich at gmail.com*
>       <sergey.y.ostanevich at gmail.com>'; 'Sergio Pino Gallardo'; 'Sharif,
>       Hashim'; Sjodin, Jan ; Sunil Shrestha; 'Sunita Chandrasekaran'; Tian,
>       Xinmin; Tianyi Zhang; '*vadve at illinois.edu* <vadve at illinois.edu>';
>       'Wael Yehia'; 'Wang Chen'; Wilmarth, Terry L
>       *Subject:* RE: Comparison of 2 schemes to implement OpenMP 5.0 declare
>       mapper codegen
>
>       For Scheme 1, it looks like you are doing separate maps for each
>       component when size == 1. It seems like the first and last if statements
>       should have “size >= 1” rather than “size > 1”.
>
>       If the mapper is declared like this:
>
>       struct S {
>         int x;
>         ... // other stuff
>         int *p;
>       };
>
>       #pragma omp declare mapper(S s) map(s.x) map(s.p[0:s.x])
>
>       And you have:
>
>       S s;
>       ...
>       #pragma omp target map(s)
>       {
>         for (int i = 0; i < s.x; i++) s.p[i] = i;
>       }
>
>       Since the target construct is just mapping a single structure of
>       type S, there should be one map that takes care of mapping storage for s.x
>       and s.p with a single allocation, and a separate map for the array section
>       s.p[0:s.x], and finally the pointer attachment of s.p to s.p[0:s.x]. Does
>       Scheme 1 handle this?
>
>       -- Deepak
>
>
>       *From:* Li, Lingda [*mailto:lli at bnl.gov* <lli at bnl.gov>]
>       *Sent:* Thursday, June 27, 2019 1:07 PM
>       *To:* Narayanaswamy, Ravi <*ravi.narayanaswamy at intel.com*
>       <ravi.narayanaswamy at intel.com>>; 'Alexandre Eichenberger' <
>       *alexe at us.ibm.com* <alexe at us.ibm.com>>; 'Alexey Bataev' <
>       *Alexey.Bataev at ibm.com* <Alexey.Bataev at ibm.com>>; Chapman, Barbara
>       (Contact) <*barbara.chapman at stonybrook.edu*
>       <barbara.chapman at stonybrook.edu>>; Bobrovsky, Konstantin S <
>       *konstantin.s.bobrovsky at intel.com*
>       <konstantin.s.bobrovsky at intel.com>>; 'Carlo Bertolli' <
>       *cbertol at us.ibm.com* <cbertol at us.ibm.com>>; 'Chan, SiuChi' <
>       *siuchi.chan at amd.com* <siuchi.chan at amd.com>>; Cownie, James H <
>       *james.h.cownie at intel.com* <james.h.cownie at intel.com>>; David
>       Oehmke <*doehmke at cray.com* <doehmke at cray.com>>; Deepak Eachempati <
>       *deachempat at cray.com* <deachempat at cray.com>>; 'Denny, Joel E.' <
>       *dennyje at ornl.gov* <dennyje at ornl.gov>>; Dmitriev, Serguei N <
>       *serguei.n.dmitriev at intel.com* <serguei.n.dmitriev at intel.com>>;
>       Doerfert, Johannes Rudolf <*jdoerfert at anl.gov* <jdoerfert at anl.gov>>;
>       '*estotzer at ti.com* <estotzer at ti.com>' <*estotzer at ti.com*
>       <estotzer at ti.com>>; 'Ettore Tiotto' <*etiotto at ca.ibm.com*
>       <etiotto at ca.ibm.com>>; '*fraggamuffin at gmail.com*
>       <fraggamuffin at gmail.com>' <*fraggamuffin at gmail.com*
>       <fraggamuffin at gmail.com>>; 'Gheorghe-Teod Bercea' <
>       *Gheorghe-Teod.Bercea at ibm.com* <Gheorghe-Teod.Bercea at ibm.com>>; Hal
>       Finkel <*hfinkel at anl.gov* <hfinkel at anl.gov>>; '*jbeyer at nvidia.com*
>       <jbeyer at nvidia.com>' <*jbeyer at nvidia.com* <jbeyer at nvidia.com>>;
>       'Jeeva Paudel' <*pjeeva01 at ca.ibm.com* <pjeeva01 at ca.ibm.com>>; 'Jeff
>       Heath' <*jrheath at ca.ibm.com* <jrheath at ca.ibm.com>>; Jeffrey
>       Sandoval <*sandoval at cray.com* <sandoval at cray.com>>; Jones, Jeff C <
>       *jeff.c.jones at intel.com* <jeff.c.jones at intel.com>>; '
>       *josem at udel.edu* <josem at udel.edu>' <*josem at udel.edu*
>       <josem at udel.edu>>; 'Kelvin Li' <*kli at ca.ibm.com* <kli at ca.ibm.com>>;
>       'Kevin K O'Brien' <*caomhin at us.ibm.com* <caomhin at us.ibm.com>>; '
>       *khaldi.dounia at gmail.com* <khaldi.dounia at gmail.com>' <
>       *khaldi.dounia at gmail.com* <khaldi.dounia at gmail.com>>; 'Kotsifakou,
>       Maria' <*kotsifa2 at illinois.edu* <kotsifa2 at illinois.edu>>;
>       Krishnaiyer, Rakesh <*rakesh.krishnaiyer at intel.com*
>       <rakesh.krishnaiyer at intel.com>>; Lieberman, Ron <
>       *Ron.Lieberman at amd.com* <Ron.Lieberman at amd.com>>; Li, Lingda <
>       *lli at bnl.gov* <lli at bnl.gov>>; 'Lopez, Matthew Graham' <
>       *lopezmg at ornl.gov* <lopezmg at ornl.gov>>; '*lopezmg at ornl.org*
>       <lopezmg at ornl.org>' <*lopezmg at ornl.org* <lopezmg at ornl.org>>;
>       'Martin Kong' <*martin.richard.kong at gmail.com*
>       <martin.richard.kong at gmail.com>>; 'Matt Martineau' <
>       *m.martineau at bristol.ac.uk* <m.martineau at bristol.ac.uk>>; Menard,
>       Lorri <*lorri.menard at intel.com* <lorri.menard at intel.com>>;
>       Monteleone, Robert <*robert.monteleone at intel.com*
>       <robert.monteleone at intel.com>>; *oscar at ornl.gov* <oscar at ornl.gov>;
>       Rao, Premanand M <*premanand.m.rao at intel.com*
>       <premanand.m.rao at intel.com>>; Rice, Michael P <
>       *michael.p.rice at intel.com* <michael.p.rice at intel.com>>; Robichaux,
>       Joseph <*joseph.robichaux at intel.com* <joseph.robichaux at intel.com>>;
>       *gregory.rodgers at amd.com* <gregory.rodgers at amd.com>; Rokos,
>       Georgios <*georgios.rokos at intel.com* <georgios.rokos at intel.com>>; '
>       *samuel.antao at ibm.com* <samuel.antao at ibm.com>' <
>       *samuel.antao at ibm.com* <samuel.antao at ibm.com>>; 'Sarah McNamara' <
>       *mcnamara at ca.ibm.com* <mcnamara at ca.ibm.com>>; '
>       *sergey.y.ostanevich at gmail.com* <sergey.y.ostanevich at gmail.com>' <
>       *sergey.y.ostanevich at gmail.com* <sergey.y.ostanevich at gmail.com>>;
>       'Sergio Pino Gallardo' <*sergiop at udel.edu* <sergiop at udel.edu>>;
>       'Sharif, Hashim' <*hsharif3 at illinois.edu* <hsharif3 at illinois.edu>>;
>       Sjodin, Jan <*Jan.Sjodin at amd.com* <Jan.Sjodin at amd.com>>; Sunil
>       Shrestha <*sshrestha at cray.com* <sshrestha at cray.com>>; 'Sunita
>       Chandrasekaran' <*schandra at udel.edu* <schandra at udel.edu>>; Tian,
>       Xinmin <*xinmin.tian at intel.com* <xinmin.tian at intel.com>>; Tianyi
>       Zhang <*tzhan18 at lsu.edu* <tzhan18 at lsu.edu>>; '*vadve at illinois.edu*
>       <vadve at illinois.edu>' <*vadve at illinois.edu* <vadve at illinois.edu>>;
>       'Wael Yehia' <*wyehia at ca.ibm.com* <wyehia at ca.ibm.com>>; 'Wang Chen'
>       <*wdchen at ca.ibm.com* <wdchen at ca.ibm.com>>; Wilmarth, Terry L <
>       *terry.l.wilmarth at intel.com* <terry.l.wilmarth at intel.com>>
>       *Subject:* Comparison of 2 schemes to implement OpenMP 5.0 declare
>       mapper codegen
>
>       Hi,
>
>       Alexey and I would like to draw your attention to an ongoing
>       discussion of two schemes for implementing declare mapper in OpenMP 5.0.
>       The detailed discussion can be found at
>       https://reviews.llvm.org/D59474
>
>       Scheme 1 (the one I have implemented in
>       https://reviews.llvm.org/D59474):
>       The detailed design can be found at
>       https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx
>       For each mapper function, the compiler generates a function like
>       this:
>
>       ```
>       void <type>.mapper(void *base, void *begin, size_t size, int64_t type) {
>         // Allocate space for an array section first.
>         if (size > 1 && !maptype.IsDelete)
>           <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
>
>         // Map members.
>         for (unsigned i = 0; i < size; i++) {
>           // For each component specified by this mapper:
>           for (auto c : components) {
>             ...; // code to generate c.arg_base, c.arg_begin, c.arg_size, c.arg_type
>             if (c.hasMapper())
>               (*c.Mapper())(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
>             else
>               <push>(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
>           }
>         }
>         // Delete the array section.
>         if (size > 1 && maptype.IsDelete)
>           <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
>       }
>       ```
>       This function is passed to the OpenMP runtime, and the runtime will
>       call this function to finish the data mapping.
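To make the shape of Scheme 1 concrete, here is a hand-written analogue of a generated mapper for the struct S from Deepak's example (reduced to x and p). Everything here is a sketch under assumptions: `MapEntry`, `MapList`, `mapperS` and the flag values are hypothetical names for illustration, and `<push>` is modelled as appending to a vector rather than calling into the runtime. Nested mappers are left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the runtime's list of pending map operations.
struct MapEntry {
  void *base;
  void *begin;
  std::size_t size;
  std::int64_t type;
};
using MapList = std::vector<MapEntry>;

// Hypothetical flag values, chosen only for this sketch.
constexpr std::int64_t kTo = 0x1, kFrom = 0x2, kDelete = 0x8;

struct S {
  int x;
  int *p;
};

// Drop the data-movement bits so the entry only allocates or deletes.
inline std::int64_t clearToFrom(std::int64_t type) {
  return type & ~(kTo | kFrom);
}

// Analogue of `<type>.mapper` for S with map(s.x) map(s.p[0:s.x]).
void mapperS(MapList &out, void *base, void *begin, std::size_t size,
             std::int64_t type) {
  S *elems = static_cast<S *>(begin);
  const bool isDelete = (type & kDelete) != 0;
  // Allocate space for the whole array section first.
  if (size > 1 && !isDelete)
    out.push_back({base, begin, size * sizeof(S), clearToFrom(type)});
  // Map the components of each element.
  for (std::size_t i = 0; i < size; ++i) {
    S &s = elems[i];
    assert(s.x >= 0 && "array section length must be non-negative");
    out.push_back({&s, &s.x, sizeof(s.x), type});
    out.push_back({&s.p, s.p, static_cast<std::size_t>(s.x) * sizeof(int), type});
  }
  // Delete the array section last.
  if (size > 1 && isDelete)
    out.push_back({base, begin, size * sizeof(S), clearToFrom(type)});
}
```

Calling `mapperS(out, arr, arr, 2, kTo)` on a two-element array appends one entry for the array section followed by two entries per element; with size 1 the array-section entry is skipped, which is the case Deepak asks about.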
>
>
>       Scheme 2 (which Alexey proposes):
>       Alexey proposes moving parts of the mapper function above into the
>       OpenMP runtime, so that the compiler generates the code below:
>       ```
>       void <type>.mapper(void *base, void *begin, size_t size, int64_t type) {
>         ...; // code to generate arg_base, arg_begin, arg_size, arg_type, arg_mapper.
>         auto sub_components[] = {...}; // fill in generated begin, base, ...
>         __tgt_mapper(base, begin, size, type, sub_components);
>       }
>       ```
>
>       `__tgt_mapper` is a runtime function as below:
>       ```
>       void __tgt_mapper(void *base, void *begin, size_t size, int64_t type,
>                         auto components[]) {
>         // Allocate space for an array section first.
>         if (size > 1 && !maptype.IsDelete)
>           <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
>
>         // Map members.
>         for (unsigned i = 0; i < size; i++) {
>           // For each component specified by this mapper:
>           for (auto c : components) {
>             if (c.hasMapper())
>               (*c.Mapper())(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
>             else
>               <push>(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
>           }
>         }
>         // Delete the array section.
>         if (size > 1 && maptype.IsDelete)
>           <push>(base, begin, size*sizeof(Ty), clearToFrom(type));
>       }
>       ```
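One way to read Scheme 2 is sketched below: the generated code materializes a descriptor for every component of every element, and a generic runtime routine walks that table. The names `Component`, `buildComponentsS` and `tgtMapperSketch` are hypothetical, the array-section prologue and epilogue are omitted, and the real `__tgt_mapper` signature is whatever the proposal ends up with. The sketch is only meant to show where the descriptor table comes from and how its size scales.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical descriptor handed from generated code to the runtime.
struct Component {
  void *base;
  void *begin;
  std::size_t size;
  std::int64_t type;
};

struct S {
  int x;
  int *p;
};

// Compiler side: fill in one descriptor per component per element.
// The table grows with `size`, so it generally has to live on the heap.
std::vector<Component> buildComponentsS(S *elems, std::size_t size,
                                        std::int64_t type) {
  std::vector<Component> comps;
  comps.reserve(size * 2);  // two components per element of S
  for (std::size_t i = 0; i < size; ++i) {
    S &s = elems[i];
    assert(s.x >= 0 && "array section length must be non-negative");
    comps.push_back({&s, &s.x, sizeof(s.x), type});
    comps.push_back({&s.p, s.p, static_cast<std::size_t>(s.x) * sizeof(int), type});
  }
  return comps;
}

// Runtime side: generic, knows nothing about S; it only walks the table.
void tgtMapperSketch(std::vector<Component> &pushed,
                     const std::vector<Component> &components) {
  for (const Component &c : components)
    pushed.push_back(c);
}
```

Under this reading the runtime loop is type-agnostic, but the descriptor table has to be built and stored before the runtime sees it.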
>
>       Comparison:
>       Why choose 1 over 2:
>       1. In scheme 2, the compiler needs to generate all map types and
>       pass them to __tgt_mapper through sub_components. But in this case, the
>       compiler cannot generate the correct MEMBER_OF field in the map type.
>       As a result, the runtime has to fix it up using the mechanism we already
>       have here: __tgt_mapper_num_components. This not only increases
>       complexity, it also means the runtime has to manipulate the map type
>       again, which hurts locality. In the current scheme, by contrast, the map
>       type is generated by the compiler once, so data locality should be very
>       good.
>       2. In scheme 2, sub_components includes all components that should
>       be mapped. If we are mapping an array, this means many components, which
>       requires allocating memory for sub_components on the heap. This adds a
>       memory management burden and is not an efficient way to use memory.
>       3. In scheme 1, we are able to inline nested mapper functions. As a
>       result, the compiler can further optimize the mapper function, e.g., by
>       eliminating redundant computation and unrolling loops, and thus
>       potentially achieve better performance. These optimizations are not
>       possible in scheme 2.
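For point 1, the kind of fix-up the runtime would have to do looks roughly like the sketch below. It assumes the MEMBER_OF field occupies the top 16 bits of the 64-bit map type and stores a one-based index of the parent entry, which is my understanding of the libomptarget encoding; treat the constants as assumptions. The helper names `memberOfFlag` and `shiftMemberOf` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: MEMBER_OF lives in the top 16 bits of the map type.
constexpr std::uint64_t kMemberOfMask = 0xffff000000000000ULL;
constexpr unsigned kMemberOfShift = 48;

// Encode "member of the entry at zero-based index parentIndex".
// The field is stored one-based so that 0 means "not a member".
constexpr std::uint64_t memberOfFlag(std::uint64_t parentIndex) {
  return (parentIndex + 1) << kMemberOfShift;
}

// Rebase the MEMBER_OF field when a mapper's entries end up `offset`
// positions later in the final argument list than the compiler assumed.
std::uint64_t shiftMemberOf(std::uint64_t type, std::uint64_t offset) {
  const std::uint64_t member = (type & kMemberOfMask) >> kMemberOfShift;
  if (member == 0)
    return type;  // not a member of anything; nothing to fix up
  assert(member + offset <= 0xffff && "MEMBER_OF field would overflow");
  return (type & ~kMemberOfMask) | ((member + offset) << kMemberOfShift);
}
```

The cost concern in point 1 is that a rewrite like `shiftMemberOf` would run in the runtime for every component, on top of the map type the compiler already generated.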
>
>       Why choose 2 over 1:
>       1. Less code in the mapper function codegen (I doubt this matters,
>       because the codegen function of scheme 1 is fewer than 200 lines of code).
>       Alexey may have other reasons.
>
>       We would appreciate it if you could share your thoughts.
>
>       Thanks,
>       Lingda Li
>       ------------------------------
>       *From:* Narayanaswamy, Ravi <*ravi.narayanaswamy at intel.com*
>       <ravi.narayanaswamy at intel.com>>
>       *Sent:* Wednesday, June 19, 2019 3:09 PM
>       *To:* 'Alexandre Eichenberger'; 'Alexey Bataev'; '
>       *barbara.chapman at stonybrook.edu* <barbara.chapman at stonybrook.edu>';
>       Bobrovsky, Konstantin S; 'Carlo Bertolli'; 'Chan, SiuChi'; Cownie, James H;
>       David Oehmke; Deepak Eachempati; 'Denny, Joel E.'; Dmitriev, Serguei N;
>       Doerfert, Johannes Rudolf ; '*estotzer at ti.com* <estotzer at ti.com>';
>       'Ettore Tiotto'; '*fraggamuffin at gmail.com* <fraggamuffin at gmail.com>';
>       'Gheorghe-Teod Bercea'; '*hfinkel at anl.gov* <hfinkel at anl.gov>'; '
>       *jbeyer at nvidia.com* <jbeyer at nvidia.com>'; 'Jeeva Paudel'; 'Jeff
>       Heath'; Jeffrey Sandoval; Jones, Jeff C; '*josem at udel.edu*
>       <josem at udel.edu>'; 'Kelvin Li'; 'Kevin K O'Brien'; '
>       *khaldi.dounia at gmail.com* <khaldi.dounia at gmail.com>'; 'Kotsifakou,
>       Maria'; Krishnaiyer, Rakesh; Lieberman, Ron ; '*lli at bnl.gov*
>       <lli at bnl.gov>'; 'Lopez, Matthew Graham'; '*lopezmg at ornl.org*
>       <lopezmg at ornl.org>'; 'Martin Kong'; 'Matt Martineau'; Menard,
>       Lorri; Monteleone, Robert; Narayanaswamy, Ravi; 'Oscar R. Hernandez'; Rao,
>       Premanand M; Rice, Michael P; Robichaux, Joseph; Rodgers, Gregory; Rokos,
>       Georgios; '*samuel.antao at ibm.com* <samuel.antao at ibm.com>'; 'Sarah
>       McNamara'; '*sergey.y.ostanevich at gmail.com*
>       <sergey.y.ostanevich at gmail.com>'; 'Sergio Pino Gallardo'; 'Sharif,
>       Hashim'; Sjodin, Jan ; Sunil Shrestha (*sshrestha at cray.com*
>       <sshrestha at cray.com>); 'Sunita Chandrasekaran'; Tian, Xinmin;
>       Tianyi Zhang; '*vadve at illinois.edu* <vadve at illinois.edu>'; 'Wael
>       Yehia'; 'Wang Chen'; Wilmarth, Terry L
>       *Subject:* OpenMP / HPC in Clang / LLVM Multi-company Telecom Meeting
>       Minutes June 19th 2019
>
>       *Next Meeting: July 10th (moved from July 3rd)*
>
>       *Opens :*
>       *- Documentation*
>       - Greg: Can we have documents for libopenmp and libomptarget?
>       - Alexey suggested having 3 documents: libopenmp, libomptarget and
>       device plugin.
>       - Hal will convert the existing libomptarget document. Once that is
>       done, others can update the document to capture the existing
>       implementation. Future additions to libomptarget will also require an
>       update to the document.
>       - Next, the libopenmp document will be created if it does not exist, or
>       updated if one exists.
>
>       *LTO for fat binary linking*
>       - Serguei (Intel) has an implementation which enables LTO and does
>       away with linker scripts.
>       Everybody agreed this is a good idea, especially since some linkers
>       don’t support linker scripts.
>       AMD is interested in enabling LTO and would like to see the code.
>       Serguei will post the code to get feedback from all.
>       - Hal to present in next meeting his proposal to support static fat
>       archives using LTO.
>
>       *OpenMP 5.0 Features*
>       - No update on setting up the public website. Johannes was out
>       attending ISC.
>       - New features added since last release (courtesy of Kelvin)
>       - allocate clause/allocate directive - parsing+sema, codegen
>       - mutexinoutset dependence-type for task
>       - user-defined mapper (declare mapper) - parsing+sema.
>       - omp_get_device_num() API routine
>
>       *Development Activity*
>       *- Async API*
>       Support in Clang and libopenmp, including lit tests, has been checked
>       in by Doru.
>
>       *- Mapper support*
>       Initial support for Mapper has been posted for review by Lingda. Once
>       approved, the rest of the support will be done.
>       Lingda: Should the old API, which is being replaced by a similar new API
>       with an extra mapper argument, be obsoleted?
>       The suggestion was for Clang to stop generating calls to it but to keep
>       the API in libomptarget for backward compatibility. In the future it can
>       be obsoleted.
>
>       *- Required Directives*
>       Support for requires directives has been checked in by Doru.
>       There was one issue with checking for the requires directive and
>       confirming that the declare type is TO or LINK.
>       Doru removed the check and added a note that this code needs to be
>       modified if things change in the future.
>
>       *Roll Call:*
>
>       Company         Attendee                  19-Jun
>       AMD             Greg Rodgers              x
>                       Ashwin Aji
>                       Jan Sjodin                x
>                       Ron Lieberman             x
>                       Sameer Sahasrabuddhe
>                       Andrey Kasaurov
>       ANL             Hal Finkel                x
>                       Johannes Doerfert
>       IBM             Alexandre Eichenberger
>                       Carlo Bertolli
>                       Kelvin Li
>                       Doru                      x
>                       Alexey Bataev             x
>       Intel           Andrey Churbanov
>                       Ravi Narayanaswamy        x
>                       Serguei Dmitriev          x
>                       Rajiv Deodhar
>                       Lorri Menard
>                       Terry Wilmarth
>                       Rao, Prem
>                       Hansang Bae
>                       George Rokos              x
>       Cray            Deepak Eachempati         x
>       Micron          John Leidel
>       Nvidia          James Beyer               x
>       ORNL            Graham Lopez
>                       Joel Denny
>                       Geoffroy Vallee
>                       Oscar Hernandez
>       SBU/BNL         Lingda Li                 x
>                       Jose Monlsave
>                       Martin Kong
>       TI              Eric Stotzer
>       U of Bristol    Mat Martineau
>       U of Delaware   Sunita Chandrasekaran
>       U of Illinois   Hashim Sharif
>       Rice            John Mellor-Crummey
>       LSU             Tianyi Zhang
>