[cfe-dev] Comparison of 2 schemes to implement OpenMP 5.0 declare mapper codegen
Lingda Li via cfe-dev
cfe-dev at lists.llvm.org
Fri Jun 28 06:59:05 PDT 2019
On Fri, Jun 28, 2019 at 9:49 AM Li, Lingda <lli at bnl.gov> wrote:
> I don't think we can have the buffer allocated within the mapper function.
> It has to be done in the runtime, because of nested mappers.
> First, all mapper functions are generated in the same way. We cannot make
> the outermost mapper function allocate memory while the inner ones do not
> and instead use what the outermost mapper function has allocated.
> I suppose we still need to allocate memory in the runtime, so the runtime
> can pass the pointer and size to the mapper function, and the outer mapper
> function can then pass them into inner ones.
> Again, this is just like the current implementation, except that we don't
> use vector::push_back(); instead we use something like a manual
> implementation of vector::push_back() (because we need to use the pointer
> and the current index).
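The manual push_back() described above could look roughly like the following. This is only a sketch under assumptions: MapEntry, MapBuffer, append, mapper_Inner and mapper_Outer are hypothetical names used for illustration, not the actual Clang or libomptarget interface. The runtime is assumed to own the allocation and to hand the buffer to the outermost mapper, which forwards it to the nested one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record for one mapped component (not the real runtime layout).
struct MapEntry {
  void *base;
  void *begin;
  std::size_t size;
  std::int64_t type;
};

// Hypothetical runtime-owned buffer: the runtime allocates `data`, and every
// mapper appends through the shared `count` index.
struct MapBuffer {
  MapEntry *data;
  std::size_t capacity;
  std::size_t count;
};

// Manual stand-in for vector::push_back(): a bounds check plus a direct write.
inline void append(MapBuffer &buf, const MapEntry &e) {
  assert(buf.count < buf.capacity && "runtime under-allocated the buffer");
  buf.data[buf.count++] = e;
}

struct Inner { int n; int *p; };
struct Outer { Inner in; double w; };

// Inner mapper: never allocates, only appends to the buffer it is given.
void mapper_Inner(MapBuffer &buf, Inner &v) {
  append(buf, {&v, &v.n, sizeof(v.n), 0});
  append(buf, {&v, v.p, static_cast<std::size_t>(v.n) * sizeof(int), 0});
}

// Outer mapper: same shape as the inner one; it forwards the buffer it received.
void mapper_Outer(MapBuffer &buf, Outer &v) {
  append(buf, {&v, &v.w, sizeof(v.w), 0});
  mapper_Inner(buf, v.in);
}
```

Both mappers have the same shape in this sketch, so neither needs to know whether it is the outermost one; only the (assumed) runtime side allocates.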
>
> I believe the key question here is whether it is true that (the overhead
> of push_back() > the overhead of precalculating the total number + the
> memory allocation overhead + direct memory writes). This will decide
> whether this change is necessary. Any opinions?
>
> Thanks,
> Lingda Li
> ------------------------------
> *From:* Alexey Bataev <Alexey.Bataev at ibm.com>
> *Sent:* Thursday, June 27, 2019 5:05 PM
> *To:* Li, Lingda
> *Cc:* Alexandre Eichenberger; Chapman, Barbara (Contact); Kevin K
> O'Brien; Carlo Bertolli; Deepak Eachempati; Denny, Joel E.; David Oehmke;
> Ettore Tiotto; fraggamuffin at gmail.com; Rokos, Georgios; Gheorghe-Teod
> Bercea; gregory.rodgers at amd.com; Hal Finkel; Sharif, Hashim; Cownie,
> James H; Sjodin, Jan; jbeyer at nvidia.com; Doerfert, Johannes Rudolf;
> Jones, Jeff C; josem at udel.edu; Robichaux, Joseph; Jeff Heath;
> khaldi.dounia at gmail.com; Kelvin Li; Bobrovsky, Konstantin S; Kotsifakou,
> Maria; lopezmg at ornl.org; Lopez, Matthew Graham; Menard, Lorri; Martin
> Kong; Sarah McNamara; Rice, Michael P; Matt Martineau; oscar at ornl.gov;
> Jeeva Paudel; Rao, Premanand M; Krishnaiyer, Rakesh; Narayanaswamy, Ravi;
> Monteleone, Robert; Lieberman, Ron; Samuel Antao; Jeffrey Sandoval; Sunita
> Chandrasekaran; sergey.y.ostanevich at gmail.com; Sergio Pino Gallardo;
> Dmitriev, Serguei N; Chan, SiuChi; Sunil Shrestha; Wilmarth, Terry L;
> Tianyi Zhang; vadve at illinois.edu; Wang Chen; Wael Yehia; Tian, Xinmin
> *Subject:* Re: Re: Re: RE: Comparison of 2 schemes to implement OpenMP
> 5.0 declare mapper codegen
>
> Yes, we need 2 functions, but the first one can be optimized very
> effectively. After optimization and inlining it will end up as just
> return s1+s2+s3... I think in most cases those sizes will be constant, since
> the mapper maps a constant number of elements. Thus, this expression
> will be optimized to just a constant value.
> You don't need to pass these functions to the runtime. We can call them
> directly from the compiler.
> 1st call: get number of elements.
> 2nd: allocate the buffer
> 3rd call: call mapper with this preallocated buffer that fills this buffer
> without any calls of the runtime functions.
> 4th call: call the runtime to pass the buffer to the runtime.
>
> Best regards,
> Alexey Bataev
>
> On June 27, 2019, at 16:53, Li, Lingda <lli at bnl.gov> wrote:
>
> If we precalculate the size, first, it means we need to generate 2
> functions for each mapper, rather than the 1 we generate now: one to fill in
> the mapping information, as we have now, and the other to calculate the size
> (this will not return constant values, because the size depends on how many
> instances we are mapping). Both of these functions will need to be passed to
> the runtime. The runtime will need to precalculate the number of components
> first, then allocate memory, then call the mapper function to fill it up.
>
> Compared with scheme 1, the differences are:
> 1) An extra call to calculate the total number, which scheme 1 does not need;
> 2) A preallocated buffer, whose pointer and current count should be
> passed to the mapper function, which then uses them to fill in
> components, while scheme 1 uses push_back() to do the same thing.
>
> Is there really a benefit in doing this? push_back() should be efficient
> enough compared with writing directly to memory.
>
> If people here think that the overhead of push_back() > the overhead of
> precalculating the total number + the memory allocation overhead + direct
> memory writes, then we can consider this scheme.
>
> Thanks,
> Lingda Li
>
>
> ------------------------------
> *From:* Alexey Bataev <Alexey.Bataev at ibm.com>
> *Sent:* Thursday, June 27, 2019 4:26 PM
> *To:* Li, Lingda
> *Subject:* Re: Re: RE: Comparison of 2 schemes to implement OpenMP 5.0
> declare mapper codegen
>
>
> If the functions (the ones intended for size precalculation) are inlined,
> they can be optimized out very effectively, since in most cases they will
> return constant values.
> If we could do this, we won't need vectors and push_backs; we can use
> preallocated memory and an internal counter.
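As a rough illustration of why inlined size functions can fold to constants, consider the sketch below. The names count_Inner, count_Outer and total_Outer are hypothetical and chosen only for this example. It assumes each mapper contributes a fixed number of components per instance, so the nested sum is a compile-time constant, and only the multiplication by the number of instances can remain at run time.

```cpp
#include <cstddef>

// Hypothetical per-mapper size functions: components emitted per instance.
constexpr std::size_t count_Inner() { return 2; }  // e.g. a count and an array section

// A nested mapper adds its own components to those of the mapper it invokes.
constexpr std::size_t count_Outer() { return 1 + count_Inner(); }

// Total for n instances: a constant when n is known at compile time,
// a single multiplication otherwise.
constexpr std::size_t total_Outer(std::size_t n) { return n * count_Outer(); }

// The compiler evaluates these without emitting any call.
static_assert(total_Outer(1) == 3, "one instance folds to a constant");
static_assert(total_Outer(8) == 24, "a constant instance count also folds");
```

When the number of instances is only known at run time, the result is no longer a constant, but under these assumptions the remaining work is one multiplication rather than a traversal.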
> --------------
> Best regards,
> Alexey Bataev
>
> From: "Li, Lingda" <lli at bnl.gov>
> To: Alexey Bataev <Alexey.Bataev at ibm.com>, Deepak Eachempati <
> deachempat at cray.com>
> Date: 06/27/2019 04:13 PM
> Subject: [EXTERNAL] Re: RE: Comparison of 2 schemes to implement OpenMP
> 5.0 declare mapper codegen
> ------------------------------
>
>
>
> Hi Alexey,
>
> I think that's why we chose to use variable-size storage like std::vector
> to store the mapping information in the first place, right? It'll be costly
> to precalculate the total number of components, especially in the presence
> of nested mappers. Besides, a runtime function call is just a
> std::vector::push_back, so I think it's okay to have multiple function calls.
>
> Thanks,
> Lingda Li
> ------------------------------
>
> *From:* Alexey Bataev <Alexey.Bataev at ibm.com>
> * Sent:* Thursday, June 27, 2019 3:52 PM
> * To:* Deepak Eachempati
> * Subject:* Re: RE: Comparison of 2 schemes to implement OpenMP 5.0
> declare mapper codegen
>
> Lingda, can we in scheme 1 precalculate the total number of components,
> allocate memory for this precalculated number of elements, then fill it
> with mappers, and only after that call the runtime function (only once!) to
> transfer the mappings to the runtime?
>
> Best regards,
> Alexey Bataev
>
> On June 27, 2019, at 15:44, Deepak Eachempati <deachempat at cray.com> wrote:
>
> Got it. Thanks.
>
> -- Deepak
>
> *From:* Li, Lingda [mailto:lli at bnl.gov]
> * Sent:* Thursday, June 27, 2019 2:41 PM
> * To:* Deepak Eachempati <deachempat at cray.com>, and the recipients listed above
> * Subject:* Re: Comparison of 2 schemes to implement OpenMP 5.0 declare
> mapper codegen
>
> In the current scheme, all mappings within a mapper function are
> done atomically by one thread. In the mapper function of the example in the
> original email, <push> will just push the mapping information into an
> internal data structure. Once all mapping information is available, the
> runtime will do the real mapping all at once. For your example, the behavior
> is the same as the code below:
>
> ...
> #pragma omp parallel num_threads(2)
> {
>   if (omp_get_thread_num() == 0) {
>     #pragma omp target map(s.x, s.p[0:s.x])
>     {
>       for (int i = 0; i < s.x; i++) s.p[i] = i;
>     }
>   } else {
>     #pragma omp target map(other_data)
>     {
>       // work on other_data
>     }
>   }
> }
> ...
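The "collect first, map together" behavior described above might be pictured like this. It is a sketch under assumptions, not the libomptarget implementation: MapEntry, MappingTable, commit and push are hypothetical names, and the single mutex is an assumed way of making one construct's mappings appear atomic to other threads.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

struct MapEntry { void *begin; std::size_t size; };  // hypothetical record

// Hypothetical device mapping table guarded by one lock.
class MappingTable {
public:
  // Commits all entries of one construct while holding the lock, so entries
  // committed by another thread cannot be interleaved with them.
  void commit(const std::vector<MapEntry> &pending) {
    std::lock_guard<std::mutex> guard(lock_);
    for (const MapEntry &e : pending) table_.push_back(e);
  }

  // Returns a copy of the table for inspection.
  std::vector<MapEntry> snapshot() {
    std::lock_guard<std::mutex> guard(lock_);
    return table_;
  }

private:
  std::mutex lock_;
  std::vector<MapEntry> table_;
};

// <push>: only records the component in a list private to the calling thread;
// nothing is mapped until the whole list is committed.
inline void push(std::vector<MapEntry> &pending, void *begin, std::size_t size) {
  pending.push_back({begin, size});
}
```

In this picture the mapper function only calls push on a thread-private list, and the table is touched once per construct, when the complete list is committed.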
> ------------------------------
> *From:* Deepak Eachempati <deachempat at cray.com>
> * Sent:* Thursday, June 27, 2019 3:34 PM
> * To:* Li, Lingda, and the recipients listed above
> * Subject:* RE: Comparison of 2 schemes to implement OpenMP 5.0 declare
> mapper codegen
>
> I was referring to something like this, where another thread is not
> trying to map the same data:
>
> #pragma omp declare mapper(S s) map(s.x) map(s.p[0:s.x])
> S s;
> ...
> #pragma omp parallel num_threads(2)
> {
>   if (omp_get_thread_num() == 0) {
>     #pragma omp target map(s)
>     {
>       for (int i = 0; i < s.x; i++) s.p[i] = i;
>     }
>   } else {
>     #pragma omp target map(other_data)
>     {
>       // work on other_data
>     }
>   }
> }
> ...
>
> Since I believe you are mapping s.x and s.p as separate map
> operations, it is possible that another thread could map ‘other_data’ in
> between those two maps. If this happens, will your implementation still
> ensure that s.x and s.p are positioned at the right offsets with respect to
> the same base address (&s)?
>
> -- Deepak
>
> *From:* Li, Lingda [mailto:lli at bnl.gov]
> * Sent:* Thursday, June 27, 2019 2:26 PM
> * To:* Deepak Eachempati <deachempat at cray.com>, and the recipients listed above
> * Subject:* Re: Comparison of 2 schemes to implement OpenMP 5.0 declare
> mapper codegen
>
> When 2 threads try to concurrently map the same data, it behaves
> the same as when 2 threads concurrently map the same data using map
> clauses, and mappers don't introduce extra considerations here. For
> instance, both threads use #pragma omp target enter data concurrently.
>
> When 2 threads concurrently map the same data, my understanding
> based on the current code is that it will create 2 copies of the same data,
> and either copy is correct to use. There may be a problem when both copies
> are mapped back if they are not synchronized correctly, but this is a
> programming issue, not the responsibility of OpenMP.
>
> Thanks,
> Lingda Li
> ------------------------------
> *From:* Deepak Eachempati <deachempat at cray.com>
> * Sent:* Thursday, June 27, 2019 3:17 PM
> * To:* Li, Lingda, and the recipients listed above
> * Subject:* RE: Comparison of 2 schemes to implement OpenMP 5.0 declare
> mapper codegen
>
> Thanks.
>
> Is it possible for another thread to be concurrently mapping
> something else while the maps from the mapper function are taking place? If
> so, how do you guarantee that the allocation for each component will get
> you the right addresses in device memory? Sorry if this was covered before
> and I missed it.
>
> -- Deepak
>
> *From:* Li, Lingda [mailto:lli at bnl.gov]
> * Sent:* Thursday, June 27, 2019 2:08 PM
> * To:* Deepak Eachempati <deachempat at cray.com>, and the recipients listed above
> * Subject:* Re: Comparison of 2 schemes to implement OpenMP 5.0 declare
> mapper codegen
>
> Hi Deepak,
>
> Yes, it handles this case. The first part of the mapper function
> (initially allocating space for the whole array) is just an optimization, not
> required for correctness, as you suggested in an earlier discussion.
>
> In your example, s.x and s.p will be allocated separately (not in a
> single allocation). But Clang guarantees that their addresses will be
> correct because s.x and s.p share the same base address, which is &s.
>
> Thanks,
> Lingda Li
> ------------------------------
> *From:* Deepak Eachempati <deachempat at cray.com>
> * Sent:* Thursday, June 27, 2019 2:49 PM
> * To:* Li, Lingda; 'estotzer at ti.com'; and the recipients listed above
> * Subject:* RE: Comparison of 2 schemes to implement OpenMP 5.0 declare
> mapper codegen
>
> For Scheme 1, it looks like you are doing separate maps for each
> component when size == 1. It seems like the first and last if statements
> should have “size >= 1” rather than “size > 1”.
>
> If the mapper is declared like this:
>
> struct S {
>   int x;
>   ... // other stuff
>   int *p;
> };
>
> #pragma omp declare mapper(S s) map(s.x) map(s.p[0:s.x])
>
> And you have:
>
> S s;
> ...
> #pragma omp target map(s)
> {
>   for (int i = 0; i < s.x; i++) s.p[i] = i;
> }
>
> Since the target construct is just mapping a single structure of
> type S, there should be one map that takes care of mapping storage for s.x
> and s.p with a single allocation, and a separate map for the array section
> s.p[0:s.x], and finally the pointer attachment of s.p to s.p[0:s.x]. Does
> Scheme 1 handle this?
>
> -- Deepak
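
The example above can be written out as a complete translation unit. This is only a sketch: the helper name `fill_on_target` is an assumption added for illustration, the "other stuff" members are left out, and when the file is built without OpenMP support the pragmas are ignored and the loop simply runs on the host.

```cpp
#include <cassert>

struct S {
  int x;   // number of elements reachable through p
  int *p;  // points to x ints
};

// Mapper from the example: whenever an S is mapped, map s.x and the
// array section s.p[0:s.x].
#pragma omp declare mapper(S s) map(s.x) map(s.p[0:s.x])

// Hypothetical helper (not from the thread): runs the target region from
// the example and returns the sum of the values it wrote.
int fill_on_target(S &s) {
#pragma omp target map(s)
  {
    for (int i = 0; i < s.x; i++) s.p[i] = i;
  }
  int sum = 0;
  for (int i = 0; i < s.x; i++) sum += s.p[i];
  return sum;
}
```

With `s.x == 4`, the region writes 0, 1, 2, 3 into `s.p`, so the helper returns 6 whether the region runs on a device or falls back to the host.
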
>
>
> *From:* Li, Lingda [mailto:lli at bnl.gov]
> *Sent:* Thursday, June 27, 2019 1:07 PM
> *To:* Narayanaswamy, Ravi <ravi.narayanaswamy at intel.com>; 'Alexandre
> Eichenberger' <alexe at us.ibm.com>; 'Alexey Bataev'
> <Alexey.Bataev at ibm.com>; Chapman, Barbara (Contact)
> <barbara.chapman at stonybrook.edu>; Bobrovsky, Konstantin S
> <konstantin.s.bobrovsky at intel.com>; 'Carlo Bertolli'
> <cbertol at us.ibm.com>; 'Chan, SiuChi' <siuchi.chan at amd.com>; Cownie,
> James H <james.h.cownie at intel.com>; David Oehmke <doehmke at cray.com>;
> Deepak Eachempati <deachempat at cray.com>; 'Denny, Joel E.'
> <dennyje at ornl.gov>; Dmitriev, Serguei N <serguei.n.dmitriev at intel.com>;
> Doerfert, Johannes Rudolf <jdoerfert at anl.gov>; 'estotzer at ti.com'
> <estotzer at ti.com>; 'Ettore Tiotto' <etiotto at ca.ibm.com>;
> 'fraggamuffin at gmail.com' <fraggamuffin at gmail.com>; 'Gheorghe-Teod
> Bercea' <Gheorghe-Teod.Bercea at ibm.com>; Hal Finkel <hfinkel at anl.gov>;
> 'jbeyer at nvidia.com' <jbeyer at nvidia.com>; 'Jeeva Paudel'
> <pjeeva01 at ca.ibm.com>; 'Jeff Heath' <jrheath at ca.ibm.com>; Jeffrey
> Sandoval <sandoval at cray.com>; Jones, Jeff C <jeff.c.jones at intel.com>;
> 'josem at udel.edu' <josem at udel.edu>; 'Kelvin Li' <kli at ca.ibm.com>;
> 'Kevin K O'Brien' <caomhin at us.ibm.com>; 'khaldi.dounia at gmail.com'
> <khaldi.dounia at gmail.com>; 'Kotsifakou, Maria' <kotsifa2 at illinois.edu>;
> Krishnaiyer, Rakesh <rakesh.krishnaiyer at intel.com>; Lieberman, Ron
> <Ron.Lieberman at amd.com>; Li, Lingda <lli at bnl.gov>; 'Lopez, Matthew
> Graham' <lopezmg at ornl.gov>; 'lopezmg at ornl.org' <lopezmg at ornl.org>;
> 'Martin Kong' <martin.richard.kong at gmail.com>; 'Matt Martineau'
> <m.martineau at bristol.ac.uk>; Menard, Lorri <lorri.menard at intel.com>;
> Monteleone, Robert <robert.monteleone at intel.com>; oscar at ornl.gov;
> Rao, Premanand M <premanand.m.rao at intel.com>; Rice, Michael P
> <michael.p.rice at intel.com>; Robichaux, Joseph
> <joseph.robichaux at intel.com>; gregory.rodgers at amd.com; Rokos, Georgios
> <georgios.rokos at intel.com>; 'samuel.antao at ibm.com'
> <samuel.antao at ibm.com>; 'Sarah McNamara' <mcnamara at ca.ibm.com>;
> 'sergey.y.ostanevich at gmail.com' <sergey.y.ostanevich at gmail.com>;
> 'Sergio Pino Gallardo' <sergiop at udel.edu>; 'Sharif, Hashim'
> <hsharif3 at illinois.edu>; Sjodin, Jan <Jan.Sjodin at amd.com>; Sunil
> Shrestha <sshrestha at cray.com>; 'Sunita Chandrasekaran'
> <schandra at udel.edu>; Tian, Xinmin <xinmin.tian at intel.com>; Tianyi
> Zhang <tzhan18 at lsu.edu>; 'vadve at illinois.edu' <vadve at illinois.edu>;
> 'Wael Yehia' <wyehia at ca.ibm.com>; 'Wang Chen' <wdchen at ca.ibm.com>;
> Wilmarth, Terry L <terry.l.wilmarth at intel.com>
> *Subject:* Comparison of 2 schemes to implement OpenMP 5.0 declare
> mapper codegen
>
> Hi,
>
> Alexey and I would like to draw your attention to an ongoing
> discussion of 2 schemes to implement declare mapper in OpenMP 5.0. The
> detailed discussion can be found at https://reviews.llvm.org/D59474
>
> Scheme 1 (the one I have implemented in https://reviews.llvm.org/D59474):
> The detailed design can be found at
> https://github.com/lingda-li/public-sharing/blob/master/mapper_runtime_design.pptx
> For each mapper function, the compiler generates a function like
> this:
>
> ```
> void <type>.mapper(void *base, void *begin, size_t size, int64_t type) {
>   // Allocate space for an array section first.
>   if (size > 1 && !maptype.IsDelete)
>     <push>(base, begin, size * sizeof(Ty), clearToFrom(type));
>
>   // Map members.
>   for (unsigned i = 0; i < size; i++) {
>     // For each component specified by this mapper:
>     for (auto c : components) {
>       ...; // code to generate c.arg_base, c.arg_begin, c.arg_size,
>            // c.arg_type
>       if (c.hasMapper())
>         (*c.Mapper())(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
>       else
>         <push>(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
>     }
>   }
>   // Delete the array section.
>   if (size > 1 && maptype.IsDelete)
>     <push>(base, begin, size * sizeof(Ty), clearToFrom(type));
> }
> ```
> This function is passed to the OpenMP runtime, and the runtime will
> call this function to finish the data mapping.
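
To make the Scheme 1 pseudocode concrete, here is a hand-written sketch of what such a mapper could do for `struct S { int x; int *p; }` with `map(s.x) map(s.p[0:s.x])`. Everything outside the pseudocode is an assumption for illustration: `MapEntry`, `MapList`, the flag values `kTo`/`kFrom`/`kDelete`, and the explicit `out` parameter standing in for `<push>` are not libomptarget's real types or constants, and the base/begin values are simplified (real entries also carry flags such as MEMBER_OF).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-ins, not libomptarget's real types or flag values.
struct MapEntry {
  void *base;
  void *begin;
  std::size_t size;
  std::int64_t type;
};
using MapList = std::vector<MapEntry>;

constexpr std::int64_t kTo = 0x1, kFrom = 0x2, kDelete = 0x8;

// Strip the data-movement bits so the entry only allocates or deletes.
inline std::int64_t clearToFrom(std::int64_t type) {
  return type & ~(kTo | kFrom);
}

struct S {
  int x;
  int *p;
};

// Hand-written analogue of a generated Scheme 1 mapper for S.
// `out` plays the role of <push> in the pseudocode.
void S_mapper(MapList &out, void *base, void *begin, std::size_t size,
              std::int64_t type) {
  S *elems = static_cast<S *>(begin);
  const bool isDelete = (type & kDelete) != 0;

  // Allocate space for the whole array section first.
  if (size > 1 && !isDelete)
    out.push_back({base, begin, size * sizeof(S), clearToFrom(type)});

  // Map the members of every element; both share the base address &s.
  for (std::size_t i = 0; i < size; i++) {
    S &s = elems[i];
    out.push_back({&s, &s.x, sizeof(int), type});
    out.push_back({&s, s.p, static_cast<std::size_t>(s.x) * sizeof(int), type});
  }

  // Delete the array section last.
  if (size > 1 && isDelete)
    out.push_back({base, begin, size * sizeof(S), clearToFrom(type)});
}
```

For an array of two elements the sketch emits five entries (one for the array, two per element). With `size == 1` it emits only the two member entries and none for the struct as a whole, which is the case the `size > 1` checks skip.
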
>
>
> Scheme 2 (which Alexey proposes):
> Alexey proposes to move parts of the mapper function above into the
> OpenMP runtime, so the compiler will generate the code below:
> ```
> void <type>.mapper(void *base, void *begin, size_t size, int64_t type) {
>   ...; // code to generate arg_base, arg_begin, arg_size, arg_type,
>        // arg_mapper.
>   auto sub_components[] = {...}; // fill in generated begin, base, ...
>   __tgt_mapper(base, begin, size, type, sub_components);
> }
> ```
>
> `__tgt_mapper` is a runtime function as below:
> ```
> void __tgt_mapper(void *base, void *begin, size_t size, int64_t type,
>                   auto components[]) {
>   // Allocate space for an array section first.
>   if (size > 1 && !maptype.IsDelete)
>     <push>(base, begin, size * sizeof(Ty), clearToFrom(type));
>
>   // Map members.
>   for (unsigned i = 0; i < size; i++) {
>     // For each component specified by this mapper:
>     for (auto c : components) {
>       if (c.hasMapper())
>         (*c.Mapper())(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
>       else
>         <push>(c.arg_base, c.arg_begin, c.arg_size, c.arg_type);
>     }
>   }
>   // Delete the array section.
>   if (size > 1 && maptype.IsDelete)
>     <push>(base, begin, size * sizeof(Ty), clearToFrom(type));
> }
> ```
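
The table-driven shape of Scheme 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the proposed runtime interface: `MapEntry`, `Component`, `MapperFn`, `tgt_mapper_sketch`, and the `Inner`/`Outer` types are hypothetical names, the array-section allocate and delete steps are left out, and each "generated" mapper fills the component table for all of its elements before handing it to the shared routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the runtime's map list.
struct MapEntry {
  void *base;
  void *begin;
  std::size_t size;
  std::int64_t type;
};
using MapList = std::vector<MapEntry>;

using MapperFn = void (*)(MapList &out, void *base, void *begin,
                          std::size_t size, std::int64_t type);

// One row of the sub_components table built by the generated code.
struct Component {
  void *base;
  void *begin;
  std::size_t size;
  std::int64_t type;
  MapperFn mapper;  // nullptr when the component has no mapper of its own
};

// Shared "runtime" routine: walk the table, recurse into nested mappers.
void tgt_mapper_sketch(MapList &out, const std::vector<Component> &components) {
  for (const Component &c : components) {
    if (c.mapper)
      c.mapper(out, c.base, c.begin, c.size, c.type);
    else
      out.push_back({c.base, c.begin, c.size, c.type});
  }
}

struct Inner { int v; };
struct Outer { int n; Inner in; };

// "Generated" mapper for Inner: only builds the table.
void Inner_mapper(MapList &out, void *, void *begin, std::size_t size,
                  std::int64_t type) {
  Inner *e = static_cast<Inner *>(begin);
  std::vector<Component> comps;  // heap storage that grows with size
  for (std::size_t i = 0; i < size; i++)
    comps.push_back({&e[i], &e[i].v, sizeof(int), type, nullptr});
  tgt_mapper_sketch(out, comps);
}

// "Generated" mapper for Outer: the `in` member is delegated to Inner_mapper.
void Outer_mapper(MapList &out, void *, void *begin, std::size_t size,
                  std::int64_t type) {
  Outer *e = static_cast<Outer *>(begin);
  std::vector<Component> comps;
  for (std::size_t i = 0; i < size; i++) {
    comps.push_back({&e[i], &e[i].n, sizeof(int), type, nullptr});
    comps.push_back({&e[i], &e[i].in, 1, type, Inner_mapper});
  }
  tgt_mapper_sketch(out, comps);
}
```

Note that the nested call goes through a function pointer stored in the table, and that the table itself is a heap-backed vector whose length is proportional to the number of mapped elements.
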
>
> Comparison:
> Why choose 1 over 2:
> 1. In scheme 2, the compiler needs to generate all map types and
> pass them to __tgt_mapper through sub_components. But in this case, the
> compiler won't be able to generate the correct MEMBER_OF field in the map
> type. As a result, the runtime has to fix it using the mechanism we already
> have here: __tgt_mapper_num_components. This not only increases complexity,
> but also means the runtime needs to manipulate the map type further,
> which creates locality issues. In the current scheme, by contrast, the map
> type is generated by the compiler once, so the data locality will be very
> good.
> 2. In scheme 2, sub_components includes all components that should
> be mapped. If we are mapping an array, this means we need to map many
> components, so memory for sub_components has to be allocated on the
> heap. This creates a further memory management burden and is not an
> efficient way to use memory.
> 3. In scheme 1, we are able to inline nested mapper functions. As a
> result, the compiler can further optimize the mapper function, e.g.,
> eliminate redundant computation and unroll loops, and thus
> achieve potentially better performance. We cannot achieve these
> optimizations in scheme 2.
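
Point 1 above turns on how MEMBER_OF is stored in the map type. The sketch below assumes the encoding used by libomptarget, where the top 16 bits of the 64-bit map type hold the 1-based position of the parent entry in the argument list; the thread itself does not spell this out, so treat the constants as an assumption. The helper names are hypothetical, and an unsigned type is used to keep the shifts well defined.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: the top 16 bits hold the 1-based index of the parent
// entry; 0 means "not a member of any other entry".
constexpr int kMemberOfShift = 48;
constexpr std::uint64_t kMemberOfMask = 0xffffULL << kMemberOfShift;

// Tag `type` as a member of the entry at 0-based position `parentIndex`.
std::uint64_t setMemberOf(std::uint64_t type, unsigned parentIndex) {
  return (type & ~kMemberOfMask) |
         (static_cast<std::uint64_t>(parentIndex + 1) << kMemberOfShift);
}

// Read back the 1-based parent position (0 if none).
unsigned getMemberOf(std::uint64_t type) {
  return static_cast<unsigned>((type & kMemberOfMask) >> kMemberOfShift);
}

// The kind of fix-up a runtime would need when a mapper's entries are
// appended after `entriesBefore` existing entries: parent positions that
// were relative to the mapper's own list must be shifted.
std::uint64_t rebaseMemberOf(std::uint64_t type, unsigned entriesBefore) {
  unsigned m = getMemberOf(type);
  if (m == 0) return type;  // not a member, nothing to fix
  return (type & ~kMemberOfMask) |
         (static_cast<std::uint64_t>(m + entriesBefore) << kMemberOfShift);
}
```

Because the parent position depends on how many entries precede the mapper's output, a table that is built before that count is known cannot carry final MEMBER_OF values, which is why a later pass over the map types becomes necessary.
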
>
> Why choose 2 over 1:
> 1. Less code in the mapper function codegen (I doubt this because
> the codegen function of scheme 1 uses less than 200 loc)
> Alexey may have other reasons.
>
> We would appreciate it if you could share your thoughts.
>
> Thanks,
> Lingda Li
> ------------------------------
> *From:* Narayanaswamy, Ravi <ravi.narayanaswamy at intel.com>
> *Sent:* Wednesday, June 19, 2019 3:09 PM
> *To:* 'Alexandre Eichenberger'; 'Alexey Bataev';
> 'barbara.chapman at stonybrook.edu'; Bobrovsky, Konstantin S; 'Carlo
> Bertolli'; 'Chan, SiuChi'; Cownie, James H; David Oehmke; Deepak
> Eachempati; 'Denny, Joel E.'; Dmitriev, Serguei N; Doerfert, Johannes
> Rudolf; 'estotzer at ti.com'; 'Ettore Tiotto'; 'fraggamuffin at gmail.com';
> 'Gheorghe-Teod Bercea'; 'hfinkel at anl.gov'; 'jbeyer at nvidia.com';
> 'Jeeva Paudel'; 'Jeff Heath'; Jeffrey Sandoval; Jones, Jeff C;
> 'josem at udel.edu'; 'Kelvin Li'; 'Kevin K O'Brien';
> 'khaldi.dounia at gmail.com'; 'Kotsifakou, Maria'; Krishnaiyer, Rakesh;
> Lieberman, Ron; 'lli at bnl.gov'; 'Lopez, Matthew Graham';
> 'lopezmg at ornl.org'; 'Martin Kong'; 'Matt Martineau'; Menard, Lorri;
> Monteleone, Robert; Narayanaswamy, Ravi; 'Oscar R. Hernandez'; Rao,
> Premanand M; Rice, Michael P; Robichaux, Joseph; Rodgers, Gregory; Rokos,
> Georgios; 'samuel.antao at ibm.com'; 'Sarah McNamara';
> 'sergey.y.ostanevich at gmail.com'; 'Sergio Pino Gallardo'; 'Sharif,
> Hashim'; Sjodin, Jan; Sunil Shrestha (sshrestha at cray.com); 'Sunita
> Chandrasekaran'; Tian, Xinmin; Tianyi Zhang; 'vadve at illinois.edu';
> 'Wael Yehia'; 'Wang Chen'; Wilmarth, Terry L
> *Subject:* OpenMP / HPC in Clang / LLVM Multi-company Telecom Meeting
> Minutes June 19th 2019
>
> *Next Meeting: July 10th (moved from July 3rd)*
>
> *Opens:*
> *- Documentation*
> - Greg: Can we have documents for libopenmp and libomptarget?
> - Alexey suggested having 3 documents: libopenmp, libomptarget and
> device plugin.
> - Hal will convert the existing libomptarget document. Once done,
> others can update the document to capture the existing implementation.
> Future additions to libomptarget will also require updates to the
> document.
> - Next, a libopenmp document will be created if it does not exist, or
> updated if one exists.
>
> *LTO for fat binary linking*
> - Serguei (Intel) has an implementation which enables LTO and does
> away with linker scripts.
> Everybody agreed this is a good idea, especially since some linkers don’t
> have support for linker scripts.
> AMD is interested in enabling LTO and would like to see the
> code.
> Serguei is to post the code to get feedback from all.
> - Hal is to present his proposal to support static fat
> archives using LTO at the next meeting.
>
> *OpenMP 5.0 Features*
> - No update on setting up the public website. Johannes was out
> attending ISC.
> - New features added since last release (courtesy of Kelvin):
>   - allocate clause/allocate directive - parsing+sema, codegen
>   - mutexinoutset dependence-type for task
>   - user-defined mapper (declare mapper) - parsing+sema
>   - omp_get_device_num() API routine
>
> *Development Activity*
> *- Async API*
> Support in Clang and libopenmp, including lit tests, has been checked
> in by Doru.
>
> *- Mapper support*
> Initial support for Mapper has been posted for review by Lingda. Once
> approved, the rest of the support will be done.
> Lingda: Should the old API, which is being replaced by a similar new API
> with an extra mapper argument, be obsoleted?
> The suggestion was for Clang to stop generating calls to it but to keep
> the API in libomptarget for backward compatibility. In the future it can
> be obsoleted.
>
> *- Requires Directive*
> Support for the requires directive has been checked in by Doru.
> There was one issue with checking for the requires directive and
> confirming that the declare type is TO or LINK.
> Doru removed the check and added a note that this code needs to be
> modified if things change in the future.
>
> *Roll Call:*
>
> Company        Attendee                  19-Jun
> AMD            Greg Rodgers              x
>                Ashwin Aji
>                Jan Sjodin                x
>                Ron Lieberman             x
>                Sameer Sahasrabuddhe
>                Andrey Kasaurov
> ANL            Hal Finkel                x
>                Johannes Doerfert
> IBM            Alexandre Eichenberger
>                Carlo Bertolli
>                Kelvin Li
>                Doru                      x
>                Alexey Bataev             x
> Intel          Andrey Churbanov
>                Ravi Narayanaswamy        x
>                Serguei Dmitriev          x
>                Rajiv Deodhar
>                Lorri Menard
>                Terry Wilmarth
>                Rao, Prem
>                Hansang Bae
>                George Rokos              x
> Cray           Deepak Eachempati         x
> Micron         John Leidel
> Nvidia         James Beyer               x
> ORNL           Graham Lopez
>                Joel Denny
>                Geoffroy Vallee
>                Oscar Hernandez
> SBU/BNL        Lingda Li                 x
>                Jose Monlsave
>                Martin Kong
> TI             Eric Stotzer
> U of Bristol   Mat Martineau
> U of Delaware  Sunita Chandrasekaran
> U of Illinois  Hashim Sharif
> Rice           John Mellor-Crummey
> LSU            Tianyi Zhang