[cfe-dev] Comparison of 2 schemes to implement OpenMP 5.0 declare mapper codegen

Fri Jun 28 07:00:35 PDT 2019

Hi Lingda, thanks for your comments.
We can allocate the buffer either on the stack or by calling the OpenMP
allocate function.
With this solution, we allocate memory only once (no need to resize the
buffer after push_backs), and we do not need to call a runtime function to
put map data into the buffer; compiler-generated code can do it.
But anyway, I agree, it would be good to hear some other opinions.
--------------
Best regards,
Alexey Bataev

From:	"Li, Lingda" <lli at bnl.gov>
To:	Alexey Bataev <Alexey.Bataev at ibm.com>
Cc:	Alexandre Eichenberger <alexe at us.ibm.com>, "Chapman, Barbara
            (Contact)" <barbara.chapman at stonybrook.edu>, Kevin K O'Brien
            <caomhin at us.ibm.com>, "Carlo Bertolli" <cbertol at us.ibm.com>,
            Deepak Eachempati <deachempat at cray.com>, "Denny, Joel E."
            <dennyje at ornl.gov>, David Oehmke <doehmke at cray.com>, "Ettore
            Tiotto" <etiotto at ca.ibm.com>, "fraggamuffin at gmail.com"
            <fraggamuffin at gmail.com>, "Rokos, Georgios"
            <georgios.rokos at intel.com>, Gheorghe-Teod Bercea
            <Gheorghe-Teod.Bercea at ibm.com>, "gregory.rodgers at amd.com"
            <gregory.rodgers at amd.com>, Hal Finkel <hfinkel at anl.gov>,
            "Sharif, Hashim" <hsharif3 at illinois.edu>, "Cownie, James H"
            <james.h.cownie at intel.com>, "Sjodin, Jan" <Jan.Sjodin at amd.com>,
            "jbeyer at nvidia.com" <jbeyer at nvidia.com>, "Doerfert, Johannes
            Rudolf" <jdoerfert at anl.gov>, "Jones, Jeff C"
            <jeff.c.jones at intel.com>, "josem at udel.edu" <josem at udel.edu>,
            "Robichaux, Joseph" <joseph.robichaux at intel.com>, Jeff Heath
            <jrheath at ca.ibm.com>, "khaldi.dounia at gmail.com"
            <khaldi.dounia at gmail.com>, Kelvin Li <kli at ca.ibm.com>,
            "Bobrovsky, Konstantin S" <konstantin.s.bobrovsky at intel.com>,
            "Kotsifakou, Maria" <kotsifa2 at illinois.edu>, "lopezmg at ornl.org"
            <lopezmg at ornl.org>, "Lopez, Matthew Graham" <lopezmg at ornl.gov>,
            "Menard, Lorri" <lorri.menard at intel.com>, Martin Kong
            <martin.richard.kong at gmail.com>, Sarah McNamara
            <mcnamara at ca.ibm.com>, "Rice, Michael P"
            <michael.p.rice at intel.com>, "Matt Martineau"
            <m.martineau at bristol.ac.uk>, "oscar at ornl.gov" <oscar at ornl.gov>,
            Jeeva Paudel <pjeeva01 at ca.ibm.com>, "Rao, Premanand M"
            <premanand.m.rao at intel.com>, "Krishnaiyer, Rakesh"
            <rakesh.krishnaiyer at intel.com>, "Narayanaswamy, Ravi"
            <ravi.narayanaswamy at intel.com>, "Monteleone, Robert"
            <robert.monteleone at intel.com>, "Lieberman, Ron"
            <Ron.Lieberman at amd.com>, Samuel Antao <Samuel.Antao at ibm.com>,
            Jeffrey Sandoval <sandoval at cray.com>, Sunita Chandrasekaran
            <schandra at udel.edu>, "sergey.y.ostanevich at gmail.com"
            <sergey.y.ostanevich at gmail.com>, Sergio Pino Gallardo
            <sergiop at udel.edu>, "Dmitriev, Serguei N"
            <serguei.n.dmitriev at intel.com>, "Chan, SiuChi"
            <siuchi.chan at amd.com>, Sunil Shrestha <sshrestha at cray.com>,
            "Wilmarth, Terry L" <terry.l.wilmarth at intel.com>, Tianyi Zhang
            <tzhan18 at lsu.edu>, "vadve at illinois.edu" <vadve at illinois.edu>,
            Wang Chen <wdchen at ca.ibm.com>, Wael Yehia <wyehia at ca.ibm.com>,
            "Tian, Xinmin" <xinmin.tian at intel.com>, "Li, Lingda (Contact)"
            <lildmh at gmail.com>
Date:	06/28/2019 09:49 AM
Subject:	[EXTERNAL] Re:  Re:  Re:  RE: Comparison of 2 schemes to
            implement OpenMP 5.0 declare mapper codegen

I don't think we can have the buffer allocated within the mapper function.
It has to be done in the runtime, because of nested mappers.
First, all mapper functions are generated in the same way. We cannot make
the outermost mapper function allocate memory while the inner ones don't
and instead use what the outermost mapper function allocated.
I suppose we still need to allocate memory in the runtime, so the runtime
can pass the pointer and size to the mapper function, and the outer mapper
function can then pass them into the inner ones.
Again, this is just like the current implementation, except that we don't
use vector::push_back(); instead we use something like a manual
implementation of vector::push_back() (because we need to use the pointer
and the current index).
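
The arrangement described above, where the runtime owns the buffer and every
mapper receives the same pointer-plus-index state, might look roughly like the
sketch below. All names here (MapEntry, MapBuffer, append, mapper_Inner,
mapper_Outer, runtime_collect) and the two example types are hypothetical and
are not taken from Clang or libomptarget; the entry layout is simplified to an
address and a size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical map entry; the real runtime's record layout may differ.
struct MapEntry {
  void* begin;       // start address of the mapped component
  std::size_t size;  // size in bytes
};

// Buffer state owned by the (stand-in) runtime and threaded through mappers.
struct MapBuffer {
  MapEntry* data;        // preallocated storage
  std::size_t capacity;  // number of entries that fit
  std::size_t index;     // next free slot
};

// Manual counterpart of push_back(): write at the current index, then advance.
inline void append(MapBuffer& buf, void* begin, std::size_t size) {
  assert(buf.index < buf.capacity && "buffer was sized too small");
  buf.data[buf.index++] = MapEntry{begin, size};
}

struct Inner { int n; float* data; };
struct Outer { int tag; Inner a; Inner b; };

// Every mapper has the same shape: (object, buffer). None of them allocates.
inline void mapper_Inner(Inner& v, MapBuffer& buf) {
  append(buf, &v.n, sizeof(v.n));
  append(buf, v.data, static_cast<std::size_t>(v.n) * sizeof(float));
}

inline void mapper_Outer(Outer& v, MapBuffer& buf) {
  append(buf, &v.tag, sizeof(v.tag));
  mapper_Inner(v.a, buf);  // the nested mapper reuses the caller's buffer
  mapper_Inner(v.b, buf);
}

// Stand-in for the runtime: it supplies the storage and starts the
// outermost mapper, then reports how many entries were written.
inline std::size_t runtime_collect(Outer& v, MapEntry* storage,
                                   std::size_t capacity) {
  MapBuffer buf{storage, capacity, 0};
  mapper_Outer(v, buf);
  return buf.index;
}
```

Because the outer and inner mappers share one signature, neither needs to know
whether it was called first; only the caller at the top supplies storage.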

I believe the key question here is whether it is true that (the overhead of
push_back() > the overhead of precalculating the total number + the memory
allocation overhead + the direct memory writes). This will decide whether
this change is necessary. Any opinions?

Thanks,
Lingda Li

From: Alexey Bataev <Alexey.Bataev at ibm.com>
Sent: Thursday, June 27, 2019 5:05 PM
To: Li, Lingda
Subject: Re: Re: Re: RE: Comparison of 2 schemes to implement OpenMP 5.0
declare mapper codegen

Yes, we need 2 functions, but the first one can be optimized very
effectively. After optimization and inlining it will end up as just
return s1+s2+s3... I think in most cases those sizes will be constant, since
the mapper maps a constant number of elements, and thus this expression
will be optimized to just a constant value.
You don't need to pass these functions to the runtime. We can call them
directly from the compiler-generated code.
1st call: get the number of elements.
2nd: allocate the buffer.
3rd call: call the mapper with this preallocated buffer; the mapper fills
the buffer without any calls to runtime functions.
4th call: call the runtime to pass the buffer to it.
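
As a rough illustration of the four steps listed above, the sketch below shows
what the generated code at a map(s) site could look like under this scheme.
It is only a sketch under assumptions: MapEntry, mapper_S_count,
mapper_S_fill, runtime_map_all and map_with_preallocated_buffer are
hypothetical names, std::vector stands in for the stack or OpenMP allocation
mentioned in the thread, and the stand-in runtime call merely sums the sizes
it receives.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical map entry; the real runtime's record layout may differ.
struct MapEntry {
  void* begin;
  std::size_t size;
};

struct S { int x; double* p; };

// 1st call: number of entries this mapper produces (a constant for S).
inline std::size_t mapper_S_count(const S&) { return 2; }

// 3rd call: fill a preallocated buffer with plain stores, no runtime calls.
// Returns the index one past the last entry written.
inline std::size_t mapper_S_fill(S& s, MapEntry* buf, std::size_t idx) {
  buf[idx++] = MapEntry{&s.x, sizeof(s.x)};
  buf[idx++] = MapEntry{s.p, static_cast<std::size_t>(s.x) * sizeof(double)};
  return idx;
}

// 4th call: stand-in for the single runtime entry point that receives the
// whole buffer. Here it only reports the total number of bytes described.
inline std::size_t runtime_map_all(const MapEntry* buf, std::size_t n) {
  std::size_t bytes = 0;
  for (std::size_t i = 0; i < n; ++i) bytes += buf[i].size;
  return bytes;
}

// What the compiler might emit at a "map(s)" site under this scheme.
inline std::size_t map_with_preallocated_buffer(S& s) {
  const std::size_t n = mapper_S_count(s);                      // 1st call
  std::vector<MapEntry> buf(n);                                 // 2nd: allocate once
  const std::size_t written = mapper_S_fill(s, buf.data(), 0);  // 3rd call
  assert(written == n);
  return runtime_map_all(buf.data(), written);                  // 4th call
}
```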

Best regards,
Alexey Bataev

On June 27, 2019, at 16:53, Li, Lingda <lli at bnl.gov> wrote:

      If we precalculate the size, it first means we need to generate 2
      functions for each mapper, rather than the 1 we have now: one that
      fills in the mapping information, as we have today, and another that
      calculates the size (this will not return constant values, because
      the size depends on how many instances we are mapping). Both of these
      functions will need to be passed to the runtime. The runtime will
      need to precalculate the number of components first, then allocate
      memory, then call the mapper function to fill it up.

      Compared with scheme 1, the differences are:
      1) An extra call to calculate the total number, which scheme 1 does
      not need;
      2) A preallocated buffer, whose pointer and current number of entries
      must be passed to the mapper function, which then uses them to fill
      in components, while scheme 1 uses push_back() to do the same thing.
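
For comparison, scheme 1 as described in this thread can be sketched as one
runtime call per component that appends to a growable list. The names
MappingList, runtime_push, mapper_S and count_reallocations are hypothetical,
and the reallocation counter exists only to make the cost under discussion
visible; it is not part of any real runtime interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical map entry; the real runtime's record layout may differ.
struct MapEntry {
  void* begin;
  std::size_t size;
};

// Stand-in for the runtime-side list that scheme 1 appends to.
struct MappingList {
  std::vector<MapEntry> entries;
  std::size_t reallocations = 0;  // bookkeeping for this illustration only
};

// Hypothetical runtime entry point: one call per mapped component.
inline void runtime_push(MappingList& list, void* begin, std::size_t size) {
  const std::size_t before = list.entries.capacity();
  list.entries.push_back(MapEntry{begin, size});
  if (list.entries.capacity() != before) ++list.reallocations;
}

struct S { int x; double* p; };

// Scheme 1 mapper: no size precalculation, just push each component.
inline void mapper_S(S& s, MappingList& list) {
  runtime_push(list, &s.x, sizeof(s.x));
  runtime_push(list, s.p, static_cast<std::size_t>(s.x) * sizeof(double));
}

// Map an array of n objects and report how often the vector had to grow.
inline std::size_t count_reallocations(S* objs, std::size_t n) {
  MappingList list;
  for (std::size_t i = 0; i < n; ++i) mapper_S(objs[i], list);
  assert(list.entries.size() == 2 * n);
  return list.reallocations;
}
```

With the geometric growth used by common standard library implementations, the
number of reallocations grows roughly logarithmically with the number of
entries, which is the basis for the claim that push_back() is cheap enough.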

      Is there really a benefit in doing this? push_back() should be
      efficient enough compared with writing directly to memory.

      If people here think that the overhead of push_back() > the overhead
      of precalculating the total number + the memory allocation overhead +
      direct memory writes, then we can consider this scheme.

      Thanks,
      Lingda Li

      From: Alexey Bataev <Alexey.Bataev at ibm.com>
      Sent: Thursday, June 27, 2019 4:26 PM
      To: Li, Lingda
      Subject: Re: Re: RE: Comparison of 2 schemes to implement OpenMP 5.0
      declare mapper codegen

      If the functions (the ones intended for size precalculation) are
      inlined, they can be optimized out very effectively, since in most
      cases they will return constant values.
      If we could do this, we won't need vectors and push_backs; we can use
      preallocated memory and an internal counter.
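
To make the constant-folding argument concrete, here is a small sketch in
which constexpr functions stand in for what the optimizer would do after
inlining the size functions. The type T and the functions count_S, count_T and
count_T_array are hypothetical; real compiler-generated size functions would
not necessarily be constexpr. The last function shows the case raised
elsewhere in the thread, where the total depends on a run-time element count.

```cpp
#include <cstddef>

// S follows the mapper from the thread: map(s.x) map(s.p[0:s.x]).
struct S { int x; double* p; };

// Hypothetical type whose mapper maps one scalar and two nested S members.
struct T { int id; S first; S second; };

// Size function for S: always two components.
constexpr std::size_t count_S() { return 2; }

// Size function for T: its own component plus those of the nested mappers.
constexpr std::size_t count_T() { return 1 + count_S() + count_S(); }

// After inlining, the nested sum is a compile-time constant.
static_assert(count_T() == 5, "nested counts fold to a constant");

// For an array section the total is no longer constant, but it is still
// a single multiplication by the folded per-element count.
inline std::size_t count_T_array(std::size_t n) { return n * count_T(); }
```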
      --------------
      Best regards,
      Alexey Bataev


      From: "Li, Lingda" <lli at bnl.gov>
      To: Alexey Bataev <Alexey.Bataev at ibm.com>, Deepak Eachempati <
      deachempat at cray.com>
      Date: 06/27/2019 04:13 PM
      Subject: [EXTERNAL] Re: RE: Comparison of 2 schemes to implement
      OpenMP 5.0 declare mapper codegen

      Hi Alexey,

      I think that's why we chose to use variable-size storage like
      std::vector to store the mapping information in the first place,
      right? It'll be costly to precalculate the total number of
      components, especially in the presence of nested mappers. Besides, a
      runtime function call is just a std::vector::push_back(), so I think
      it's okay to have multiple function calls.

      Thanks,
      Lingda Li

      From: Alexey Bataev <Alexey.Bataev at ibm.com>
      Sent: Thursday, June 27, 2019 3:52 PM
      To: Deepak Eachempati
      Subject: Re: RE: Comparison of 2 schemes to implement OpenMP 5.0
      declare mapper codegen

      Lingda, can we in scheme 1 precalculate the total number of
      components, allocate memory for this precalculated number of
      elements, then fill it with mappers, and only after that call the
      runtime function (only once!) to transfer the mappings to the
      runtime?

      Best regards,
      Alexey Bataev

      On June 27, 2019, at 15:44, Deepak Eachempati <deachempat at cray.com>
      wrote:
                  Got it. Thanks.

                  -- Deepak

                  From: Li, Lingda [mailto:lli at bnl.gov]
                  Sent: Thursday, June 27, 2019 2:41 PM
                  To: Deepak Eachempati <deachempat at cray.com>;
                  Narayanaswamy, Ravi <ravi.narayanaswamy at intel.com>;
                  'Alexandre Eichenberger' <alexe at us.ibm.com>; 'Alexey
                  Bataev' <Alexey.Bataev at ibm.com>; Chapman, Barbara
                  (Contact) <barbara.chapman at stonybrook.edu>; Bobrovsky,
                  Konstantin S <konstantin.s.bobrovsky at intel.com>; 'Carlo
                  Bertolli' <cbertol at us.ibm.com>; 'Chan, SiuChi' <
                  siuchi.chan at amd.com>; Cownie, James H <
                  james.h.cownie at intel.com>; David Oehmke <doehmke at cray.com
                  >; 'Denny, Joel E.' <dennyje at ornl.gov>; Dmitriev, Serguei
                  N <serguei.n.dmitriev at intel.com>; Doerfert, Johannes
                  Rudolf <jdoerfert at anl.gov>; 'Ettore Tiotto' <
                  etiotto at ca.ibm.com>; 'fraggamuffin at gmail.com' <
                  fraggamuffin at gmail.com>; 'Gheorghe-Teod Bercea' <
                  Gheorghe-Teod.Bercea at ibm.com>; Hal Finkel <
                  hfinkel at anl.gov>; 'jbeyer at nvidia.com' <jbeyer at nvidia.com
                  >; 'Jeeva Paudel' <pjeeva01 at ca.ibm.com>; 'Jeff Heath' <
                  jrheath at ca.ibm.com>; Jeffrey Sandoval <sandoval at cray.com
                  >; Jones, Jeff C <jeff.c.jones at intel.com>; '
                  josem at udel.edu' <josem at udel.edu>; 'Kelvin Li' <
                  kli at ca.ibm.com>; 'Kevin K O'Brien' <caomhin at us.ibm.com>;
                  'khaldi.dounia at gmail.com' <khaldi.dounia at gmail.com>;
                  'Kotsifakou, Maria' <kotsifa2 at illinois.edu>; Krishnaiyer,
                  Rakesh <rakesh.krishnaiyer at intel.com>; Lieberman, Ron <
                  Ron.Lieberman at amd.com>; 'Lopez, Matthew Graham' <
                  lopezmg at ornl.gov>; 'lopezmg at ornl.org' <lopezmg at ornl.org>;
                  'Martin Kong' <martin.richard.kong at gmail.com>; 'Matt
                  Martineau' <m.martineau at bristol.ac.uk>; Menard, Lorri <
                  lorri.menard at intel.com>; Monteleone, Robert <
                  robert.monteleone at intel.com>; oscar at ornl.gov; Rao,
                  Premanand M <premanand.m.rao at intel.com>; Rice, Michael P
                  <michael.p.rice at intel.com>; Robichaux, Joseph <
                  joseph.robichaux at intel.com>; gregory.rodgers at amd.com;
                  Rokos, Georgios <georgios.rokos at intel.com>; '
                  samuel.antao at ibm.com' <samuel.antao at ibm.com>; 'Sarah
                  McNamara' <mcnamara at ca.ibm.com>; '
                  sergey.y.ostanevich at gmail.com' <
                  sergey.y.ostanevich at gmail.com>; 'Sergio Pino Gallardo' <
                  sergiop at udel.edu>; 'Sharif, Hashim' <
                  hsharif3 at illinois.edu>; Sjodin, Jan <Jan.Sjodin at amd.com>;
                  Sunil Shrestha <sshrestha at cray.com>; 'Sunita
                  Chandrasekaran' <schandra at udel.edu>; Tian, Xinmin <
                  xinmin.tian at intel.com>; Tianyi Zhang <tzhan18 at lsu.edu>; '
                  vadve at illinois.edu' <vadve at illinois.edu>; 'Wael Yehia' <
                  wyehia at ca.ibm.com>; 'Wang Chen' <wdchen at ca.ibm.com>;
                  Wilmarth, Terry L <terry.l.wilmarth at intel.com>
                  Subject: Re: Comparison of 2 schemes to implement OpenMP
                  5.0 declare mapper codegen

                  In the current scheme, all mappings within a mapper
                  function are done atomically by one thread. In the mapper
                  function of the example in the original email, <push>
                  will just push the mapping information into an internal
                  data structure. Once all the mapping information is
                  available, the runtime will do the real mapping in one
                  go. For your example, the behavior is the same as that of
                  the code below:

                  ...
                  #pragma omp parallel num_threads(2)
                  {
                    if (omp_get_thread_num() == 0) {
                      #pragma omp target map(s.x, s.p[0:s.x])
                      {
                        for (int i = 0; i < s.x; i++) s.p[i] = i;
                      }
                    } else {
                      #pragma omp target map(other_data)
                      {
                        // work on other_data
                      }
                    }
                  }
                  ...
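
The collect-then-map behavior described above can be sketched as follows. This
is only one way to obtain that behavior and assumes the runtime protects its
table with a lock; the names MappingTable, commit, snapshot, mapper_S and
map_S are hypothetical, and the owner field exists only so the order of
entries can be inspected.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical map entry; owner tags which map operation produced it.
struct MapEntry {
  int owner;
  const void* begin;
  std::size_t size;
};

// Stand-in for the runtime's device mapping table.
class MappingTable {
 public:
  // Apply every entry of one batch while holding the lock, so entries from
  // different batches never interleave.
  void commit(const std::vector<MapEntry>& batch) {
    std::lock_guard<std::mutex> guard(mutex_);
    for (const MapEntry& e : batch) applied_.push_back(e);
  }

  // Copy of everything applied so far, in application order.
  std::vector<MapEntry> snapshot() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return applied_;
  }

 private:
  mutable std::mutex mutex_;
  std::vector<MapEntry> applied_;
};

struct S { int x; int* p; };

// Mapper for S: only records what to map; nothing is mapped yet.
inline void mapper_S(const S& s, int owner, std::vector<MapEntry>& pending) {
  pending.push_back(MapEntry{owner, &s.x, sizeof(s.x)});
  pending.push_back(
      MapEntry{owner, s.p, static_cast<std::size_t>(s.x) * sizeof(int)});
}

// A map(s) operation: collect all components first, then commit them once.
inline void map_S(const S& s, int owner, MappingTable& table) {
  std::vector<MapEntry> pending;
  mapper_S(s, owner, pending);
  table.commit(pending);
}
```

Because s.x and s.p[0:s.x] reach the table in a single commit, a map of
other_data issued by another thread lands either before or after both of them,
never in between.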

                  From: Deepak Eachempati <deachempat at cray.com>
                  Sent: Thursday, June 27, 2019 3:34 PM
                  To: Li, Lingda; Narayanaswamy, Ravi; 'Alexandre
                  Eichenberger'; 'Alexey Bataev'; Chapman, Barbara
                  (Contact); Bobrovsky, Konstantin S; 'Carlo Bertolli';
                  'Chan, SiuChi'; Cownie, James H; David Oehmke; 'Denny,
                  Joel E.'; Dmitriev, Serguei N; Doerfert, Johannes
                  Rudolf ; 'Ettore Tiotto'; 'fraggamuffin at gmail.com';
                  'Gheorghe-Teod Bercea'; Hal Finkel; 'jbeyer at nvidia.com';
                  'Jeeva Paudel'; 'Jeff Heath'; Jeffrey Sandoval; Jones,
                  Jeff C; 'josem at udel.edu'; 'Kelvin Li'; 'Kevin K O'Brien';
                  'khaldi.dounia at gmail.com'; 'Kotsifakou, Maria';
                  Krishnaiyer, Rakesh; Lieberman, Ron ; 'Lopez, Matthew
                  Graham'; 'lopezmg at ornl.org'; 'Martin Kong'; 'Matt
                  Martineau'; Menard, Lorri; Monteleone, Robert;
                  oscar at ornl.gov; Rao, Premanand M; Rice, Michael P;
                  Robichaux, Joseph; gregory.rodgers at amd.com; Rokos,
                  Georgios; 'samuel.antao at ibm.com'; 'Sarah McNamara'; '
                  sergey.y.ostanevich at gmail.com'; 'Sergio Pino Gallardo';
                  'Sharif, Hashim'; Sjodin, Jan ; Sunil Shrestha; 'Sunita
                  Chandrasekaran'; Tian, Xinmin; Tianyi Zhang; '
                  vadve at illinois.edu'; 'Wael Yehia'; 'Wang Chen'; Wilmarth,
                  Terry L
                  Subject: RE: Comparison of 2 schemes to implement OpenMP
                  5.0 declare mapper codegen

                  I was referring to something like this, where another
                  thread is not trying to map the same data:

                  #pragma omp declare mapper(S s) map(s.x) map(s.p[0:s.x])
                  S s;
                  ...
                  #pragma omp parallel num_threads(2)
                  {
                    if (omp_get_thread_num() == 0) {
                      #pragma omp target map(s)
                      {
                        for (int i = 0; i < s.x; i++) s.p[i] = i;
                      }
                    } else {
                      #pragma omp target map(other_data)
                      {
                        // work on other_data
                      }
                    }
                  }
                  ...

                  Since I believe you are mapping s.x and s.p as separate
                  map operations, it is possible that another thread could
                  map ‘other_data’ in between those two maps. If this
                  happens, will your implementation still ensure that s.x
                  and s.p are positioned at the right offsets with respect
                  to the same base address (&s)?

                  -- Deepak

                  From: Li, Lingda [mailto:lli at bnl.gov]
                  Sent: Thursday, June 27, 2019 2:26 PM
                  To: Deepak Eachempati <deachempat at cray.com>;
                  Narayanaswamy, Ravi <ravi.narayanaswamy at intel.com>;
                  'Alexandre Eichenberger' <alexe at us.ibm.com>; 'Alexey
                  Bataev' <Alexey.Bataev at ibm.com>; Chapman, Barbara
                  (Contact) <barbara.chapman at stonybrook.edu>; Bobrovsky,
                  Konstantin S <konstantin.s.bobrovsky at intel.com>; 'Carlo
                  Bertolli' <cbertol at us.ibm.com>; 'Chan, SiuChi' <
                  siuchi.chan at amd.com>; Cownie, James H <
                  james.h.cownie at intel.com>; David Oehmke <doehmke at cray.com
                  >; 'Denny, Joel E.' <dennyje at ornl.gov>; Dmitriev, Serguei
                  N <serguei.n.dmitriev at intel.com>; Doerfert, Johannes
                  Rudolf <jdoerfert at anl.gov>; 'Ettore Tiotto' <
                  etiotto at ca.ibm.com>; 'fraggamuffin at gmail.com' <
                  fraggamuffin at gmail.com>; 'Gheorghe-Teod Bercea' <
                  Gheorghe-Teod.Bercea at ibm.com>; Hal Finkel <
                  hfinkel at anl.gov>; 'jbeyer at nvidia.com' <jbeyer at nvidia.com
                  >; 'Jeeva Paudel' <pjeeva01 at ca.ibm.com>; 'Jeff Heath' <
                  jrheath at ca.ibm.com>; Jeffrey Sandoval <sandoval at cray.com
                  >; Jones, Jeff C <jeff.c.jones at intel.com>; '
                  josem at udel.edu' <josem at udel.edu>; 'Kelvin Li' <
                  kli at ca.ibm.com>; 'Kevin K O'Brien' <caomhin at us.ibm.com>;
                  'khaldi.dounia at gmail.com' <khaldi.dounia at gmail.com>;
                  'Kotsifakou, Maria' <kotsifa2 at illinois.edu>; Krishnaiyer,
                  Rakesh <rakesh.krishnaiyer at intel.com>; Lieberman, Ron <
                  Ron.Lieberman at amd.com>; 'Lopez, Matthew Graham' <
                  lopezmg at ornl.gov>; 'lopezmg at ornl.org' <lopezmg at ornl.org>;
                  'Martin Kong' <martin.richard.kong at gmail.com>; 'Matt
                  Martineau' <m.martineau at bristol.ac.uk>; Menard, Lorri <
                  lorri.menard at intel.com>; Monteleone, Robert <
                  robert.monteleone at intel.com>; oscar at ornl.gov; Rao,
                  Premanand M <premanand.m.rao at intel.com>; Rice, Michael P
                  <michael.p.rice at intel.com>; Robichaux, Joseph <
                  joseph.robichaux at intel.com>; gregory.rodgers at amd.com;
                  Rokos, Georgios <georgios.rokos at intel.com>; '
                  samuel.antao at ibm.com' <samuel.antao at ibm.com>; 'Sarah
                  McNamara' <mcnamara at ca.ibm.com>; '
                  sergey.y.ostanevich at gmail.com' <
                  sergey.y.ostanevich at gmail.com>; 'Sergio Pino Gallardo' <
                  sergiop at udel.edu>; 'Sharif, Hashim' <
                  hsharif3 at illinois.edu>; Sjodin, Jan <Jan.Sjodin at amd.com>;
                  Sunil Shrestha <sshrestha at cray.com>; 'Sunita
                  Chandrasekaran' <schandra at udel.edu>; Tian, Xinmin <
                  xinmin.tian at intel.com>; Tianyi Zhang <tzhan18 at lsu.edu>; '
                  vadve at illinois.edu' <vadve at illinois.edu>; 'Wael Yehia' <
                  wyehia at ca.ibm.com>; 'Wang Chen' <wdchen at ca.ibm.com>;
                  Wilmarth, Terry L <terry.l.wilmarth at intel.com>
                  Subject: Re: Comparison of 2 schemes to implement OpenMP
                  5.0 declare mapper codegen

                  When 2 threads try to concurrently map the same data, it
                  behaves the same as when 2 threads concurrently map the
                  same data using map clauses; mappers don't introduce
                  extra considerations here. An example is both threads
                  using #pragma omp target enter data concurrently.

                  When 2 threads concurrently map the same data, my
                  understanding based on the current code is that it will
                  create 2 copies of the same data, and either copy is
                  correct to use. There may be a problem when both copies
                  are mapped back if they are not synchronized correctly,
                  but this is a programming issue, not the responsibility
                  of OpenMP.

                  Thanks,
                  Lingda Li

                  From: Deepak Eachempati <deachempat at cray.com>
                  Sent: Thursday, June 27, 2019 3:17 PM
                  To: Li, Lingda; Narayanaswamy, Ravi; 'Alexandre
                  Eichenberger'; 'Alexey Bataev'; Chapman, Barbara
                  (Contact); Bobrovsky, Konstantin S; 'Carlo Bertolli';
                  'Chan, SiuChi'; Cownie, James H; David Oehmke; 'Denny,
                  Joel E.'; Dmitriev, Serguei N; Doerfert, Johannes
                  Rudolf ; 'Ettore Tiotto'; 'fraggamuffin at gmail.com';
                  'Gheorghe-Teod Bercea'; Hal Finkel; 'jbeyer at nvidia.com';
                  'Jeeva Paudel'; 'Jeff Heath'; Jeffrey Sandoval; Jones,
                  Jeff C; 'josem at udel.edu'; 'Kelvin Li'; 'Kevin K O'Brien';
                  'khaldi.dounia at gmail.com'; 'Kotsifakou, Maria';
                  Krishnaiyer, Rakesh; Lieberman, Ron ; 'Lopez, Matthew
                  Graham'; 'lopezmg at ornl.org'; 'Martin Kong'; 'Matt
                  Martineau'; Menard, Lorri; Monteleone, Robert;
                  oscar at ornl.gov; Rao, Premanand M; Rice, Michael P;
                  Robichaux, Joseph; gregory.rodgers at amd.com; Rokos,
                  Georgios; 'samuel.antao at ibm.com'; 'Sarah McNamara'; '
                  sergey.y.ostanevich at gmail.com'; 'Sergio Pino Gallardo';
                  'Sharif, Hashim'; Sjodin, Jan ; Sunil Shrestha; 'Sunita
                  Chandrasekaran'; Tian, Xinmin; Tianyi Zhang; '
                  vadve at illinois.edu'; 'Wael Yehia'; 'Wang Chen'; Wilmarth,
                  Terry L
                  Subject: RE: Comparison of 2 schemes to implement OpenMP
                  5.0 declare mapper codegen

                  Thanks.

                  Is it possible for another thread to be concurrently
                  mapping something else while the maps from the mapper
                  function are taking place? If so, how do you guarantee
                  that the allocation for each component will get you the
                  right addresses in device memory? Sorry if this was
                  covered before and I missed it.

                  -- Deepak

                  From: Li, Lingda [mailto:lli at bnl.gov]
                  Sent: Thursday, June 27, 2019 2:08 PM
                  To: Deepak Eachempati <deachempat at cray.com>;
                  Narayanaswamy, Ravi <ravi.narayanaswamy at intel.com>;
                  'Alexandre Eichenberger' <alexe at us.ibm.com>; 'Alexey
                  Bataev' <Alexey.Bataev at ibm.com>; Chapman, Barbara
                  (Contact) <barbara.chapman at stonybrook.edu>; Bobrovsky,
                  Konstantin S <konstantin.s.bobrovsky at intel.com>; 'Carlo
                  Bertolli' <cbertol at us.ibm.com>; 'Chan, SiuChi' <
                  siuchi.chan at amd.com>; Cownie, James H <
                  james.h.cownie at intel.com>; David Oehmke <doehmke at cray.com
                  >; 'Denny, Joel E.' <dennyje at ornl.gov>; Dmitriev, Serguei
                  N <serguei.n.dmitriev at intel.com>; Doerfert, Johannes
                  Rudolf <jdoerfert at anl.gov>; 'Ettore Tiotto' <
                  etiotto at ca.ibm.com>; 'fraggamuffin at gmail.com' <
                  fraggamuffin at gmail.com>; 'Gheorghe-Teod Bercea' <
                  Gheorghe-Teod.Bercea at ibm.com>; Hal Finkel <
                  hfinkel at anl.gov>; 'jbeyer at nvidia.com' <jbeyer at nvidia.com
                  >; 'Jeeva Paudel' <pjeeva01 at ca.ibm.com>; 'Jeff Heath' <
                  jrheath at ca.ibm.com>; Jeffrey Sandoval <sandoval at cray.com
                  >; Jones, Jeff C <jeff.c.jones at intel.com>; '
                  josem at udel.edu' <josem at udel.edu>; 'Kelvin Li' <
                  kli at ca.ibm.com>; 'Kevin K O'Brien' <caomhin at us.ibm.com>;
                  'khaldi.dounia at gmail.com' <khaldi.dounia at gmail.com>;
                  'Kotsifakou, Maria' <kotsifa2 at illinois.edu>; Krishnaiyer,
                  Rakesh <rakesh.krishnaiyer at intel.com>; Lieberman, Ron <
                  Ron.Lieberman at amd.com>; 'Lopez, Matthew Graham' <
                  lopezmg at ornl.gov>; 'lopezmg at ornl.org' <lopezmg at ornl.org>;
                  'Martin Kong' <martin.richard.kong at gmail.com>; 'Matt
                  Martineau' <m.martineau at bristol.ac.uk>; Menard, Lorri <
                  lorri.menard at intel.com>; Monteleone, Robert <
                  robert.monteleone at intel.com>; oscar at ornl.gov; Rao,
                  Premanand M <premanand.m.rao at intel.com>; Rice, Michael P
                  <michael.p.rice at intel.com>; Robichaux, Joseph <
                  joseph.robichaux at intel.com>; gregory.rodgers at amd.com;
                  Rokos, Georgios <georgios.rokos at intel.com>; '
                  samuel.antao at ibm.com' <samuel.antao at ibm.com>; 'Sarah
                  McNamara' <mcnamara at ca.ibm.com>; '
                  sergey.y.ostanevich at gmail.com' <
                  sergey.y.ostanevich at gmail.com>; 'Sergio Pino Gallardo' <
                  sergiop at udel.edu>; 'Sharif, Hashim' <
                  hsharif3 at illinois.edu>; Sjodin, Jan <Jan.Sjodin at amd.com>;
                  Sunil Shrestha <sshrestha at cray.com>; 'Sunita
                  Chandrasekaran' <schandra at udel.edu>; Tian, Xinmin <
                  xinmin.tian at intel.com>; Tianyi Zhang <tzhan18 at lsu.edu>; '
                  vadve at illinois.edu' <vadve at illinois.edu>; 'Wael Yehia' <
                  wyehia at ca.ibm.com>; 'Wang Chen' <wdchen at ca.ibm.com>;
                  Wilmarth, Terry L <terry.l.wilmarth at intel.com>
                  Subject: Re: Comparison of 2 schemes to implement OpenMP
                  5.0 declare mapper codegen

                  Hi Deepak,

                  Yes, it handles this case. The first part of the mapper
                  function (initially allocating space for the whole
                  array) is just an optimization, not required for
                  correctness, as you suggested in an earlier discussion.
